Evaluation of process inspector

AkihiroSuda commented 8 years ago

We need to evaluate the process inspector quantitatively, as was done for the Ethernet inspector (see the FOSDEM presentation slides).

AkihiroSuda commented 8 years ago

Tried to reproduce ZOOKEEPER-2212 with several configs.

All the experiments were done on my local Lenovo PC (Xeon E3-1220 v3 * 4, 8 GB RAM).

Earthquake: a7defa0
Kernel: 4.2.0-30-generic #36-Ubuntu

EQ Config	#CPU assigned	#Exp	Reproducibility	#Pattern@1000 exp	Notes
None	4	5,000	0%	156	Data is from FOSDEM slide.
Ether	4	1,000	21.8%	573	Ditto. With latest EQ + 1 CPU, reproducibility grew to about 50%.
None	1	1,000	0%	N/A
None + SCHED_BATCH	1	1,000	0%	N/A
Proc(`mild{UseBatch:true}`) (SCHED_BATCH + random nice values)	1	5,000	0.7%	634	0.08% of experiments failed due to timeout
Proc(`mild{UseBatch:true}`)	4	5,000	0.32%	548	No experiment failed due to timeout
Proc(`mild{UseBatch:false}`)	1	5,000	0.26%	914	90% of experiments failed due to timeout

mild{UseBatch:true} provides better reproducibility than mild{UseBatch:false}, but is not as good as the Ethernet inspector.
mild{UseBatch:false} provides better pattern growth, but is not useful for ZOOKEEPER-2212 because too many experiments time out.
Proc(extreme) is likely to cause starvation on a single CPU, so I did not experiment with it.
Proc(dirichlet) hits the bug mentioned in the README.
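
For readers unfamiliar with what "SCHED_BATCH + random nice values" means in practice, here is a minimal Python sketch of the idea. It is not Namazu's code: Namazu is written in Go and, as noted elsewhere in this thread, relies on sched_setattr(), whereas this sketch substitutes Python's `os.sched_setscheduler` and `os.setpriority` (Linux only). The function names, the nice range 0..19, and the meaning given to `use_batch=False` are assumptions made for illustration.

```python
import os
import random


def pick_nice_values(pids, rng, lo=0, hi=19):
    """Return {pid: nice} with a random nice value in [lo, hi] for each pid.

    The range 0..19 is an assumption: an unprivileged process can raise
    its nice value but usually cannot lower it again.
    """
    return {pid: rng.randint(lo, hi) for pid in pids}


def apply_batch_and_nice(nice_by_pid, use_batch=True):
    """Apply the chosen nice values, optionally switching to SCHED_BATCH.

    `use_batch` is a hypothetical flag loosely mirroring UseBatch; here
    False simply means "leave the scheduling policy untouched".
    """
    for pid, nice in nice_by_pid.items():
        try:
            if use_batch:
                # SCHED_BATCH requires a static priority of 0.
                os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
            os.setpriority(os.PRIO_PROCESS, pid, nice)
        except (ProcessLookupError, PermissionError):
            # The target may have exited, or we may lack privileges.
            continue
```

A caller would typically repeat `pick_nice_values` and `apply_batch_and_nice` at some interval over the processes of the system under test, so that each run sees a different scheduling bias.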

AkihiroSuda commented 8 years ago

Also tested ZOOKEEPER-2137 with the latest ZooKeeper (just 50 times on 4 CPUs):

EQ Config	#CPU assigned	#Exp	Reproducibility	#Pattern@1000 exp	Notes
None	4	50	2%	N/A	-
Proc(`mild{UseBatch:true}`) (SCHED_BATCH + random nice values)	4	50	16%	N/A	-
Proc(`mild{UseBatch:true}`)	1	50	2%	N/A	-

This reproducibility is useful enough (on 4 CPUs). The process inspector works well with ZOOKEEPER-2137, although not with ZOOKEEPER-2212. I guess this is because ZOOKEEPER-2137 runs longer (> 1 min) than ZOOKEEPER-2212, i.e., sched_setattr() gets many more chances to take effect.
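
Since these rates come from only 50 runs each, it may help to look at the uncertainty around them. The following is a small sketch that computes a Wilson score interval for a reproducibility rate. The counts (1 of 50 and 8 of 50) are derived from the 2% and 16% figures above; the choice of the Wilson interval and of z = 1.96 (roughly 95% confidence) is mine, not something the experiments prescribe.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    Returns (low, high), clamped to [0, 1].
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Counts derived from the ZOOKEEPER-2137 table (50 runs per configuration).
for label, reproduced, runs in [
    ("None, 4 CPUs", 1, 50),
    ("Proc(mild{UseBatch:true}), 4 CPUs", 8, 50),
    ("Proc(mild{UseBatch:true}), 1 CPU", 1, 50),
]:
    low, high = wilson_interval(reproduced, runs)
    print(f"{label}: {reproduced}/{runs}, interval {low:.3f}..{high:.3f}")
```

With so few runs the intervals are wide, so more runs would be needed before drawing firm conclusions from the difference.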

I will keep this issue open for discussion.

PTAL @mitake

AkihiroSuda commented 8 years ago

Evaluated some YARN (apache/hadoop@4e4b3a8465a8433e78e015cb1ce7e0dc1ebeb523) tests using osrg/earthquake@13aa33b371fc714608061f4671a83dd18d7b25fe (mild{UseBatch:true}) on AWS t2.large (2 CPUs assigned).

Each test was executed 100 times with Earthquake and 100 times without it.

Note that this version of Earthquake does not contain an optimization (#146).

Test	Reproducibility(without EQ)	Reproducibility(with EQ)
YARN-4548(RM/TestCapacityScheduler)	11%	82%
YARN-4556(RM/TestFifoScheduler)	2%	44%
YARN-4168(NM/TestLogAggregationService)	1%	8%
YARN-1978(NM/TestLogAggregationService)	0%	4%
YARN-4543(NM/TestNodeStatusUpdater)	0%	1%
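
For anyone who wants to repeat this kind of measurement, here is a rough sketch of a harness that runs a test command repeatedly and reports the reproducibility. It is not the script used for the numbers above. It assumes that a non-zero exit code means the flaky failure was reproduced, and that timed-out runs are counted separately but stay in the denominator; `run_command` and `measure` are hypothetical names.

```python
import subprocess
from collections import Counter


def run_command(cmd, timeout_sec):
    """Run cmd once and classify the outcome.

    Assumption: a non-zero exit code means the bug was reproduced.
    """
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_sec,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "reproduced" if result.returncode != 0 else "passed"


def measure(run_once, runs):
    """Call run_once() `runs` times and summarize the outcomes."""
    counts = Counter(run_once() for _ in range(runs))
    return {
        "runs": runs,
        "reproduced": counts["reproduced"],
        "timeout": counts["timeout"],
        "reproducibility": counts["reproduced"] / runs,
    }
```

Usage would look like `measure(lambda: run_command(cmd, 600), 100)`, where `cmd` is whatever command runs the single test case, once with and once without the inspector attached.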

AkihiroSuda commented 8 years ago

I found that it is sometimes better to apply Namazu (formerly named Earthquake) to a stress process rather than to the Hadoop mvn process.

Testcase: YARN-5043 (RM/TestAMRestart) (apache/hadoop@06413da72efed9a50e49efaf7110c220c88a7f4a) using osrg/namazu@8e4f26836c4affa15a6bb5ade57f21bd9417354e (mild{UseBatch:true}) on AWS t2.large (2 CPUs assigned). Each configuration was run 100 times.

Stress: stress --cpu 2

Running stress?	Namazu applied to	Reproducibility
N	None	16%
Y	None	12%
N	mvn	7%
Y	stress	30%

TODO:

Reevaluate the other YARN tests with stress
Perform a scientific and reliable analysis
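
As one possible starting point for the analysis item, the sketch below compares two configurations with a two-sided Fisher's exact test, using only the standard library. The choice of test is my suggestion, not something already agreed in this thread. The example counts (16/100 for no stress and no Namazu, 30/100 for Namazu applied to stress) come from the table above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Row 1 is (reproduced, not reproduced) for one configuration and row 2
    the same for the other. Returns the p-value: the total probability of
    all tables, with the same margins, that are no more likely than the
    observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x reproductions in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small tolerance so that ties with the observed table are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# No stress, no Namazu (16/100) vs. stress with Namazu applied (30/100).
p_value = fisher_exact_two_sided(16, 84, 30, 70)
print(f"p-value: {p_value:.4f}")
```

A small p-value would suggest the difference between the two configurations is unlikely to be chance alone; with several configurations compared at once, some correction for multiple comparisons would also be worth considering.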

mitake commented 8 years ago

I'd like to report my experiment with etcd #5022: https://github.com/coreos/etcd/issues/5022

w/ or w/o Namazu process inspector	Reproducibility
w/o	0%
w/	2.7%

Each of the above experiments consisted of 1000 test runs.

Parameters of the explore policy:

explorePolicy = "random"
[explorePolicyParam]
 procPolicy = "dirichlet"
