osrg / namazu

:fish: 鯰: Programmable fuzzy scheduler for testing distributed systems
http://osrg.github.io/namazu
Apache License 2.0

Evaluation of process inspector #125

Open AkihiroSuda opened 8 years ago

AkihiroSuda commented 8 years ago

We need to quantitatively evaluate the process inspector, as was done for the Ethernet inspector (FOSDEM presentation slide).

AkihiroSuda commented 8 years ago

I tried to reproduce ZOOKEEPER-2212 with several configurations.

All experiments were run on my local Lenovo PC (Xeon E3-1220 v3 * 4, 8 GB RAM).

| EQ Config | #CPU assigned | #Exp | Reproducibility | #Pattern@1000 exp | Notes |
|---|---|---|---|---|---|
| None | 4 | 5,000 | 0% | 156 | Data is from FOSDEM slide. |
| Ether | 4 | 1,000 | 21.8% | 573 | Ditto. With latest EQ + 1 CPU, reproducibility grew to about 50%. |
| None | 1 | 1,000 | 0% | N/A | |
| None + SCHED_BATCH | 1 | 1,000 | 0% | N/A | |
| Proc(mild{UseBatch:true}) (SCHED_BATCH + random nice values) | 1 | 5,000 | 0.7% | 634 | 0.08% experiments failed due to timeout |
| Proc(mild{UseBatch:true}) | 4 | 5,000 | 0.32% | 548 | No experiment failed due to timeout |
| Proc(mild{UseBatch:false}) | 1 | 5,000 | 0.26% | 914 | 90% experiments failed due to timeout |

AkihiroSuda commented 8 years ago

Also tested ZOOKEEPER-2137 with the latest ZooKeeper (just 50 times on 4 CPUs):

| EQ Config | #CPU assigned | #Exp | Reproducibility | #Pattern@1000 exp | Notes |
|---|---|---|---|---|---|
| None | 4 | 50 | 2% | N/A | - |
| Proc(mild{UseBatch:true}) (SCHED_BATCH + random nice values) | 4 | 50 | 16% | N/A | - |
| Proc(mild{UseBatch:true}) | 1 | 50 | 2% | N/A | - |

This reproducibility is high enough to be useful (on 4 CPUs). The process inspector works well with ZOOKEEPER-2137, although not with ZOOKEEPER-2212. I guess this is because ZOOKEEPER-2137 runs longer (> 1 min) than ZOOKEEPER-2212, so sched_setattr() gets many more chances to take effect.
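
For readers unfamiliar with the mechanism, here is a minimal sketch of what "SCHED_BATCH + random nice values" means at the syscall level. It is not Namazu's implementation: Python has no wrapper for sched_setattr(), so the sketch substitutes `os.sched_setscheduler` and `os.setpriority`. The names `pick_nice` and `perturb` and the nice range 0..19 are assumptions made for illustration.

```python
import os
import random

# Assumed range for illustration; the thread does not say which nice
# values the "mild" policy actually draws from.
NICE_MIN, NICE_MAX = 0, 19


def pick_nice(rng: random.Random) -> int:
    """Draw a random nice value (a higher value means lower CPU priority)."""
    return rng.randint(NICE_MIN, NICE_MAX)


def perturb(pid: int, rng: random.Random, use_batch: bool = True) -> int:
    """Apply SCHED_BATCH (optionally) and a random nice value to one process.

    Linux only. Returns the nice value that was applied.
    """
    if use_batch:
        # SCHED_BATCH is a non-real-time policy, so its static priority is 0.
        os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
    nice = pick_nice(rng)
    # Lowering a nice value again later usually requires privileges
    # (CAP_SYS_NICE or a suitable RLIMIT_NICE).
    os.setpriority(os.PRIO_PROCESS, pid, nice)
    return nice
```

Calling `perturb` repeatedly on the target's processes while a test runs would re-randomize their priorities each time, which is consistent with the guess above that a longer-running test gives the scheduler changes more chances to matter.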

I'll keep this issue open for discussion.

PTAL @mitake

AkihiroSuda commented 8 years ago

I evaluated some YARN tests (apache/hadoop@4e4b3a8465a8433e78e015cb1ce7e0dc1ebeb523) using osrg/earthquake@13aa33b371fc714608061f4671a83dd18d7b25fe (mild{UseBatch:true}) on AWS t2.large (2 CPUs assigned).

Each test was run 100 times with and without Earthquake.

Note that this version of Earthquake does not contain an optimization (#146).

| Test | Reproducibility (without EQ) | Reproducibility (with EQ) |
|---|---|---|
| YARN-4548 (RM/TestCapacityScheduler) | 11% | 82% |
| YARN-4556 (RM/TestFifoScheduler) | 2% | 44% |
| YARN-4168 (NM/TestLogAggregationService) | 1% | 8% |
| YARN-1978 (NM/TestLogAggregationService) | 0% | 4% |
| YARN-4543 (NM/TestNodeStatusUpdater) | 0% | 1% |

AkihiroSuda commented 8 years ago

I found that it is sometimes better to apply Namazu (formerly named Earthquake) to a stress process rather than to the Hadoop mvn process.

Testcase: YARN-5043 (RM/TestAMRestart) (apache/hadoop@06413da72efed9a50e49efaf7110c220c88a7f4a) using osrg/namazu@8e4f26836c4affa15a6bb5ade57f21bd9417354e (mild{UseBatch:true}) on AWS t2.large (2 CPUs assigned). Run 100 times.

Stress: stress --cpu 2

| Running stress? | Namazu applied for | Reproducibility |
|---|---|---|
| N | None | 16% |
| Y | None | 12% |
| N | mvn | 7% |
| Y | stress | 30% |
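
A measurement loop like the one behind this table could be sketched as follows. This is a hypothetical harness, not the script used for these numbers: the function name `reproducibility`, the `load_cmd` parameter, and the rule "non-zero exit status means the bug was reproduced" are all assumptions. It does not attach Namazu to anything; that has to be done separately.

```python
import subprocess
from typing import Optional, Sequence


def reproducibility(test_cmd: Sequence[str], runs: int,
                    load_cmd: Optional[Sequence[str]] = None) -> float:
    """Run test_cmd `runs` times and return the fraction of failing runs.

    A non-zero exit status is counted as "bug reproduced" (an assumption).
    If load_cmd is given, it runs in the background during all runs.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    load = subprocess.Popen(load_cmd) if load_cmd else None
    try:
        failures = 0
        for _ in range(runs):
            result = subprocess.run(test_cmd,
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
            if result.returncode != 0:
                failures += 1
        return failures / runs
    finally:
        # Always stop the background load, even if a run raised.
        if load is not None:
            load.terminate()
            load.wait()
```

For the "Running stress? = Y" rows, a call might look like `reproducibility(test_cmd, runs=100, load_cmd=["stress", "--cpu", "2"])`, where `test_cmd` is whatever mvn invocation runs RM/TestAMRestart (not given in this thread).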

TODO:

mitake commented 8 years ago

I'd like to report my experiment with etcd issue 5022: https://github.com/coreos/etcd/issues/5022

| w/ or w/o Namazu process inspector | Reproducibility |
|---|---|
| w/o | 0% |
| w/ | 2.7% |

Each of the above experiments consisted of 1,000 test runs.
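
Since the rates here are small, it may help to look at them with a confidence interval. The sketch below computes a Wilson score interval for a binomial proportion; it is an addition for illustration, not part of the original experiment. The counts are derived from the percentages above (0% and 2.7% of 1,000 runs, i.e. 0 and 27 failures), and the function name `wilson_interval` is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, runs: int,
                    z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = failures / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs
                         + z * z / (4 * runs * runs)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


without_namazu = wilson_interval(0, 1000)   # 0% of 1,000 runs
with_namazu = wilson_interval(27, 1000)     # 2.7% of 1,000 runs
```

With these inputs the upper bound without Namazu (about 0.4%) stays below the lower bound with Namazu (about 1.9%), so the two intervals do not overlap.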

Parameters of the explore policy:

explorePolicy = "random"
[explorePolicyParam]
 procPolicy = "dirichlet"
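
This thread does not explain what procPolicy = "dirichlet" does. Assuming, from the name alone, that it splits CPU time among processes using weights drawn from a Dirichlet distribution, the sampling step could be sketched as below. This is an illustration of the distribution, not Namazu's code; `dirichlet_shares` and its `alpha` parameter are hypothetical names.

```python
import random
from typing import List, Optional


def dirichlet_shares(n: int, alpha: float = 1.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Draw n positive weights that sum to 1 from a symmetric Dirichlet.

    Uses the standard construction: draw independent Gamma(alpha, 1)
    samples and normalize them by their sum.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    rng = rng or random.Random()
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]
```

For example, `dirichlet_shares(4)` returns four fractions that could be read as the share of CPU time given to each of four processes in one scheduling round. A smaller `alpha` makes the shares more uneven.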