Closed victorb closed 5 years ago
To try to isolate the flaky tests, I started by looking at the master branch of the js-ipfs project. I created a tool that gathers the failure report data and renders a simple chart to help highlight failures. For each failure I calculate the standard deviation of the job run numbers it occurred in, and only include failures with a stdev over 1; this eliminates failures that happen in consecutive runs. The more sporadic the failures, the higher the stdev.
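For illustration, the stdev filter described above could be sketched like this (my reconstruction in plain JavaScript, not the actual tool's code): consecutive failing runs produce a low stdev (a persistent breakage), while scattered failures across the run range produce a high one (flakiness).

```javascript
// Sketch of the stdev-based flakiness filter (a reconstruction, not the
// tool's real implementation). Input: the CI run numbers in which a
// given test failed.
function stdev(runNumbers) {
  const mean = runNumbers.reduce((a, b) => a + b, 0) / runNumbers.length;
  const variance =
    runNumbers.reduce((sum, n) => sum + (n - mean) ** 2, 0) / runNumbers.length;
  return Math.sqrt(variance);
}

function likelyFlaky(runNumbers, threshold = 1) {
  return stdev(runNumbers) > threshold;
}

// Three consecutive failures: stdev ≈ 0.82, so this looks like a real
// breakage rather than flakiness.
console.log(likelyFlaky([201, 202, 203])); // false
// Failures scattered across the run range: stdev ≈ 35.6 → likely flaky.
console.log(likelyFlaky([150, 190, 237])); // true
```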
The rows are ordered by the total number of failures for the given error / test.
This is the chart for js-ipfs/master for runs 150 (early July) to 237 (now).
Chart: https://gateway.ipfs.io/ipfs/QmdkRvAuDbktNJzEF3W6Sm9GuqRcsTFKMGCExZ3bYUuyoA/
Raw data: https://ipfs.io/ipfs/QmdkRvAuDbktNJzEF3W6Sm9GuqRcsTFKMGCExZ3bYUuyoA/js-ipfs.master.150.237.json
The following failures seem the most likely candidates for being flaky tests. Some of these only fail on certain platforms / versions of Node.js, though.
| Count | Platform | Node.js | Test | Suite |
|---|---|---|---|---|
| x18 | macos | 8.11.3 | should import an exported key | interface-ipfs-core tests .key.import |
| x5 | macos | 8.11.3 | should get repo stats | interface-ipfs-core tests .repo.stat |
| x4 | macos | 8.11.3 | "before all" hook | interface-ipfs-core tests .swarm.localAddrs |
| x4 | macos | 8.11.3 | "before all" hook | interface-ipfs-core tests .swarm.peers |
| x3 | macos | 8.11.3 | "before all" hook | interface-ipfs-core tests .swarm.disconnect |
| x3 | macos | 10.4.1 | 3 peers | bitswap transfer a block between |
| x3 | windows | 8.11.3 | add alias | cli files daemon off (directly to core) |
| x3 | windows | 8.11.3 | add recursively test | cli files daemon on (through http-api) |
| x3 | windows | 10.4.1 | handles multiple hashes | cli pin daemon off (directly to core) ls |
| x3 | windows | 10.4.1 | lists all pins when no hash is passed | cli pin daemon off (directly to core) ls |
| x3 | windows | 10.4.1 | recursively (default) | cli pin daemon off (directly to core) add |
| x2 | macos | 8.11.3 | "after all" hook | interface-ipfs-core tests .repo.stat |
| x2 | macos | 8.11.3 | "after all" hook | interface-ipfs-core tests .stats.bw |
| x2 | macos | 8.11.3 | "before all" hook | interface-ipfs-core tests .pingReadableStream |
| x2 | macos | 8.11.3 | "before all" hook | interface-ipfs-core tests .stats.bitswap |
| x2 | macos | 8.11.3 | "before all" hook | interface-ipfs-core tests .stats.bwPullStream |
| x2 | macos | 8.11.3 | "before all" hook | interface-ipfs-core tests .stats.repo |
| x2 | macos | 8.11.3 | "before all" hook | interface-ipfs-core tests .swarm.connect |
| x2 | macos | 10.4.1 | 2 peers | bitswap transfer a block between |
| x2 | windows | 8.11.3 | add --silent | cli files daemon off (directly to core) |
| x2 | windows | 8.11.3 | add directory with trailing slash test | cli files daemon off (directly to core) |
| x2 | macos | 8.11.3 | should get repo stats (promised) | interface-ipfs-core tests .repo.stat |

I'm going to take this information and try to pull out some of the tests that I think we should apply retry logic to. I will also push up the tools I used to generate this information so that we can use it for other projects.
This is so awesome, really great to know where to focus our energy.
Agree with Alan, awesome work @travisperson!
We should be able to separate test failures caused by timeouts (which I think many of these are) from "normal" test failures and "exceptional" test failures. A normal test failure would be a test case with an assertion that is failing; an exceptional test failure would be something like `yarn install` not finishing because of a 404 response, outside of the test suites. I'll make sure the pipelines can handle it (go-ipfs already does this), then it'll be a bit easier to show in the table.
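That separation could be sketched roughly as follows. The category names come from the comment above; the matching patterns are my assumptions (mocha reports exceeded timeouts as "Timeout of Nms exceeded"), and they would need adjusting to whatever the Jenkins reports actually contain.

```javascript
// Hypothetical failure classifier, assuming mocha-style error messages.
// The regexes are guesses, not the real Jenkins report format.
function classifyFailure(message) {
  // Mocha reports an exceeded timeout as "Timeout of <n>ms exceeded".
  if (/timeout of \d+ms exceeded/i.test(message)) return 'timeout';
  // Chai/Node assertion failures are prefixed with "AssertionError".
  if (/AssertionError/.test(message)) return 'normal';
  // Everything else (e.g. a 404 during `yarn install`) is infrastructure.
  return 'exceptional';
}

console.log(classifyFailure('Error: Timeout of 2000ms exceeded.'));        // 'timeout'
console.log(classifyFailure('AssertionError: expected 404 to equal 200')); // 'normal'
console.log(classifyFailure('error An unexpected error occurred: 404'));   // 'exceptional'
```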
It'll be very useful to show the output of the failure when hovering/clicking on a cell in the table, so we could see directly what's going wrong.
@travisperson can you publish the code for generating this somewhere?
Published https://github.com/travisperson/jenkins-flake-report
Are these all timeout errors?
I think most of them are.
> We should be able to separate test failures because of timeouts
Ya, the data I'm getting from Jenkins has some information we can test for to see if it's a timeout.
> It'll be very useful to show the output of the failure when hovering/clicking on a cell in the table, so we could see directly what's going wrong.
Ya, I think that would be great. Currently the script is just a Go HTML template, but we could at least use a `title` attribute or something to quickly show some information, and extend it further to be more detailed. I originally wrote it as a simple React table but didn't want to deal with all the dependencies, so I converted it to plain HTML.
js-ipfs and js-libp2p are the biggest JS codebases we have, and their test suites not only take a long time to run but are also flaky, meaning they fail randomly.
We should make use of whatever tools we have (mocha's `.retries()`, for example) to make them not flaky. Nothing is worse than seeing, after a 40-minute test run, that one test timed out.

We can consider this solved once you can run the tests 10 times for the same commit and always get a successful run (if it was successful the first time) on CI.
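Mocha's built-in mechanism is `this.retries(n)` inside an `it` or `describe` block. For code outside mocha, the same idea can be sketched as a standalone wrapper (an illustration only, not part of any of these projects):

```javascript
// Sketch of retry logic for flaky async operations. Mocha has this built
// in via `this.retries(n)`; this helper just illustrates the same idea.
async function withRetries(fn, retries = 3) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(); // success: return immediately
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError; // all attempts exhausted
}

// Example: a hypothetical operation that fails twice before succeeding.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error('transient failure');
  return 'ok';
};

withRetries(flaky).then((result) => console.log(result)); // prints "ok"
```

In a mocha suite itself you would instead write `it('transfers a block', function () { this.retries(4); ... })`, letting the runner re-execute only the failing test.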