guardian / playground

Come to the playground and peruse the sand castles.

DCR test flakiness data-gathering #21

Open bryophyta opened 2 years ago

bryophyta commented 2 years ago

There is now a gist for the scraping part of this project.

What is this project about?

There's a general sense that some of the tests in our CI pipeline (both in GitHub Actions and TeamCity) have become flakier recently, i.e. we're getting failing runs that don't correspond to an actual problem in the build.

I've started collecting some data on the frequency of these false negatives as a 10% project, and if anyone else is interested in this then you'd be welcome to join me!

I've got a Deno script which queries the GitHub API to collect and process data on GHA runs. I'll be extending this in the near future to make it easier to collect more data and to run it automatically.
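As a rough illustration of the kind of collection described above (not the actual script, which lives in the gist), here is a minimal sketch that pages through the GitHub REST API's "list workflow runs for a repository" endpoint and tallies runs by conclusion. The function names (`runsUrl`, `fetchRuns`, `countByConclusion`), the `maxPages` cap and the `guardian/dotcom-rendering` repo slug in the usage note are assumptions made for the example. It only relies on the global `fetch`, so it should run under Deno without extra imports.

```typescript
// The subset of fields we read from each workflow run in the API response.
interface WorkflowRun {
  id: number;
  name: string | null;
  head_sha: string;
  conclusion: string | null; // null while the run is still in progress
  run_attempt: number;
  created_at: string;
}

interface RunsPage {
  total_count: number;
  workflow_runs: WorkflowRun[];
}

const PER_PAGE = 100; // maximum page size the endpoint accepts

// Build the URL for one page of workflow runs.
function runsUrl(owner: string, repo: string, page: number): string {
  const params = new URLSearchParams({
    per_page: String(PER_PAGE),
    page: String(page),
  });
  return `https://api.github.com/repos/${owner}/${repo}/actions/runs?${params.toString()}`;
}

// Fetch up to maxPages pages of runs (hypothetical helper; the cap is arbitrary).
async function fetchRuns(
  owner: string,
  repo: string,
  token: string,
  maxPages = 10,
): Promise<WorkflowRun[]> {
  const runs: WorkflowRun[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const res = await fetch(runsUrl(owner, repo, page), {
      headers: {
        Accept: "application/vnd.github+json",
        Authorization: `Bearer ${token}`,
      },
    });
    if (!res.ok) {
      throw new Error(`GitHub API returned ${res.status}`);
    }
    const body = (await res.json()) as RunsPage;
    runs.push(...body.workflow_runs);
    // A short page means there is nothing further to fetch.
    if (body.workflow_runs.length < PER_PAGE) break;
  }
  return runs;
}

// Tally runs by conclusion ("success", "failure", "cancelled", ...).
function countByConclusion(
  runs: Pick<WorkflowRun, "conclusion">[],
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const run of runs) {
    const key = run.conclusion ?? "in_progress";
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}
```

With a token in hand, something like `countByConclusion(await fetchRuns("guardian", "dotcom-rendering", token))` would give a first overview of how often runs fail, before trying to work out which failures are flaky.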

I'd also welcome any suggestions on good ways to measure and visualise 'flakiness'. Some cases are very clear-cut in the logs, but other cases are more subtle. I'll post some code snippets here soon, but in the meantime feel free to comment or reach out!
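One possible way to put a number on "flakiness", offered only as a sketch: treat a commit as flaky for a given workflow if the same commit SHA has both a failed and a successful run of that workflow, since the code did not change between them. This heuristic, the `RunRecord` shape and the `flakinessByWorkflow` name are assumptions for illustration and are not taken from the gist. It will miss flaky failures that were never re-run, and it ignores cancelled or skipped runs.

```typescript
// Hypothetical, simplified record of one CI run.
interface RunRecord {
  workflow: string;
  headSha: string;
  conclusion: string; // e.g. "success", "failure", "cancelled"
}

interface FlakinessSummary {
  commitsTested: number; // commits with at least one success or failure
  flakyCommits: number;  // commits that both failed and succeeded
  flakyRate: number;     // flakyCommits / commitsTested
}

function flakinessByWorkflow(
  records: RunRecord[],
): Map<string, FlakinessSummary> {
  // workflow -> commit SHA -> set of distinct outcomes seen for that commit
  const seen = new Map<string, Map<string, Set<string>>>();

  for (const r of records) {
    // Only definite outcomes say anything about flakiness.
    if (r.conclusion !== "success" && r.conclusion !== "failure") continue;

    let bySha = seen.get(r.workflow);
    if (!bySha) {
      bySha = new Map<string, Set<string>>();
      seen.set(r.workflow, bySha);
    }
    let outcomes = bySha.get(r.headSha);
    if (!outcomes) {
      outcomes = new Set<string>();
      bySha.set(r.headSha, outcomes);
    }
    outcomes.add(r.conclusion);
  }

  const result = new Map<string, FlakinessSummary>();
  seen.forEach((bySha, workflow) => {
    let flakyCommits = 0;
    bySha.forEach((outcomes) => {
      // Both "success" and "failure" on the same commit: count it as flaky.
      if (outcomes.size === 2) flakyCommits++;
    });
    result.set(workflow, {
      commitsTested: bySha.size,
      flakyCommits,
      flakyRate: flakyCommits / bySha.size,
    });
  });
  return result;
}
```

The per-workflow rate could then be plotted over time (for example by bucketing runs by week) to see whether a workflow really has become flakier. The subtler cases, where a failure is never retried, would still need the logs.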

bryophyta commented 1 year ago

Update on this: The scraping script should work. (It's not entirely up to date, but I'll aim to update it with the current version soon.) And I've used it to collect data to analyse a couple of specific questions about our CI pipeline.

Overall, the data hasn't shown very high levels of false negatives, which is good news! And given that the whole historical dataset on DCR GitHub Actions runs can be collected within an afternoon, it doesn't seem like there's much need to set up continuous collection or monitoring for this at the moment.

If anyone comes across this in the future and wants to use or extend this project, please feel free to comment!