web-platform-tests / rfcs

web-platform-tests RFCs
75 stars 63 forks source link

Add RFC to disable Chromium stability checks #131

Closed past closed 1 year ago

past commented 1 year ago

This RFC follows a proposal from @jgraham in a discussion on Matrix from a few days ago.

past commented 1 year ago

In the WPT RFCs and infrastructure meeting yesterday I promised to share more details about how the flakiness detection works in Chromium CI. Here is a publicly accessible recent example, courtesy of @WeizhongX.

If you look closely you see that in step 45 we run blink_wpt_tests, which fail. Then in step 50 the system is trying to query the results database for any pre-existing flakiness. And it was able to decide that the failure in blink_web_tests (step 42) is due to a known flaky test, since there is a record of prior failed runs with that test (check the stdout link). It doesn't see any pre-existing flakiness in the failed blink_wpt_tests though. Then in step 57 it retries (perhaps unnecessarily) the shards with the patch a second time. In step 71, it retries without the patch applied. At this point, any previously failed tests will be run 10 times each. Any single failure here will cause the test to be treated as an existing failure, and the patch will be exonerated. It doesn't detect any failures, so the patch is clearly at fault. Finally in step 78 we can see that the patch was exonerated for the blink_web_tests failure since it was pre-existing, but in step 80 it declares the patch responsible for the blink_wpt_tests failures since reruns without the patch were successful.

In case it is useful, the recipe code for trybots can be found here and the recipe documentation is here as well. There are also a few different internal dashboards and alerts that automate the investigation process and provide aggregate data to browser engineers.

Of course all this is not easy to upstream to WPT infra, but hopefully it provides some pointers on what a future comprehensive solution could look like. For now, disabling the stability checks or making them non-blocking is the most efficient path.

foolip commented 1 year ago

I just had https://github.com/web-platform-tests/wpt/pull/39105 blocked by this. I can admin merge myself, but most contributors can't and would be stuck for a while. @past do you have time soon to finish this?

whimboo commented 1 year ago

I just had web-platform-tests/wpt#39105 blocked by this. I can admin merge myself, but most contributors can't and would be stuck for a while. @past do you have time soon to finish this?

I can second that. I don't have admin permissions and as such cannot merge myself. We always have to ask @jgraham to get such PRs merged which in recent weeks is probably around 90% of the PRs. We as well would appreciate if this PR can be finished.

past commented 1 year ago

Apologies for the delay. I believe I have now addressed the review feedback, so this is ready for another review.

whimboo commented 1 year ago

@past so who has to review these changes from Google's side? Looks like this PR is stuck.

Interestingly I have seen no such failure for quite some time for our PRs, but maybe I have missed such failures for others. Is there a way to easily see the failure rate of the Chromium stability job?

past commented 1 year ago

Hey Henrik, @jgraham promised to take another look so I am waiting for a green light from him to land this.

jgraham commented 1 year ago

I don't quite know what you were hoping for, but I've rewritten the RFC to better explain the tradeoffs as I understand them, and provide a bit more detail on the actual implementation requirements.

past commented 1 year ago

Thanks James, I'm happy with your changes if you want to review and merge this.

jgraham commented 1 year ago

Since this was a fairly substantive change I think it requires at least the one week timeout before merging. But also I'd be interested in input from other members of @web-platform-tests/wpt-core-team