web-platform-tests / results-collection

Now that we have some data, analyze it for flaky tests #22

Closed jeffcarp closed 6 years ago

jeffcarp commented 7 years ago

Now that we have around 20 days of data for Chrome 60 and Firefox 55, it'd be great if we could look at whether any tests have been flaky.

jeffcarp commented 7 years ago

Some initial data:

Chrome 60

27 analyzed_results
Top 20 tests sorted by number of instances num total results matched but num passing results differed: 
9 /html/dom/elements/elements-in-the-dom/historical.html
7 /workers/constructors/Worker/unexpected-self-properties.worker.html
7 /webrtc/getstats.html
6 /css/css-writing-modes-3/sizing-orthog-htb-in-vrl-020.xht
6 /geolocation-API/interfaces.html
5 /css/compositing-1/mix-blend-mode/mix-blend-mode-animation.html
5 /service-workers/service-worker/skip-waiting-using-registration.https.html
5 /cookies/path/match.html
4 /webusb/usb-disabled-by-feature-policy.https.sub.html
4 /shadow-dom/slotchange-event.html
4 /webusb/usb-default-feature-policy.https.sub.html
4 /shadow-dom/slots-fallback.html
4 /shadow-dom/slots.html
4 /html/semantics/embedded-content/the-embed-element/embed-in-object-fallback.html
4 /service-workers/service-worker/clients-matchall-order.https.html
4 /dom/events/Event-dispatch-click.html
3 /dom/nodes/Element-matches.html
3 /navigation-timing/nav2_test_attributes_exist.html
3 /html/semantics/selectors/pseudo-classes/checked.html
3 /html/infrastructure/urls/resolving-urls/query-encoding/utf-16be.html
3 /service-workers/service-worker/skip-waiting-without-using-registration.https.html
3 /html/semantics/document-metadata/the-meta-element/pragma-directives/attr-meta-http-equiv-refresh/moving-documents.html
3 /media-source/mediasource-detach.html
3 /html/browsers/the-window-object/apis-for-creating-and-navigating-browsing-contexts-by-name/open-features-non-integer-height.html
3 /html/semantics/embedded-content/the-img-element/img.complete.html
3 /html/infrastructure/urls/resolving-urls/query-encoding/utf-8.html
3 /webaudio/the-audio-api/the-waveshapernode-interface/curve-tests.html
2 /css/CSS2/tables/border-collapse-dynamic-rowgroup-003.xht
2 /css/CSS2/tables/border-collapse-dynamic-row-001.xht
2 /service-workers/service-worker/claim-fetch.https.html

Firefox 55

12 analyzed_results
Top 20 tests sorted by number of instances num total results matched but num passing results differed:
5 /html/semantics/embedded-content/the-img-element/img.complete.html
5 /html/infrastructure/urls/resolving-urls/query-encoding/utf-16be.html
5 /html/infrastructure/urls/resolving-urls/query-encoding/windows-1252.html
5 /html/infrastructure/urls/resolving-urls/query-encoding/windows-1251.html
5 /html/infrastructure/urls/resolving-urls/query-encoding/utf-8.html
5 /html/infrastructure/urls/resolving-urls/query-encoding/utf-16le.html
5 /media-source/mediasource-duration.html
5 /WebCryptoAPI/derive_bits_keys/pbkdf2.worker.html
4 /css/vendor-imports/mozilla/mozilla-central-reftests/images3/object-fit-scale-down-svg-002o.html
4 /html/dom/elements/elements-in-the-dom/historical.html
4 /cssom-view/HTMLBody-ScrollArea_quirksmode.html
4 /webvtt/rendering/cues-with-video/processing-model/selectors/cue/background_shorthand.html
4 /workers/constructors/Worker/unexpected-self-properties.worker.html
4 /css/css-animations-1/animation-delay-010.html
3 /css/vendor-imports/mozilla/mozilla-central-reftests/images3/object-fit-none-svg-006e.html
3 /service-workers/service-worker/fetch-canvas-tainting-cache.https.html
3 /service-workers/service-worker/fetch-canvas-tainting.https.html
3 /service-workers/service-worker/claim-fetch.https.html
3 /geolocation-API/interfaces.html
3 /wasm/wasm_local_iframe_test.html
2 /html/dom/elements/requirements-relating-to-bidirectional-algorithm-formatting-characters/dir-isolation-004b.html
2 /css/vendor-imports/mozilla/mozilla-central-reftests/images3/object-fit-contain-svg-004o.html
2 /css/css-writing-modes-3/bidi-unset-004.html
2 /html/rendering/non-replaced-elements/tables/table-layout.html
2 /mixed-content/optionally-blockable/http-csp/cross-origin-http/link-prefetch-tag/top-level/keep-scheme-redirect/opt-in-blocks.https.html
2 /css/vendor-imports/mozilla/mozilla-central-reftests/conditional3/css-supports-012.xht
2 /css/vendor-imports/mozilla/mozilla-central-reftests/values3/calc-offsets-relative-right-1.html
2 /referrer-policy/same-origin/meta-referrer/cross-origin/http-https/fetch-request/cross-origin.no-redirect.http.html
2 /css/vendor-imports/mozilla/mozilla-central-reftests/text-decor-3/text-emphasis-position-property-005e.html
2 /css/vendor-imports/mozilla/mozilla-central-reftests/shapes1/shape-outside-margin-box-border-radius-007.html

Script is in https://github.com/GoogleChrome/wptdashboard/commit/fa942110e9003bb11fca6e8530e2964edfb9219e
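
For readers who don't want to open the commit, here is a rough sketch of the heuristic described above: a test file counts as an "instance" when two runs report the same total number of results but a different number of passing results. This is not the linked script. The data shape (`RunSummary`), the function name `count_flaky_instances`, and the choice to compare consecutive runs are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical shape: one dict per run, mapping test path -> (passing, total).
RunSummary = Dict[str, Tuple[int, int]]


def count_flaky_instances(runs: List[RunSummary]) -> Counter:
    """Count, per test file, how often consecutive runs agree on the
    total number of results but disagree on the number passing."""
    counts: Counter = Counter()
    for prev, curr in zip(runs, runs[1:]):
        # Only compare tests that appear in both runs.
        for path in prev.keys() & curr.keys():
            prev_pass, prev_total = prev[path]
            curr_pass, curr_total = curr[path]
            if prev_total == curr_total and prev_pass != curr_pass:
                counts[path] += 1
    return counts


# Made-up example data, not taken from the dashboard.
runs = [
    {"/a.html": (3, 5), "/b.html": (2, 2)},
    {"/a.html": (4, 5), "/b.html": (2, 2)},
    {"/a.html": (3, 5), "/b.html": (2, 3)},
]
flaky = count_flaky_instances(runs)
for path, n in flaky.most_common(20):
    print(n, path)
```

Requiring the totals to match is meant to filter out cases where the test file itself changed between runs (subtests added or removed), which would shift the pass count without any flakiness. Because only file-level counts are available, a file where one subtest starts passing while another starts failing would go unnoticed.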

foolip commented 7 years ago

@jgraham in case you want to look into some of the Firefox flakiness.

@jeffcarp, would it be possible to get the flaky subtests from the existing data? Otherwise, even when reloading a test manually over and over, it'd be hard to know what to look for in a file with many tests.

jeffcarp commented 7 years ago

> would it be possible to get the flaky subtests from the existing data?

This isn't currently possible, but it is definitely the plan. I haven't yet written the part of the runner script that uploads results for individual tests, so right now we only have passing and total counts at test-file resolution.

Created an issue for that: #23

bobholt commented 7 years ago

I compared the two lists and pulled out the tests that are flaky in both Chrome 60 and Firefox 55, which suggests that the tests themselves may be at fault:

/geolocation-API/interfaces.html
/html/dom/elements/elements-in-the-dom/historical.html
/html/infrastructure/urls/resolving-urls/query-encoding/utf-16be.html
/html/infrastructure/urls/resolving-urls/query-encoding/utf-8.html
/html/semantics/embedded-content/the-img-element/img.complete.html
/service-workers/service-worker/claim-fetch.https.html
/workers/constructors/Worker/unexpected-self-properties.worker.html

None of these appear in the flaky test log @jugglinmike created in March, so they probably merit further investigation.
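
The comparison itself is just a set intersection. A minimal sketch, using only a handful of paths from the two lists above rather than the full output (the variable names are made up for illustration):

```python
# Abbreviated subsets of the Chrome 60 and Firefox 55 lists in this thread.
chrome_flaky = {
    "/html/dom/elements/elements-in-the-dom/historical.html",
    "/webrtc/getstats.html",
    "/geolocation-API/interfaces.html",
    "/cookies/path/match.html",
}
firefox_flaky = {
    "/html/dom/elements/elements-in-the-dom/historical.html",
    "/media-source/mediasource-duration.html",
    "/geolocation-API/interfaces.html",
    "/wasm/wasm_local_iframe_test.html",
}

# Tests flagged in both browsers are the likeliest to be flaky in themselves.
flaky_in_both = sorted(chrome_flaky & firefox_flaky)
for path in flaky_in_both:
    print(path)
```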

foolip commented 7 years ago

@JKereliuk, if you're looking for a general way to dive into wpt, https://github.com/GoogleChrome/wptdashboard/issues/22#issuecomment-326300847 might be worth a look. Most of these are likely not caused by ChromeDriver, but there has been flakiness where ChromeDriver seems to have been at fault.

foolip commented 6 years ago

@mdittmer will you move this and other project:metrics issues into https://github.com/web-platform-tests/results-analysis?

foolip commented 6 years ago

Closing in favor of https://github.com/web-platform-tests/wpt.fyi/issues/66.