web-platform-tests / wpt.fyi

web-platform-tests dashboard
https://wpt.fyi/

wpt.fyi results intermittently conflict between current and history view #3957

Open cookiecrook opened 2 months ago

cookiecrook commented 2 months ago

Starting with WPT.fyi since it's conveying conflicting results... I'm not sure if that's a data source problem from the wpt repo, or a bug here.

WPT.fyi lists a Safari failure on subtest "[xlink:title][href] > rect" but the Safari history shows it passing always, which aligns with our local testing... The conflicting results listed on the same wpt.fyi page can't both be true.

https://wpt.fyi/results/svg-aam/name/comp_host_language_label.html?label=master&label=experimental&aligned&view=interop&q=label%3Ainterop-2024-accessibility

Therefore I am assuming this is an error in how WPT.fyi tracks the results, rather than in the actual result.

@gsnedders hypothesized "The history view might just be looking at the first/last run of each day?"

cookiecrook commented 2 months ago

FWIW, this is no longer showing an error immediately after filing. 🤷

cookiecrook commented 2 months ago

Sometimes seeing it here, too. https://wpt.fyi/results/accname/name/comp_labelledby.html?label=master&label=experimental&aligned&view=interop&q=label%3Ainterop-2024-accessibility

cookiecrook commented 2 months ago

That one's also no longer reproducing.

cookiecrook commented 2 months ago

[Screenshot: Promise rejection]

Proof that the current result sometimes shows as failing, even though the history view never shows failures. This bug is not tracking the specific failure... It's tracking the fact that WPT.fyi sometimes shows conflicting results on the same page.

past commented 1 month ago

@DanielRyanSmith is this expected? I think the history view captures a single daily run, so it may have missed some intermittent failures during the day, if I understood correctly what you told me recently.

DanielRyanSmith commented 1 month ago

> @gsnedders hypothesized "The history view might just be looking at the first/last run of each day?"

> @DanielRyanSmith is this expected? I think the history view captures a single daily run, so it may have missed some intermittent failures during the day, if I understood correctly what you told me recently.

These are essentially correct, as the history timeline is populated by finding the first set of aligned runs that occur each day. It's possible that it will not find a flake if the flake is happening rather rarely.
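
To illustrate why sampling one run per day can disagree with the current view, here is a minimal sketch. It is not the wpt.fyi implementation: the run records, timestamps, and the `first_run_per_day` helper are all hypothetical, and it simplifies "first set of aligned runs" to "first run of the day" for a single browser.

```python
from datetime import datetime

# Hypothetical run records: (ISO timestamp, subtest status). Not real wpt.fyi data.
runs = [
    ("2024-05-01T02:00", "PASS"),
    ("2024-05-01T14:00", "PASS"),
    ("2024-05-02T02:00", "PASS"),
    ("2024-05-02T14:00", "FAIL"),  # intermittent failure in the most recent run
]


def first_run_per_day(records):
    """Keep only the status of the earliest run on each calendar day."""
    daily = {}
    # ISO timestamps in the same format sort chronologically as strings.
    for stamp, status in sorted(records):
        day = datetime.fromisoformat(stamp).date()
        daily.setdefault(day, status)  # later runs on the same day are ignored
    return daily


history = first_run_per_day(runs)  # what a once-per-day timeline would show
latest_status = max(runs)[1]       # what a "current results" view would show

print(sorted(set(history.values())))
print(latest_status)
```

In this sketch the daily timeline contains only passing results while the latest run is a failure, which is the kind of mismatch described in this issue.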

Additionally, I'm seeing in the history timeline now that there are some runs that occasionally crash for this test, although this might be unrelated.

https://wpt.fyi/results/svg-aam/name/comp_host_language_label.html?label=master&label=experimental&max-count=10&product=safari&q=seq%28%28status%3APASS%7Cstatus%3AOK%29%20%28status%3A%21PASS%26status%3A%21OK%26status%3A%21unknown%29%29%20seq%28%28status%3A%21PASS%26status%3A%21OK%26status%3A%21unknown%29%20%28status%3APASS%7Cstatus%3AOK%29%29
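
The search query in the link above is hard to read while percent-encoded. As a small aid, the following sketch decodes it with Python's standard library. My reading of the decoded query (that the two `seq(...)` clauses look for a passing result followed by a non-passing one, or the reverse, across consecutive runs) is an assumption, not something confirmed in this thread.

```python
from urllib.parse import parse_qs, urlsplit

# The URL from the comment above, split across lines only for readability.
url = (
    "https://wpt.fyi/results/svg-aam/name/comp_host_language_label.html"
    "?label=master&label=experimental&max-count=10&product=safari"
    "&q=seq%28%28status%3APASS%7Cstatus%3AOK%29%20"
    "%28status%3A%21PASS%26status%3A%21OK%26status%3A%21unknown%29%29"
    "%20seq%28%28status%3A%21PASS%26status%3A%21OK%26status%3A%21unknown%29"
    "%20%28status%3APASS%7Cstatus%3AOK%29%29"
)

# parse_qs splits on "&" first and then percent-decodes each value,
# so the encoded "%26" inside q stays part of the q value.
params = parse_qs(urlsplit(url).query)
query = params["q"][0]

print(params["product"][0], params["max-count"][0], params["label"])
print(query)
```

The decoded `q` value is `seq((status:PASS|status:OK) (status:!PASS&status:!OK&status:!unknown)) seq((status:!PASS&status:!OK&status:!unknown) (status:PASS|status:OK))`, restricted to Safari with `max-count=10`.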