web-platform-tests / interop

web-platform-tests Interop project
https://wpt.fyi/interop

Find a way to get more frequent aligned runs #678

Open nt1m opened 1 month ago

nt1m commented 1 month ago

The experimental dashboard has had no aligned run from Jul 18th to Jul 22nd; it is still stuck on Jul 18th.

This is worse than last year because of the introduction of Edge to the dashboard. Now the dashboard requires 4 aligned runs as opposed to 3. Often, we get a run with Edge being aligned with FF & Chrome, or Safari being aligned, but rarely both.
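To make the 3-versus-4 point concrete, here is a minimal sketch of what "aligned" means, assuming an aligned run is one where every required browser has results for the same WPT commit. The `sha` key and the `REQUIRED` set are illustrative names, not the dashboard's actual code.

```python
from typing import Iterable, List, Mapping

# Assumption: the four browsers the dashboard currently requires.
REQUIRED = frozenset({"chrome", "edge", "firefox", "safari"})


def aligned_shas(runs: Iterable[Mapping[str, str]],
                 required: frozenset = REQUIRED) -> List[str]:
    """Return the SHAs (in first-seen order) that have a run for every required browser.

    Each run is a mapping with hypothetical keys "sha" and "browser_name".
    """
    seen = {}  # sha -> set of browser names that have a run at that sha
    for run in runs:
        seen.setdefault(run["sha"], set()).add(run["browser_name"])
    # A commit is aligned only if the required set is a subset of what ran.
    return [sha for sha, browsers in seen.items() if required <= browsers]
```

With three required browsers, a commit missing only Safari still counts as aligned; with four, it does not, which is why adding a browser makes aligned runs rarer.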

It would be nice to solve this issue. If we could somehow schedule Edge runs in sync with Safari, that would be nice!

Related suggestion from James: https://github.com/web-platform-tests/wpt.fyi/issues/3689

@web-platform-tests/admins

jgraham commented 1 month ago

Edge and Safari are supposed to be aligned, I think. The main problem currently is that Azure keeps losing runners and we don't have any way to automatically recover. If that affects both Edge and Safari then the chances of getting an aligned run are diminished. There's also a problem with Firefox where one test sometimes causes a harness error we can't recover from; needless to say, it's difficult to reproduce locally.

foolip commented 1 month ago

I haven't looked at the trends recently, but the way I would approach this problem is to first treat the epochs/three_hourly branch as the target, since that's the highest frequency at which we could align without increasing the rate of Edge and Safari runs.

Then, look at the percentage of the three-hourly commits (./wpt rev-list --epoch 3h --max-count 100) that have results for each browser. Then, focus on whichever has the least reliable results, asking someone from that browser vendor to drive the effort if possible.

foolip commented 1 month ago

I realized it's possible to answer the question on the command line, so here's the output of this command:

./wpt rev-list --epoch 3h --max-count 100 | while read sha; do curl --silent "https://wpt.fyi/api/runs?label=master&label=experimental&sha=$sha" | jq '.[] | .browser_name'; done | sort | uniq -c

91 "chrome"
39 "edge"
91 "firefox"
11 "safari"

The last of the 100 commits is from June 18, so a bit over a month.
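For anyone who wants percentages rather than raw counts, here is a rough sketch of the same tally in Python. It assumes the per-commit browser names have already been fetched (for example from the `browser_name` field used above); the function name and input shape are made up for illustration. Unlike `uniq -c`, it counts each browser at most once per commit, so repeated runs on one commit don't inflate the number.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def coverage_percent(browsers_by_sha: Mapping[str, Iterable[str]]) -> Dict[str, float]:
    """Percentage of commits that have at least one run, per browser.

    browsers_by_sha maps a commit SHA to the browser names that have
    runs for it (hypothetical input shape). Browsers with no runs at
    all do not appear in the result.
    """
    total = len(browsers_by_sha)
    if total == 0:
        return {}
    counts = Counter()
    for names in browsers_by_sha.values():
        counts.update(set(names))  # count each browser once per commit
    return {name: 100.0 * n / total for name, n in counts.items()}
```

If each count in the output above corresponds to a distinct commit, 11 Safari results out of 100 commits would come out as 11%.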

@gsnedders is https://github.com/web-platform-tests/wpt/pull/47181 the plan to improve the reliability of Safari runs, or just an experiment?

@dandclark is there someone who can help us look into the reliability of Edge runs on Azure Pipelines?

stubbornella commented 1 month ago

Do we need to run all of the WPT suite of tests more frequently or could we run just the tests included in Interop 2024 at a higher rate? A bandaid solution at best, but maybe one that can help us while we work towards something longer term.

nt1m commented 4 weeks ago

@foolip Can we show the latest non-aligned runs for each browser, and also link to those? I think it would give a more accurate view of things.
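A minimal sketch of what "latest run for each browser" could look like, independent of alignment. It assumes each run record carries a `browser_name` and a `time_start` timestamp as ISO 8601 strings in a single consistent format and timezone, so that plain string comparison orders them correctly; both the field usage and the function name are assumptions for illustration, not the dashboard's implementation.

```python
from typing import Dict, Iterable, Mapping


def latest_run_per_browser(runs: Iterable[Mapping[str, str]]) -> Dict[str, Mapping[str, str]]:
    """Pick the most recent run for each browser, ignoring alignment.

    Assumes "time_start" values are ISO 8601 strings in one consistent
    format, so lexicographic order matches chronological order.
    """
    latest = {}
    for run in runs:
        name = run["browser_name"]
        if name not in latest or run["time_start"] > latest[name]["time_start"]:
            latest[name] = run
    return latest
```

Each selected record could then be rendered as a link to that browser's results, alongside the aligned score.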

foolip commented 4 weeks ago

@stubbornella @nt1m I think those are both ideas worth exploring. I believe that the scoring rewrite that @jgraham did no longer requires aligned runs, and if that's correct it's probably best to prioritize reviewing and deploying that.

However, I'll be on parental leave until February, so I'll have to defer to the rest of the group on, well, everything 😄

jgraham commented 4 weeks ago

https://github.com/jgraham/interop-results/tree/main/2024/results/revisions has interop results for all the revisions for which we have wpt results.

The main remaining issue is getting the UI work done so that you can switch scores on the dashboard. @DanielRyanSmith has done all the UI so far, but I imagine that if someone else can contribute patches, that will speed the process along a bit.