web-platform-tests RFCs

RFC 102: consider ending Edge runs on Windows and moving to Linux #102

Closed thejohnjansen closed 2 years ago

thejohnjansen commented 2 years ago

@jgraham, tagging you explicitly because I believe this would increase the cost to Mozilla.

jgraham commented 2 years ago

This generally seems reasonable, but two points:

foolip commented 2 years ago

As it is, we try to run Edge Dev every 3 hours: https://github.com/web-platform-tests/wpt/blob/bb493eeb3871f33a3a245f9d240dbbe216c1c352/.azure-pipelines.yml#L406

I don't believe there's value in running it that often for the purposes of identifying interop issues. However, the pace of Edge runs also limits how quickly new results end up on the default wpt.fyi view.

In other words, this somewhat accidentally ends up closely related to https://github.com/web-platform-tests/wpt.fyi/issues/1519. My suggested solution for that, spread out across comments, is to add a new wpt.fyi front page with a widget that is one click away from the results table, where one can first select or unselect which browsers to show. I would then propose that the default selection is Chrome+Firefox+Safari, but that Edge, WebKitGTK and Deno are also listed and can be toggled with one click before clicking through to the results page.

It would be great if we don't have to resolve https://github.com/web-platform-tests/wpt.fyi/issues/1519 as part of this RFC, but I wanted to flag the issue, and don't want us to regress the turnaround time of getting new (aligned) results on wpt.fyi.

thejohnjansen commented 2 years ago

Thank you for calling out 1519. I agree they are very related, and I also agree we don't need to resolve 1519 with this RFC, but it is important to consider it. @jgraham I'm checking on what it would take for us to move to Linux on Azure instead of TaskCluster.

foolip commented 2 years ago

@thejohnjansen @mustjab are either of you on the Matrix chat, https://app.element.io/#/room/#wpt:matrix.org? Right now there's an issue with the Edge agent pool, with only one agent online, and it would be good to have a real-time-ish discussion about how to deal with it.

Reliable updates to the wpt.fyi front page are blocked until we either fix the Windows runs or get the Linux runs going, which is especially noticeable right now that Safari Technology Preview 135 is working.

thejohnjansen commented 2 years ago

I wonder what is involved in getting the Linux runs going. @jgraham, is this something you do on your end? I was looking at it from our end, and it's not clear to me what we actually need to do. It seems like we can just use what you have for Mac on Linux. I also agree that running them daily makes sense.

foolip commented 2 years ago

@thejohnjansen the runs of Chrome, Firefox, Servo and WebKitGTK are done on Taskcluster, and are defined by https://github.com/web-platform-tests/wpt/blob/HEAD/tools/ci/tc/tasks/test.yml.

Running Edge on Azure Pipelines instead would involve adding something to https://github.com/web-platform-tests/wpt/blob/HEAD/.azure-pipelines.yml, perhaps most like results_safari_preview, but probably not much could be reused since we've never run any browser on Linux on Azure Pipelines.

Regardless of the approach, getting it to work is a long process of trying things on the real CI systems until you get a full run, and then adding (copying) some bits that get the results uploaded to wpt.fyi.

foolip commented 2 years ago

@thejohnjansen do you still want to pursue this?

thejohnjansen commented 2 years ago

Sorry, this completely fell off my radar. Yes. I think it makes sense to not only pursue this, but in parallel to reduce the frequency of Edge runs. I lost track of what the next steps are to make this happen.

foolip commented 2 years ago

@thejohnjansen I think you'd have to set up Edge runs on Taskcluster, similar to how we run Chrome. Apart from a lot of copypastable code you might have to do something special for installing the right version of Edge.

The best way to experiment with this is by just pushing to a branch in this repo, with the Taskcluster config temporarily edited to trigger from that branch.

mustjab commented 2 years ago

@foolip Is there any wiki or doc on how to schedule Taskcluster runs with changes? I'm going to attempt to move Edge runs to Linux.

foolip commented 2 years ago

@mustjab I'm afraid we have only minimal documentation for how our CI setups work: some pretty good comments in the source, and then https://web-platform-tests.org/running-tests/from-ci.html.

I would suggest adding the triggers/edge_* branches that are already in .azure-pipelines.yml to the Taskcluster setup in .taskcluster.yml, and then pushing to one of those branches to trigger a test run with your changes. But maybe name it triggers/edge_test or similar while trying things out, so you don't also trigger the runs on Azure Pipelines every time.
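As a rough illustration of why a separate experiment branch helps, here is a minimal Python sketch of the trigger logic. It is not how either CI system is implemented; the branch names in both sets and the function name `ci_systems_triggered` are assumptions made up for the example, and the real lists live in .azure-pipelines.yml and .taskcluster.yml.

```python
# Hypothetical branch lists for illustration only; the real trigger
# branches are defined in .azure-pipelines.yml and .taskcluster.yml.
AZURE_TRIGGER_BRANCHES = {
    "triggers/edge_stable",
    "triggers/edge_dev",
    "triggers/edge_canary",
}

# Taskcluster would get the same branches plus one experiment-only branch.
TASKCLUSTER_TRIGGER_BRANCHES = AZURE_TRIGGER_BRANCHES | {"triggers/edge_test"}


def ci_systems_triggered(branch: str) -> list[str]:
    """Return which CI systems a push to `branch` would start a run on."""
    systems = []
    if branch in AZURE_TRIGGER_BRANCHES:
        systems.append("azure-pipelines")
    if branch in TASKCLUSTER_TRIGGER_BRANCHES:
        systems.append("taskcluster")
    return systems


if __name__ == "__main__":
    for name in ("triggers/edge_dev", "triggers/edge_test"):
        print(name, ci_systems_triggered(name))
```

Under these assumptions, pushing to triggers/edge_test starts only the Taskcluster run, while pushing to a shared branch such as triggers/edge_dev starts runs on both systems.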

To make sure it has been considered before going ahead with Taskcluster, would it be an option to run Edge on the default Windows VMs that Azure Pipelines provides? I'm assuming that most of the maintenance burden comes from the custom agent pool with Windows 10 VMs. Back when this setup was created it wasn't possible to run Edge on the available default VMs, but maybe this has changed now?

mustjab commented 2 years ago

Trying Edge on Azure Pipelines sounds like a great idea and appears to be working fine there so far: https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=80546&view=results

I'll validate Canary and Dev channels to make sure they all look good and will submit a PR for that.

foolip commented 2 years ago

Awesome! If something like https://github.com/web-platform-tests/wpt/compare/user/mustjab/edge-runs-on-azure-pipelines works, that's great!

mustjab commented 2 years ago

Yes, looks like it's all working, did runs for Stable, Dev and Canary channels. Will send the change out for review, thanks for your help!

foolip commented 2 years ago

The run on the wpt.fyi front page is now from the new setup.

In https://github.com/web-platform-tests/wpt/pull/33755#issuecomment-1107402328 I asked about parallelism, a problem I didn't think about before merging that.

I also wonder if we really need to run both Canary and Dev every 3 hours? It seems like we should pick one of them and let the other run only daily or weekly.
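To put rough numbers on that, a back-of-the-envelope sketch in Python follows. It assumes every scheduled run actually starts and counts only scheduled runs; the function name `runs_per_week` is made up for the example.

```python
HOURS_PER_WEEK = 24 * 7


def runs_per_week(interval_hours: int) -> int:
    """Number of scheduled runs per week for a fixed interval in hours."""
    return HOURS_PER_WEEK // interval_hours


# Today: Canary and Dev both scheduled every 3 hours.
both_every_3h = 2 * runs_per_week(3)

# Alternative: keep one channel at every 3 hours, move the other to daily.
one_daily = runs_per_week(3) + runs_per_week(24)

# Alternative: keep one channel at every 3 hours, move the other to weekly.
one_weekly = runs_per_week(3) + runs_per_week(HOURS_PER_WEEK)

print(both_every_3h, one_daily, one_weekly)  # prints "112 63 57"
```

Under these assumptions, moving one channel to a daily or weekly schedule roughly halves the number of scheduled Edge runs.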

gsnedders commented 2 years ago

So, as I understand it, the current status is:

This perhaps leaves this RFC somewhat abandoned; do we still care about moving Edge runs from Windows to Linux?

mustjab commented 2 years ago

Probably not, as long as we're able to run Edge on Windows in the Microsoft-hosted pool.

foolip commented 2 years ago

Indeed, my intention was for Edge on Windows to obviate the need for this RFC.

In hindsight, the change was still RFC-worthy, because there was a risk (capacity) that I did not consider and that is now an ongoing problem.