infrastructure/ tests will break as browsers/drivers update, blocking unrelated PRs

foolip commented 5 years ago

In https://github.com/web-platform-tests/wpt/pull/14454 I wanted to change the logging for infrastructure/ tests, when I noticed that the infrastructure/ tests were failing for Chrome Dev. This is presumably because of some recent change in ChromeDriver, and requires updating infrastructure/metadata/ to make the tests pass.

Because we simply install the latest browser and driver, this is prone to happen at any time, and would affect any PR that happens to touch directories that trigger these tests.

@jgraham @jugglinmike FYI

foolip commented 5 years ago

The only thing that I can see fixing this problem well is to pin all of our dependencies and have automatically created PRs that upgrade them, similar to how we currently upgrade Python packages, and to the tooling I created for updating Safari in https://github.com/foolip/safari-technology-preview-updater.

@jgraham we've discussed a related problem of raciness due to different shards installing different versions, and you suggested we fix that at the CI level. However, I don't think the problem in this issue can be solved that way.

Depending on how we fix it, a side effect could be fixing https://github.com/web-platform-tests/wpt/issues/13274.

foolip commented 5 years ago

@jugglinmike I know you've been keeping a bucket of browser and driver binaries for the Buildbot setup. Is that bucket accessible somehow? If we make it accessible only to people with write access to our repo (and CI systems) then maybe we can build on that for all platforms/browsers?

jgraham commented 5 years ago

I think this is mostly a short-term problem whilst ChromeDriver is undergoing heavy development. In general, the number of infrastructure tests that are expected not to pass should be 0, and once we reach that goal I would hope we don't regress it again.

foolip commented 5 years ago

Every time we add some new test automation capability to the web platform, tests will initially be failing and then turn passing over time. Every individual FooDriver release that changes the results will cause this to happen.

foolip commented 5 years ago

If we do just allow the tests to begin failing, how do we effectively notice that they are failing, and who is on the hook to fix them? Fixes need to happen very quickly if they are not to inconvenience others working on unrelated things.

jgraham commented 5 years ago

Yes, it's true that new automation capabilities may not initially be available everywhere. I just think that there's a real cost to having to upgrade browsers and drivers manually, both in terms of the engineering required to pin them and autocreate PRs for updates, and in terms of actually handling those PRs. In theory, for Firefox Nightly we would get two such PRs a day.

I think this kind of thing happens rarely enough that I'm happy to be on the hook to fix it. What we could do is provide useful output (e.g. in the checks view) when we detect infrastructure tests are broken like "this PR fails infrastructure tests. If you haven't changed any test infrastructure, please file an issue and mark it priority:critical to get the failures investigated".

foolip commented 5 years ago

That sounds reasonable, we'd get some signal about how big of a problem it is too. For Azure Pipelines, it is possible to get stuff into the checks view by printing it to stderr, so we could print something on failure only. Is there a way to do something similar for Taskcluster?

jgraham commented 5 years ago

The Taskcluster checks view integration is a work in progress, so I don't know exactly what the API will be. https://bugzilla.mozilla.org/show_bug.cgi?id=1459645 is the bug if you want to follow along.

jugglinmike commented 5 years ago

> @jugglinmike I know you've been keeping a bucket of browser and driver binaries for the Buildbot setup. Is that bucket accessible somehow? If we make it accessible only to people with write access to our repo (and CI systems) then maybe we can build on that for all platforms/browsers?

@foolip It is publicly accessible, but it's not structured for external use. The binaries are identified by the e-tag value with which they are served from Google/Mozilla/Apple. For example:

https://storage.googleapis.com/browsers/firefox-stable-linux/4a78aeedecebda151799c82c49ec46e4

Each binary has a corresponding metadata file, but that's really only stored for debugging purposes:

https://storage.googleapis.com/browsers/firefox-stable-linux/4a78aeedecebda151799c82c49ec46e4.json

There's no way to query for any particular release except by e-tag value.

foolip commented 5 years ago

Cool! Some minor iteration on that scheme should make it possible for some auto-upgrader to both put new binaries in there, and update metadata in wpt to point to them in a PR.

However, let's wait and see how often it breaks, and whether @jgraham still wants to be on the hook to fix it :)

foolip commented 5 years ago

https://github.com/web-platform-tests/wpt/pull/14570 is a case where a change has been made to ChromeDriver and will soon enough cause infrastructure/ tests to break, but there's not really a way to prepare for it other than skipping the tests and then remembering to enable them later.

foolip commented 5 years ago

It seems to have happened in https://github.com/web-platform-tests/wpt/pull/14033 now.

foolip commented 5 years ago

Oh, that's because https://github.com/web-platform-tests/wpt/pull/14570 was merged as part of https://github.com/web-platform-tests/wpt/issues/14499, as there was a brief period of ~15 minutes during which neither the old nor the new Travis setup was required.

foolip commented 5 years ago

Hmm, this isn't very actionable, I'll close it until it's happened more times and I have a real suggestion.

foolip commented 5 years ago

Seems like I didn't close this when I claimed I would.

In any case, this keeps happening quite often. The latest is https://github.com/web-platform-tests/wpt/pull/18830, where Chrome on macOS is broken. Unfortunately, we don't have an easily deployable way of pinning the version of Chrome/ChromeDriver.

foolip commented 2 years ago

Breakage keeps happening for Chrome on Taskcluster. https://github.com/web-platform-tests/wpt/issues/31714 is the most recent, earlier this year it was https://github.com/web-platform-tests/wpt/issues/28209.

I believe that https://github.com/web-platform-tests/wpt/issues/28970 would be a first step towards solving this, since pinning Chromium is possible, but pinning Chrome would require us to store the binaries somewhere.

foolip commented 1 year ago

This was reported again by @jonathan-j-lee in https://github.com/web-platform-tests/wpt/issues/37092:

infrastructure/ tests run against nightly builds of each browser (i.e., they are not pinned to a version). As browsers regress or receive bug fixes, the expectations stored under infrastructure/metadata/ can become stale. This can cause unexpected results on the next CI run and block unrelated changes.

Examples:

- Expect <video> autoplay to pass in Safari #37038
- infrastructure/server/webtransport-h3.https.sub.any.js test broken for Chrome Dev on macOS #36504
- Expect OK for wheelScroll.html on Safari/WebKit #35975

cc @WeizhongX

@jgraham replied:

I don't have a proposal for something better here. The obvious alternative would be building out infrastructure to allow pinning to specific versions of browsers and then creating PRs to update the browser version. But my guess is whilst that's going to look more professional ("other infra PRs less likely to be blocked on browser issues"), it's actually going to be more work overall to build the infrastructure and approve the updates. On the other hand if you have the bandwidth to do the work, and a way to make the updates almost seamless, then I don't object. I just wouldn't prioritise it myself.

I would also support pinning of all browsers/drivers.

cc @past

web-platform-tests / wpt

infrastructure/ tests will break as browsers/drivers update, blocking unrelated PRs #14456