Open foolip opened 5 years ago
The only thing that I can see fixing this problem well is to pin all of our dependencies and have automatically created PRs that upgrade them, similar to how we upgrade Python packages currently, and to the tooling I created for updating Safari in https://github.com/foolip/safari-technology-preview-updater.
@jgraham we've discussed a related problem of raciness due to different shards installing different versions, and you suggested we fix that at the CI level. The problem in this issue, however, I don't think can be solved in that way.
Depending on how we fix it, a side effect could be fixing https://github.com/web-platform-tests/wpt/issues/13274.
@jugglinmike I know you've been keeping a bucket of browser and driver binaries for the Buildbot setup. Is that bucket accessible somehow? If we make it accessible only to people with write access to our repo (and CI systems) then maybe we can build on that for all platforms/browsers?
I think this is mostly a short-term problem whilst ChromeDriver is undergoing heavy development. In general the number of infrastructure tests that should not pass is 0, and once we reach that goal I would hope we don't regress it again.
Every time we add some new test automation capability to the web platform, tests will initially be failing and then turn passing over time. Every individual FooDriver release that changes the results will cause this to happen.
If we do just allow the tests to begin failing, how do we effectively notice that they are failing, and who is on the hook to fix them? Fixes need to happen very quickly if it's not to inconvenience others working on unrelated things.
Yes, it's true that new automation capabilities may not initially be available everywhere. I just think that there's a real cost to having to upgrade browsers and drivers manually, both in terms of the engineering required to pin them and autocreate PRs for updates, and in terms of actually handling those PRs. In theory, for Firefox Nightly we would get two such PRs a day.
I think this kind of thing happens rarely enough that I'm happy to be on the hook to fix it. What we could do is provide useful output (e.g. in the checks view) when we detect that infrastructure tests are broken, like "this PR fails infrastructure tests. If you haven't changed any test infrastructure, please file an issue and mark it priority:critical to get the failures investigated".
That sounds reasonable, we'd get some signal about how big of a problem it is too. For Azure Pipelines, it is possible to get stuff into the checks view by printing it to stderr, so we could print something on failure only. Is there a way to do something similar for Taskcluster?
The Taskcluster checks view integration is a work in progress, so I don't know exactly what the API will be. https://bugzilla.mozilla.org/show_bug.cgi?id=1459645 is the bug if you want to follow along.
@foolip It is publicly accessible, but it's not structured for external use. The binaries are identified by the e-tag value with which they are served from Google/Mozilla/Apple. For example:
https://storage.googleapis.com/browsers/firefox-stable-linux/4a78aeedecebda151799c82c49ec46e4
Each binary has a corresponding metadata file, but that's really only stored for debugging purposes:
https://storage.googleapis.com/browsers/firefox-stable-linux/4a78aeedecebda151799c82c49ec46e4.json
There's no way to query for any particular release except by e-tag value.
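To make the layout concrete, here is a small sketch of how the two URLs above relate. It only assumes what the examples show: a per-target directory, the e-tag as the object name, and a `.json` suffix for the metadata. The function name `binary_urls` is made up for illustration.

```python
# Base URL as it appears in the example links above.
BUCKET_URL = "https://storage.googleapis.com/browsers"


def binary_urls(target, etag):
    """Return (binary_url, metadata_url) for a stored binary.

    `target` is a directory name such as "firefox-stable-linux" and
    `etag` is the e-tag the binary was originally served with.
    """
    binary = f"{BUCKET_URL}/{target}/{etag}"
    return binary, binary + ".json"


urls = binary_urls("firefox-stable-linux", "4a78aeedecebda151799c82c49ec46e4")
```

Since lookup is by e-tag only, anything that wants "the current stable Firefox" would still need a separate record of which e-tag that is.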
Cool! Some minor iteration on that scheme should make it possible for some auto-upgrader to both put new binaries in there, and update metadata in wpt to point to them in a PR.
However, let's wait to see how often it breaks, if @jgraham no longer wants to be on the hook to fix it :)
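For what it's worth, the core of such an auto-upgrader could be quite small. The sketch below assumes a hypothetical pin format (a mapping from target name to e-tag) that does not exist in wpt today; uploading binaries and opening the PR are left out.

```python
def plan_upgrades(pinned, latest):
    """Hypothetical helper: find targets whose pinned e-tag is stale.

    `pinned` maps target name -> e-tag currently recorded in the repo.
    `latest` maps target name -> e-tag of the newest stored binary.
    Returns {target: (old_etag, new_etag)}; old_etag is None for a
    target that has no pin yet.
    """
    upgrades = {}
    for target, new_etag in sorted(latest.items()):
        old_etag = pinned.get(target)
        if old_etag != new_etag:
            upgrades[target] = (old_etag, new_etag)
    return upgrades


# The second e-tag is a made-up placeholder, not a real binary.
pinned = {"firefox-stable-linux": "4a78aeedecebda151799c82c49ec46e4"}
latest = {"firefox-stable-linux": "00000000000000000000000000000000"}
upgrades = plan_upgrades(pinned, latest)
```

Each entry in the result would correspond to one metadata change in an automatically created PR, so the infrastructure/ expectations could be updated in the same PR as the version bump.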
https://github.com/web-platform-tests/wpt/pull/14570 is a case where a change has been made to ChromeDriver and will soon enough cause infrastructure/ tests to break, but there's not really a way to prepare for it other than skipping the tests and then remembering to enable them later.
It seems to have happened in https://github.com/web-platform-tests/wpt/pull/14033 now.
Oh, that's because https://github.com/web-platform-tests/wpt/pull/14570 was merged as part of https://github.com/web-platform-tests/wpt/issues/14499, as there was a brief period of ~15 minutes where old Travis wasn't required and new Travis also wasn't.
Hmm, this isn't very actionable, I'll close it until it's happened more times and I have a real suggestion.
Seems like I didn't close this when I claimed I would.
In any case this keeps happening quite often. Latest is https://github.com/web-platform-tests/wpt/pull/18830 where Chrome on macOS is broken. Unfortunately we don't have a way of pinning the version of Chrome/ChromeDriver that we could easily deploy.
Breakage keeps happening for Chrome on Taskcluster. https://github.com/web-platform-tests/wpt/issues/31714 is the most recent; earlier this year it was https://github.com/web-platform-tests/wpt/issues/28209.
I believe that https://github.com/web-platform-tests/wpt/issues/28970 would be a first step towards solving this, since pinning Chromium is possible, but pinning Chrome would require us to store the binaries somewhere.
This was reported again by @jonathan-j-lee in https://github.com/web-platform-tests/wpt/issues/37092:
infrastructure/ runs against nightly builds of each browser (i.e., are not pinned to a version). As browsers regress or receive bug fixes, the expectations stored under infrastructure/metadata/ can become stale. This can cause unexpected results on the next CI run and block unrelated changes.
Examples:
- Expect <video> autoplay to pass in Safari #37038
- infrastructure/server/webtransport-h3.https.sub.any.js test broken for Chrome Dev on macOS #36504
- Expect OK for wheelScroll.html on Safari/WebKit #35975
cc @WeizhongX
@jgraham replied:
I don't have a proposal for something better here. The obvious alternative would be building out infrastructure to allow pinning to specific versions of browsers and then creating PRs to update the browser version. But my guess is whilst that's going to look more professional ("other infra PRs less likely to be blocked on browser issues"), it's actually going to be more work overall to build the infrastructure and approve the updates. On the other hand if you have the bandwidth to do the work, and a way to make the updates almost seamless, then I don't object. I just wouldn't prioritise it myself.
I would also support pinning of all browsers/drivers.
cc @past
In https://github.com/web-platform-tests/wpt/pull/14454 I wanted to change the logging for infrastructure/ tests, when I noticed that the infrastructure/ tests were failing for Chrome Dev. This is presumably because of some recent change in ChromeDriver, and requires updating infrastructure/metadata/ to make the tests pass.
Because we simply install the latest browser and driver, this is prone to happen at any time, and would affect any PR that happens to touch directories that trigger these tests.
@jgraham @jugglinmike FYI