Do we have the resources to do a bisect over WPT commits in the neighbourhood of the ¼ mark in the first graph in a day or so? @jugglinmike do you think the results of such a bisect would be useful?
On Fri., Sep. 7, 2018, 9:41 p.m. jugglinmike, notifications@github.com wrote:
Over the past four days, the time to collect results has steadily increased. This project is no longer capable of collecting results at the normal cadence (4 times per day for Firefox and Chrome, once per day for Edge and Safari).
In the following chart, each data point describes a partial execution of WPT (called a "chunk"). The horizontal axis describes relative time, and the vertical axis describes "chunk" execution duration in minutes. Each cluster represents a collection attempt (for Firefox and Chrome, we make four attempts per day):
[image: screenshot from 2018-09-07 18-28-37] https://user-images.githubusercontent.com/677252/45248967-27843c80-b2e6-11e8-842a-c45555a62cad.png
Firefox typically requires 45 minutes to execute a "chunk." This project responds to missing tests by re-trying collection attempts up to two additional times.
I have downloaded the logs from a handful of the outliers, and all of them exhibited the crash reported in gh-592 https://github.com/web-platform-tests/results-collection/issues/592. Part of the explanation for their extended execution time is that they are re-trying. I'm still not convinced of a correlation, though, because this delay has only become apparent over the past four days (and because some of those delays are far greater than the 200% increase we'd expect from two retries).
That chart shows how the delay has caused collection efforts to overlap--we are currently scheduling collections faster than we can execute them. This is also visible in the build history for Edge and Safari. We collect from these browsers once per 24 hours, and we do so using many more "chunks":
[image: screenshot from 2018-09-07 20-25-17] https://user-images.githubusercontent.com/677252/45248968-2b17c380-b2e6-11e8-8552-51211500fe68.png
As noted in gh-592 https://github.com/web-platform-tests/results-collection/issues/592#issuecomment-419603340, a patch has been filed for crash recovery. I am unable to merge that patch because it was submitted to a downstream project. Besides, it's now Friday evening in local time, so I'd be reluctant to perform the merge even if I could.
I'm hopeful that the patch will solve this problem, but given that we don't fully understand the underlying issue (e.g. what has changed in the past four days, why the crash is occurring in the first place), I'm not taking it for granted.
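To make the chunk-and-retry scheme described above concrete, here is a minimal Python sketch of the retry policy: each chunk is run once and re-tried up to two additional times when its results are incomplete. `run_chunk` and `has_missing_tests` are hypothetical placeholders rather than the project's actual API, and their stub bodies only exist so the example runs on its own.

```python
import random

MAX_ATTEMPTS = 3  # the initial run plus up to two retries, as described above


def run_chunk(chunk_index, total_chunks):
    # Hypothetical stand-in for launching the browser against one WPT chunk;
    # it fabricates a result set so this sketch is runnable on its own.
    complete = random.random() > 0.2
    return {"chunk": chunk_index, "total": total_chunks, "complete": complete}


def has_missing_tests(results):
    # Hypothetical stand-in for the project's completeness check.
    return not results["complete"]


def collect_chunk(chunk_index, total_chunks):
    """Run one chunk, re-trying up to two additional times on missing tests."""
    results = None
    for _attempt in range(MAX_ATTEMPTS):
        results = run_chunk(chunk_index, total_chunks)
        if not has_missing_tests(results):
            break
    return results


if __name__ == "__main__":
    print(collect_chunk(1, 100))
```

A consequence of this policy is that a chunk whose results are persistently incomplete costs up to three times its normal duration, which is the 200% increase mentioned below.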
First, the patch won't help us reduce the execution time of the tests with Firefox, but it can help us figure out what exactly is crashing. Maybe this is one of the shutdown hangs we experience here, and the background thread monitor kills Firefox after the 60s maximum shutdown duration.
If you want to play around and you can reproduce this on Travis with a PR, maybe try the toolkit.asyncshutdown.crash_timeout preference and shorten the time until Firefox gets killed. If that makes the tests faster, then we have an indication.
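For anyone who wants to try that suggestion: the preference is expressed in milliseconds, and its default of 60000 corresponds to the 60s shutdown window mentioned above. Here is a minimal sketch, assuming the runner's Firefox profile directory is known and writable; the path and the lower value of 30000 are placeholders for illustration, not values the project actually uses.

```python
import os

# Placeholder path; substitute the profile directory the runner actually uses.
PROFILE_DIR = "/tmp/wpt-firefox-profile"

# toolkit.asyncshutdown.crash_timeout is expressed in milliseconds; 60000 (the
# default) matches the 60s shutdown hang described above, and 30000 is just an
# illustrative lower value.
PREF_LINE = 'user_pref("toolkit.asyncshutdown.crash_timeout", 30000);\n'

os.makedirs(PROFILE_DIR, exist_ok=True)
with open(os.path.join(PROFILE_DIR, "user.js"), "a") as user_js:
    user_js.write(PREF_LINE)
```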
@jugglinmike, do you know which tests exactly take that long?
Here's a graph with some more recent data:
As of this weekend, outliers are exceeding the expected build time by an order of magnitude. This is a familiar pattern--see gh-545, where the cause was a CPU credit deficit on the workers. Checking the AWS CPU utilization graphs confirms the same pattern here. Here is a graph detailing per-machine credit availability for the past week or so:
I suspect that gh-592 created a CPU credit deficit. Just like in gh-545, we were unaware of this initially because we generally maintain many credits on reserve. It wasn't until we spent this reserve that we began to witness the more general performance degradation described by this issue, one machine at a time. I was initially thrown off by Chrome's relatively stable performance. A problem like worker CPU throttling should affect Chrome and Firefox evenly. My guess is that Chrome's sub-30-minute build time makes it the least susceptible to reduced resources--it may even be that it can complete testing before spending the few credits we manage to accrue between builds.
If that's right, then resolving gh-592 will address the deficit, allow the system to recover CPU credits, and ultimately lead to the build times we were enjoying at the end of August.
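For context on where the per-machine credit graph comes from: the workers appear to be burstable EC2 instances, which accrue CPU credits while idle and are throttled to a baseline rate once the balance is spent. A rough sketch of pulling the relevant CloudWatch metric with boto3, assuming AWS credentials and a default region are already configured; the instance ID is a placeholder.

```python
from datetime import datetime, timedelta

import boto3  # assumes AWS credentials and a default region are configured

# Placeholder instance ID; substitute one of the collection workers.
INSTANCE_ID = "i-0123456789abcdef0"

cloudwatch = boto3.client("cloudwatch")

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
    StartTime=datetime.utcnow() - timedelta(days=7),
    EndTime=datetime.utcnow(),
    Period=3600,  # one data point per hour
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```

A sustained balance near zero during build windows would be consistent with the throttling described above.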
(@whimboo Since this problem appears to be a symptom of gh-592, I'll respond to your comment about the crashes in that thread.)
Do we have the resources to do a bisect over WPT commits in the neighbourhood of the ¼ mark in the first graph in a day or so? @jugglinmike do you think the results of such a bisect would be useful?
That could be useful. Unfortunately, we don't have the resources to run this kind of experiment (otherwise, I'd also suggest testing a few different releases of Nightly). I think for now, we have to put our hopes on the proposed workaround for the crash.
I upgraded all workers to Ubuntu 18.04 on 2018-09-20, and as expected, we haven't experienced a Mir-related build error since then. Thanks for the tip, @jgraham!
Almost forgot: here's the commit which implements the upgrade: https://github.com/web-platform-tests/results-collection/commit/c5dcfc317f51630ad182eef5f4f1d0947f80c023
That's great to hear! So I assume all the extensive delays are gone now.