foolip closed this issue 6 years ago
I think Luke's explanation is correct. Due to the way the results have been collected historically, we're limited to inspecting the entities in the datastore. I made a script to do that and included the results below, but basically, it looks like the most recent duplicated entries were created on March 20. That is the same day I dismantled the old results collection infrastructure and switched to the new infrastructure powered by Buildbot.
That isn't a complete solution, though. Buildbot is currently configured to run the tests even in the absence of new changes. Any six-hour interval without commits will wind up producing results for the same revision of WPT. At least, I would expect it to. Reviewing the history on builds.wpt.fyi suggests we've hit a bug in Buildbot: there were four failing builds from the "Uploader" builder over the weekend. This seems to correspond to the period where WPT did not receive any new commits, but Buildbot is reporting all the builds as taking place at the same time.
In any event, the underlying problem is due to limitations in the way results are reported and stored. Since we're already discussing improvements to that protocol, I think it makes sense to accept this as a known bug for the time being and resolve it as part of that larger effort.
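To illustrate the expectation above, here is a minimal simulation of a fixed six-hour schedule. The commit history, revision names, and helper functions are all hypothetical; this says nothing about how Buildbot itself is configured and only sketches why an interval without commits yields a repeated revision.

```javascript
'use strict';

const SIX_HOURS = 6 * 60 * 60 * 1000;

// Hypothetical commit history (not real WPT data).
const commits = [
  { revision: 'aaa', time: Date.parse('2018-03-23T01:00:00Z') },
  { revision: 'bbb', time: Date.parse('2018-03-23T08:00:00Z') },
  { revision: 'ccc', time: Date.parse('2018-03-25T20:00:00Z') }
];

// Return the newest revision committed at or before `buildTime`.
function revisionAt(history, buildTime) {
  let latest = null;
  for (const commit of history) {
    if (commit.time <= buildTime &&
        (latest === null || commit.time > latest.time)) {
      latest = commit;
    }
  }
  return latest === null ? null : latest.revision;
}

// Fire a build every `interval` milliseconds, whether or not anything changed.
function simulateBuilds(history, start, end, interval) {
  const builds = [];
  for (let time = start; time <= end; time += interval) {
    builds.push({
      time: new Date(time).toISOString(),
      revision: revisionAt(history, time)
    });
  }
  return builds;
}

const builds = simulateBuilds(
  commits,
  Date.parse('2018-03-23T06:00:00Z'),
  Date.parse('2018-03-24T06:00:00Z'),
  SIX_HOURS
);
builds.forEach(build => console.log(build.time, build.revision));
```

In this made-up history, every build after the first lands in a window with no new commit, so each of them reports the same revision.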
'use strict';

// `data` is assumed to be the array of TestRun summaries read from the datastore.
// Group the summaries by revision and browser.
const sorted = data.reduce((accumulator, summary) => {
  const id = summary.revision + '-' + summary.browser_name;
  if (!(id in accumulator)) {
    accumulator[id] = [];
  }
  accumulator[id].push(summary);
  return accumulator;
}, {});

// Print every revision/browser pair that has more than one result set.
Object.entries(sorted).forEach(([id, summaries]) => {
  if (summaries.length === 1) {
    return;
  }
  console.log(id);
  summaries.forEach(summary => {
    console.log(' ' + summary.created_at);
  });
});
234faac690-chrome
2018-03-20T20:35:08.006927Z
2018-03-20T14:11:20.466899Z
2018-03-20T07:54:46.169983Z
790e6601ee-chrome
2018-03-20T01:39:55.160385Z
2018-03-19T19:24:34.309429Z
2018-03-19T13:10:33.251512Z
2018-03-19T06:56:01.557943Z
e87f380979-chrome
2018-03-19T00:41:10.407798Z
2018-03-18T18:27:39.014821Z
2018-03-18T12:13:40.04175Z
ecd2c46c1a-chrome
2018-03-18T05:59:56.888268Z
2018-03-17T23:44:18.780173Z
2018-03-17T17:29:45.764151Z
2018-03-17T11:15:21.446716Z
cf26d057b8-chrome
2018-03-03T16:04:47.159888Z
2018-03-03T09:54:26.282377Z
b0ff0ea414-chrome
2018-02-16T19:59:03.313794Z
2018-02-16T13:51:43.594744Z
53c5bf648c-chrome
2018-01-18T20:35:38.53Z
2018-01-09T15:47:03.949Z
96417067d9-chrome
2017-12-07T15:21:19.669Z
2017-12-06T15:56:36.873Z
de6ce4a47f-chrome
2017-10-01T14:22:19.483777Z
2017-09-30T14:26:23.665782Z
79b10e653a-chrome
2017-09-24T14:20:49.687839Z
2017-09-23T14:11:46.117741Z
2bbabd7c68-chrome
2017-09-17T14:06:18.105207Z
2017-09-16T14:00:35.440136Z
13eaad17a4-edge
2018-01-13T19:21:20.165711Z
2018-01-12T09:21:38.793147Z
2018-01-10T00:37:58.637638Z
2018-01-05T13:42:53.957195Z
2018-01-02T01:17:27.949933Z
2017-12-25T12:20:34.77382Z
2017-12-23T11:38:26.39933Z
2017-12-22T17:07:37.169866Z
2017-12-20T14:10:32.172Z
79b10e653a-edge
2017-09-25T04:42:53.684724Z
2017-09-24T03:59:17.627344Z
87427523dc-edge
2017-09-04T01:25:13.604554Z
2017-09-03T06:04:55.725888Z
e511e5e8af-edge
2017-08-28T02:27:52.125528Z
2017-08-27T03:15:47.837214Z
b12daf6ead-edge
2017-08-06T10:10:21.507705Z
2017-08-03T05:39:08.047372Z
2017-08-02T07:13:13.201937Z
790e6601ee-firefox
2018-03-20T05:24:52.474477Z
2018-03-19T18:44:52.261379Z
e87f380979-firefox
2018-03-19T08:20:23.634781Z
2018-03-18T21:55:03.967534Z
2018-03-18T11:32:28.875007Z
ecd2c46c1a-firefox
2018-03-18T01:11:52.867361Z
2018-03-17T14:49:13.022428Z
05d6e35a43-firefox
2018-03-07T04:05:29.989419Z
2018-03-06T17:23:35.546448Z
bad5fb3923-firefox
2018-03-06T06:47:06.878593Z
2018-03-05T20:13:02.401257Z
325b754702-firefox
2018-03-05T09:42:00.215025Z
2018-03-04T23:10:45.163896Z
2018-03-04T12:39:52.553113Z
32ede3f7a9-firefox
2018-03-03T05:13:01.878182Z
2018-03-02T18:43:30.765593Z
a200a0fb41-firefox
2018-02-28T22:47:57.531412Z
2018-02-28T11:18:01.529608Z
7bfbc0fa30-firefox
2018-02-24T05:02:08.652387Z
2018-02-23T18:41:35.583039Z
08dd5d8de9-firefox
2018-02-19T18:59:55.931956Z
2018-02-18T10:43:24.616343Z
1dfa574650-firefox
2017-11-06T15:39:25.914721Z
2017-11-05T15:38:43.267639Z
2017-11-04T15:38:46.743601Z
2017-11-03T15:40:05.5766Z
2017-11-02T15:45:24.574636Z
2017-11-01T15:35:21.899551Z
2017-10-31T17:09:41.290914Z
2017-10-30T16:23:38.814921Z
2017-10-29T16:22:08.511063Z
2017-10-28T16:32:28.061334Z
2017-10-27T17:26:38.930173Z
2017-10-26T17:25:12.28557Z
2017-10-25T17:31:15.311711Z
2017-10-24T18:10:23.529307Z
2017-10-23T17:50:06.045987Z
2017-10-22T17:44:01.72594Z
0e906f0595-firefox
2017-10-16T00:51:28.988292Z
2017-10-15T15:42:32.671414Z
790e6601ee-safari
2018-03-20T15:13:51.417179Z
2018-03-19T20:59:17.960845Z
b73f249d95-safari
2018-03-17T13:43:28.696834Z
2018-03-16T23:43:31.274087Z
311196cda0-safari
2018-03-10T05:18:32.297657Z
2018-03-09T17:04:29.98378Z
05d6e35a43-safari
2018-03-07T09:12:43.689498Z
2018-03-06T22:51:50.494678Z
325b754702-safari
2018-03-05T12:19:39.945694Z
2018-03-04T19:55:49.520734Z
e3c513e9ef-safari
2017-10-12T23:14:39.211156Z
2017-10-09T02:00:52.186385Z
de6ce4a47f-safari
2017-10-12T23:02:57.790694Z
2017-09-30T11:17:10.248243Z
79b10e653a-safari
2017-09-24T11:27:52.697092Z
2017-09-23T11:29:21.905582Z
87427523dc-safari
2017-09-03T11:08:59.860093Z
2017-09-02T11:02:31.613754Z
e511e5e8af-safari
2017-08-28T11:03:24.059946Z
2017-08-27T12:16:20.66681Z
2017-08-26T10:55:14.452634Z
For the existing dupes, will the earliest result or the latest one be the best estimate of when complete results were done?
I can't think of a reason to prefer any result set over another. Optimistically, you could say, "the very first result set to be reported reflects when the results were available," but this assumes that all of the result sets are complete. We know this was not always the case.
This extends beyond the duplicates, though, because the question of completeness is relevant even when there was only a single result set for a given browser/revision combination.
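If one result set per browser/revision pair ever does need to be picked, both candidates can at least be extracted for comparison. This is a minimal sketch, assuming each group is an array of summaries with a `created_at` field as in the script above; `earliestAndLatest` is a hypothetical helper, and it makes no claim about which result set is more complete.

```javascript
'use strict';

// Return the earliest and latest summary in a group, ordered by `created_at`.
function earliestAndLatest(summaries) {
  const ordered = summaries.slice().sort(
    (a, b) => Date.parse(a.created_at) - Date.parse(b.created_at)
  );
  return { earliest: ordered[0], latest: ordered[ordered.length - 1] };
}

// The two cf26d057b8-chrome entries from the listing above.
const group = [
  { created_at: '2018-03-03T16:04:47.159888Z' },
  { created_at: '2018-03-03T09:54:26.282377Z' }
];

const picked = earliestAndLatest(group);
console.log('earliest:', picked.earliest.created_at);
console.log('latest:  ', picked.latest.created_at);
```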
Are some of these cases where you've deliberately rerun to get more complete results, or are they all just "accidental" reruns due to lack of any effort to not rerun?
The latter. For context:
There was no formal scheduling implemented in the previous results collection solution. Four machines were running simultaneously (one dedicated to each of the browsers we were testing). Within each machine, once the results collection process ended, a new one would be initiated immediately. Each time, the run.py script would check out the final commit of the previous day. Under that system, we would collect multiple result sets for the same browser-revision pair any time a collection process started and ended before a new commit was available in the latest 24-hour window.
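The selection rule described above can be sketched as follows. This is not the actual run.py logic: the function name, the commit data, and the choice of UTC day boundaries are all assumptions made for illustration.

```javascript
'use strict';

const DAY = 24 * 60 * 60 * 1000;

// Hypothetical commit history (not real WPT data).
const history = [
  { revision: 'r1', time: Date.parse('2018-03-16T22:00:00Z') },
  { revision: 'r2', time: Date.parse('2018-03-17T09:30:00Z') },
  { revision: 'r3', time: Date.parse('2018-03-18T15:00:00Z') }
];

// Return the last revision committed before the start of the day containing
// `now`. Assumption: days are delimited at midnight UTC.
function finalCommitOfPreviousDay(commits, now) {
  const midnight = Math.floor(now / DAY) * DAY;
  let target = null;
  for (const commit of commits) {
    if (commit.time < midnight &&
        (target === null || commit.time > target.time)) {
      target = commit;
    }
  }
  return target === null ? null : target.revision;
}

// Two back-to-back collection runs that start on the same day pick the same
// revision, which is how duplicate result sets would arise.
const firstRun = finalCommitOfPreviousDay(history, Date.parse('2018-03-18T05:00:00Z'));
const secondRun = finalCommitOfPreviousDay(history, Date.parse('2018-03-18T12:00:00Z'));
const nextDayRun = finalCommitOfPreviousDay(history, Date.parse('2018-03-19T01:00:00Z'));
console.log(firstRun, secondRun, nextDayRun);
```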
Also note that about half of the duplicated entities pre-date my involvement in the project, so I can't comment on how those were created.
https://wpt.fyi/api/runs?max-count=100 currently has:
It's not Chrome-specific; there are multiple runs for the same revision for Firefox as well.
@lukebjerring hints at the cause: "the GCS will overwrite, but the datastore TestRun won't, they'll duplicate"
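The difference between the two stores can be modelled in a few lines. This is only a toy model of the behaviour quoted above, not the real GCS or datastore API: the `Map` stands in for a bucket keyed by object path, the array stands in for TestRun entities, and the key format is hypothetical.

```javascript
'use strict';

// Stand-in for GCS: writing to an existing key replaces the previous object.
const bucket = new Map();
// Stand-in for the datastore: every upload creates a new TestRun entity.
const testRuns = [];

function upload(summary) {
  // Hypothetical object path derived from revision and browser.
  const key = summary.revision + '/' + summary.browser_name;
  bucket.set(key, summary);
  testRuns.push(summary);
}

// Two uploads for the same revision and browser, using timestamps from the
// 790e6601ee-firefox entries listed above.
upload({
  revision: '790e6601ee',
  browser_name: 'firefox',
  created_at: '2018-03-19T18:44:52.261379Z'
});
upload({
  revision: '790e6601ee',
  browser_name: 'firefox',
  created_at: '2018-03-20T05:24:52.474477Z'
});

console.log('objects in bucket:', bucket.size);
console.log('TestRun entities: ', testRuns.length);
```

Under this model the bucket ends up with a single object while the list of runs holds two entries, which matches the duplicated runs in the API response.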