web-platform-tests / results-collection

https://wpt.fyi/api/runs?max-count=100 has duplicate entries for some runs #528

Closed foolip closed 6 years ago

foolip commented 6 years ago

https://wpt.fyi/api/runs?max-count=100 currently has:

{"browser_name":"chrome","browser_version":"64.0","os_name":"linux","os_version":"4.4","revision":"790e6601ee","results_url":"https://storage.googleapis.com/wptd/790e6601ee/chrome-64.0-linux-summary.json.gz","created_at":"2018-03-20T01:39:55.160385Z"},
{"browser_name":"chrome","browser_version":"64.0","os_name":"linux","os_version":"4.4","revision":"790e6601ee","results_url":"https://storage.googleapis.com/wptd/790e6601ee/chrome-64.0-linux-summary.json.gz","created_at":"2018-03-19T19:24:34.309429Z"},
{"browser_name":"chrome","browser_version":"64.0","os_name":"linux","os_version":"4.4","revision":"790e6601ee","results_url":"https://storage.googleapis.com/wptd/790e6601ee/chrome-64.0-linux-summary.json.gz","created_at":"2018-03-19T13:10:33.251512Z"},
{"browser_name":"chrome","browser_version":"64.0","os_name":"linux","os_version":"4.4","revision":"790e6601ee","results_url":"https://storage.googleapis.com/wptd/790e6601ee/chrome-64.0-linux-summary.json.gz","created_at":"2018-03-19T06:56:01.557943Z"},

It's not Chrome-specific; there are multiple runs for the same revision for Firefox as well.

@lukebjerring hints at the cause: "the GCS will overwrite, but the datastore TestRun won't, they'll duplicate"
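
A toy model of that hint, not the actual wpt.fyi code: a `Map` stands in for a storage bucket addressed by object name, and an array stands in for datastore entities that each get a fresh ID. The names `bucket`, `testRuns`, and `reportRun` are hypothetical.

```javascript
'use strict';

// Hypothetical stand-ins: a Map models a bucket addressed by object name,
// and an array models datastore entities that each get a fresh ID.
const bucket = new Map();
const testRuns = [];

function reportRun(run) {
  // The same revision and browser produce the same object name, so a
  // second upload replaces the first.
  bucket.set(run.results_url, run.created_at);
  // Every report appends a new entity; nothing checks for an existing one.
  testRuns.push({ id: testRuns.length + 1, ...run });
}

const url = 'https://storage.googleapis.com/wptd/790e6601ee/chrome-64.0-linux-summary.json.gz';
reportRun({ revision: '790e6601ee', browser_name: 'chrome', results_url: url, created_at: '2018-03-19T06:56:01.557943Z' });
reportRun({ revision: '790e6601ee', browser_name: 'chrome', results_url: url, created_at: '2018-03-19T13:10:33.251512Z' });

console.log(bucket.size);     // 1
console.log(testRuns.length); // 2
```

Two reports for the same revision leave one stored object but two run records, which matches the repeated entries in the API response.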

jugglinmike commented 6 years ago

I think Luke's explanation is correct. Due to the way the results have been collected historically, we're limited to inspecting the entities in the datastore. I made a script to do that and included the results below. In short, the most recent duplicated entries were created on March 20, the same day I dismantled the old results collection infrastructure and switched to the new infrastructure powered by Buildbot.

That isn't a complete solution, though. Buildbot is currently configured to run the tests even in the absence of new changes, so any six-hour interval without commits will wind up producing results for the same revision of WPT. At least, I would expect it to. Reviewing the history on builds.wpt.fyi suggests we've hit a bug in Buildbot: there were four failing builds from the "Uploader" builder over the weekend. This seems to correspond to the period where WPT did not receive any new commits, but Buildbot is reporting all the builds as taking place at the same time.
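
One way to picture the missing guard, as a sketch only: this is plain JavaScript rather than Buildbot configuration, and `ticksToRun` and the `rev-*` names are hypothetical. Given the WPT revision seen at each six-hour tick, it keeps only the ticks whose revision has not been tested yet.

```javascript
'use strict';

// Hypothetical scheduler guard, not Buildbot configuration: given the WPT
// revision observed at each six-hour tick, keep only the ticks whose
// revision has not been tested yet.
function ticksToRun(revisionsAtTick) {
  const tested = new Set();
  return revisionsAtTick.filter(revision => {
    if (tested.has(revision)) {
      return false;
    }
    tested.add(revision);
    return true;
  });
}

// Three ticks with no new commits, then one new commit.
console.log(ticksToRun(['rev-a', 'rev-a', 'rev-a', 'rev-b']));
// [ 'rev-a', 'rev-b' ]
```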

In any event, the underlying problem is due to limitations in the way results are reported and stored. Since we're already discussing improvements to that protocol, I think it makes sense to accept this as a known bug for the time being and resolve it as part of that larger effort.

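'use strict';
// Assumption: `data` is an array of TestRun summaries shaped like the API
// entries quoted above; how it was loaded is not shown here.

// Group the summaries by revision and browser.
const grouped = data.reduce((accumulator, summary) => {
  const id = summary.revision + '-' + summary.browser_name;

  if (!(id in accumulator)) {
    accumulator[id] = [];
  }
  accumulator[id].push(summary);

  return accumulator;
}, {});

// Print every revision/browser pair that has more than one summary,
// followed by the creation time of each of its summaries.
Object.entries(grouped).forEach(([id, summaries]) => {
  if (summaries.length === 1) {
    return;
  }
  console.log(id);
  summaries.forEach(summary => {
    console.log('  ' + summary.created_at);
  });
});
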
234faac690-chrome
  2018-03-20T20:35:08.006927Z
  2018-03-20T14:11:20.466899Z
  2018-03-20T07:54:46.169983Z
790e6601ee-chrome
  2018-03-20T01:39:55.160385Z
  2018-03-19T19:24:34.309429Z
  2018-03-19T13:10:33.251512Z
  2018-03-19T06:56:01.557943Z
e87f380979-chrome
  2018-03-19T00:41:10.407798Z
  2018-03-18T18:27:39.014821Z
  2018-03-18T12:13:40.04175Z
ecd2c46c1a-chrome
  2018-03-18T05:59:56.888268Z
  2018-03-17T23:44:18.780173Z
  2018-03-17T17:29:45.764151Z
  2018-03-17T11:15:21.446716Z
cf26d057b8-chrome
  2018-03-03T16:04:47.159888Z
  2018-03-03T09:54:26.282377Z
b0ff0ea414-chrome
  2018-02-16T19:59:03.313794Z
  2018-02-16T13:51:43.594744Z
53c5bf648c-chrome
  2018-01-18T20:35:38.53Z
  2018-01-09T15:47:03.949Z
96417067d9-chrome
  2017-12-07T15:21:19.669Z
  2017-12-06T15:56:36.873Z
de6ce4a47f-chrome
  2017-10-01T14:22:19.483777Z
  2017-09-30T14:26:23.665782Z
79b10e653a-chrome
  2017-09-24T14:20:49.687839Z
  2017-09-23T14:11:46.117741Z
2bbabd7c68-chrome
  2017-09-17T14:06:18.105207Z
  2017-09-16T14:00:35.440136Z
13eaad17a4-edge
  2018-01-13T19:21:20.165711Z
  2018-01-12T09:21:38.793147Z
  2018-01-10T00:37:58.637638Z
  2018-01-05T13:42:53.957195Z
  2018-01-02T01:17:27.949933Z
  2017-12-25T12:20:34.77382Z
  2017-12-23T11:38:26.39933Z
  2017-12-22T17:07:37.169866Z
  2017-12-20T14:10:32.172Z
79b10e653a-edge
  2017-09-25T04:42:53.684724Z
  2017-09-24T03:59:17.627344Z
87427523dc-edge
  2017-09-04T01:25:13.604554Z
  2017-09-03T06:04:55.725888Z
e511e5e8af-edge
  2017-08-28T02:27:52.125528Z
  2017-08-27T03:15:47.837214Z
b12daf6ead-edge
  2017-08-06T10:10:21.507705Z
  2017-08-03T05:39:08.047372Z
  2017-08-02T07:13:13.201937Z
790e6601ee-firefox
  2018-03-20T05:24:52.474477Z
  2018-03-19T18:44:52.261379Z
e87f380979-firefox
  2018-03-19T08:20:23.634781Z
  2018-03-18T21:55:03.967534Z
  2018-03-18T11:32:28.875007Z
ecd2c46c1a-firefox
  2018-03-18T01:11:52.867361Z
  2018-03-17T14:49:13.022428Z
05d6e35a43-firefox
  2018-03-07T04:05:29.989419Z
  2018-03-06T17:23:35.546448Z
bad5fb3923-firefox
  2018-03-06T06:47:06.878593Z
  2018-03-05T20:13:02.401257Z
325b754702-firefox
  2018-03-05T09:42:00.215025Z
  2018-03-04T23:10:45.163896Z
  2018-03-04T12:39:52.553113Z
32ede3f7a9-firefox
  2018-03-03T05:13:01.878182Z
  2018-03-02T18:43:30.765593Z
a200a0fb41-firefox
  2018-02-28T22:47:57.531412Z
  2018-02-28T11:18:01.529608Z
7bfbc0fa30-firefox
  2018-02-24T05:02:08.652387Z
  2018-02-23T18:41:35.583039Z
08dd5d8de9-firefox
  2018-02-19T18:59:55.931956Z
  2018-02-18T10:43:24.616343Z
1dfa574650-firefox
  2017-11-06T15:39:25.914721Z
  2017-11-05T15:38:43.267639Z
  2017-11-04T15:38:46.743601Z
  2017-11-03T15:40:05.5766Z
  2017-11-02T15:45:24.574636Z
  2017-11-01T15:35:21.899551Z
  2017-10-31T17:09:41.290914Z
  2017-10-30T16:23:38.814921Z
  2017-10-29T16:22:08.511063Z
  2017-10-28T16:32:28.061334Z
  2017-10-27T17:26:38.930173Z
  2017-10-26T17:25:12.28557Z
  2017-10-25T17:31:15.311711Z
  2017-10-24T18:10:23.529307Z
  2017-10-23T17:50:06.045987Z
  2017-10-22T17:44:01.72594Z
0e906f0595-firefox
  2017-10-16T00:51:28.988292Z
  2017-10-15T15:42:32.671414Z
790e6601ee-safari
  2018-03-20T15:13:51.417179Z
  2018-03-19T20:59:17.960845Z
b73f249d95-safari
  2018-03-17T13:43:28.696834Z
  2018-03-16T23:43:31.274087Z
311196cda0-safari
  2018-03-10T05:18:32.297657Z
  2018-03-09T17:04:29.98378Z
05d6e35a43-safari
  2018-03-07T09:12:43.689498Z
  2018-03-06T22:51:50.494678Z
325b754702-safari
  2018-03-05T12:19:39.945694Z
  2018-03-04T19:55:49.520734Z
e3c513e9ef-safari
  2017-10-12T23:14:39.211156Z
  2017-10-09T02:00:52.186385Z
de6ce4a47f-safari
  2017-10-12T23:02:57.790694Z
  2017-09-30T11:17:10.248243Z
79b10e653a-safari
  2017-09-24T11:27:52.697092Z
  2017-09-23T11:29:21.905582Z
87427523dc-safari
  2017-09-03T11:08:59.860093Z
  2017-09-02T11:02:31.613754Z
e511e5e8af-safari
  2017-08-28T11:03:24.059946Z
  2017-08-27T12:16:20.66681Z
  2017-08-26T10:55:14.452634Z
foolip commented 6 years ago

For the existing dupes, will the earliest result or the latest one be the best estimate of when complete results were done?

jugglinmike commented 6 years ago

I can't think of a reason to prefer any result set over another. Optimistically, you could say, "the very first result set to be reported reflects when the results were available," but this assumes that all of the result sets are complete. We know this was not always the case.

This extends beyond the duplicates, though, because the question of completeness is relevant even when there was only a single result set for a given browser/revision combination.
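
If a consumer of the API did need to collapse the duplicates, either policy is easy to apply after the fact. A minimal sketch, assuming run objects shaped like the API entries quoted earlier; `toMillis` and `dedupeRuns` are hypothetical names, and neither policy says anything about whether the kept result set is complete.

```javascript
'use strict';

// Convert a created_at string to milliseconds. The fraction is padded or
// trimmed to three digits because the listing mixes precisions
// (".53Z", ".669Z", ".160385Z").
function toMillis(timestamp) {
  const normalized = timestamp.replace(
    /\.(\d+)Z$/,
    (match, fraction) => '.' + fraction.padEnd(3, '0').slice(0, 3) + 'Z'
  );
  return Date.parse(normalized);
}

// Hypothetical helper: keep one run per revision/browser pair.
// `prefer` is 'earliest' or 'latest'.
function dedupeRuns(runs, prefer = 'latest') {
  const kept = new Map();
  for (const run of runs) {
    const id = run.revision + '-' + run.browser_name;
    const current = kept.get(id);
    if (current === undefined) {
      kept.set(id, run);
      continue;
    }
    const isLater = toMillis(run.created_at) > toMillis(current.created_at);
    const isEarlier = toMillis(run.created_at) < toMillis(current.created_at);
    if ((prefer === 'latest' && isLater) || (prefer === 'earliest' && isEarlier)) {
      kept.set(id, run);
    }
  }
  return [...kept.values()];
}

// Timestamps copied from the 790e6601ee entries in the listing.
const runs = [
  { revision: '790e6601ee', browser_name: 'chrome', created_at: '2018-03-20T01:39:55.160385Z' },
  { revision: '790e6601ee', browser_name: 'chrome', created_at: '2018-03-19T06:56:01.557943Z' },
  { revision: '790e6601ee', browser_name: 'firefox', created_at: '2018-03-20T05:24:52.474477Z' },
  { revision: '790e6601ee', browser_name: 'firefox', created_at: '2018-03-19T18:44:52.261379Z' },
];

console.log(dedupeRuns(runs, 'earliest').map(run => run.created_at));
console.log(dedupeRuns(runs, 'latest').map(run => run.created_at));
```

Timestamps are compared at millisecond precision, so two runs created within the same millisecond would be treated as equal and the first one seen would be kept.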

foolip commented 6 years ago

Are some of these cases where you've deliberately rerun to get more complete results, or are they all just "accidental" reruns due to lack of any effort to not rerun?

jugglinmike commented 6 years ago

The latter. For context:

There was no formal scheduling implemented in the previous results collection solution. Four machines were running simultaneously (one dedicated to each of the browsers we were testing). Within each machine, once the results collection process ended, a new one would be initiated immediately. Each time, the run.py script would check out the final commit of the previous day. Under that system, we would collect multiple result sets for the same browser-revision pair any time a collection process started and ended before a new commit was available in the latest 24-hour window.
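
That loop can be modeled roughly as follows. This is a sketch under stated assumptions, not the behavior of run.py itself: `revisionForRun` is a hypothetical function, "previous day" is interpreted in UTC, and the commit times and `rev-*` names are made up for illustration.

```javascript
'use strict';

// Hypothetical model of the old loop: each collection checks out the last
// commit made before the start of the current day (UTC is an assumption).
function revisionForRun(startTime, commits) {
  const dayStart = new Date(startTime);
  dayStart.setUTCHours(0, 0, 0, 0);

  let chosen = null;
  for (const commit of commits) {
    const committedAt = new Date(commit.committed_at);
    const isNewer = chosen === null || committedAt > new Date(chosen.committed_at);
    if (committedAt < dayStart && isNewer) {
      chosen = commit;
    }
  }
  return chosen === null ? null : chosen.revision;
}

// Made-up commit history.
const commits = [
  { revision: 'rev-a', committed_at: '2018-03-17T22:00:00Z' },
  { revision: 'rev-b', committed_at: '2018-03-18T21:00:00Z' },
];

// Two collections that both start on March 19.
const runStarts = ['2018-03-19T01:00:00Z', '2018-03-19T07:00:00Z'];
const revisions = runStarts.map(start => revisionForRun(start, commits));
console.log(revisions); // both entries are 'rev-b'
```

Because both collections start on the same day, they resolve to the same revision and would each report a result set for it.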

Also note that about half of the duplicated entities pre-date my involvement in the project, so I can't comment on how those were created.

foolip commented 6 years ago

Moved to https://github.com/web-platform-tests/wpt.fyi/issues/54.