web-platform-tests / results-collection


Results distribution #527

Open jugglinmike opened 6 years ago

jugglinmike commented 6 years ago

During the March meetings in Waterloo, we worked out a new project organization for the collection and distribution of WPT test results data. Today, @hexcles presented some thoughts on an alternate model. I've also been thinking about the problem, so we're planning to speak synchronously on this coming Monday. I'm creating this issue to keep track of our initial thoughts and to communicate any decisions we make.

jugglinmike commented 6 years ago

Because @foolip made the mistake of complimenting my ASCII "art" in a recent presentation, I'm going to try to explain this with dashes and pipes.

For context, the current system has the test runner uploading to a storage service and notifying the dashboard UI of this via a small HTTP POST request:

.--------------.        .----------------.        .---------.
| Test runner  | -(1)-> | Storage:       |        | wpt.fyi |
|              |        | formatted data | <-(3)- |         |
'--------------'        '----------------' -(4)-> '---------'
       '-(2)-------------------------------------------^
1. Upload formatted results & summary
2. Notify (via authenticated POST)
3 & 4. Fetch
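Step (2) is just a small authenticated POST telling the dashboard where the results landed. As a rough sketch, with the endpoint and field names being illustrative assumptions rather than the real API:

```python
import json

# Hypothetical sketch of step (2): after uploading the formatted results
# (step 1), the runner tells the dashboard where to find them. The endpoint
# and field names below are assumptions for illustration, not the real API.

NOTIFY_ENDPOINT = "https://wpt.fyi/api/results/notify"  # assumed URL

def build_notification(browser, revision, results_url):
    """Build the small JSON body for the dashboard notification POST."""
    return json.dumps({
        "browser": browser,          # e.g. "chrome"
        "revision": revision,        # WPT commit SHA the run was against
        "results_url": results_url,  # where step (1) put the summary
    })

# Sending would then be an authenticated POST of this body to
# NOTIFY_ENDPOINT, e.g. via urllib.request with a bearer token.
```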

In the new system, the results collector would submit data directly to the dashboard system, and that project would handle storage internally:

.-------------.                              .---------+----------------.
| Test runner |                              | wpt.fyi | Storage:       |
|             |                              |         | formatted data |
'-------------'                              '---------+ and metrics    |
       '-(1)--------------------------------------^    '----------------'
1. Upload raw unformatted results

The benefit here is that there are fewer integration points between the projects, and that's important because we plan to split these into separate repositories.

I think we could improve this still further by using the data store as an intermediary between the two projects. The test runner uploads the complete dataset without processing. The dashboard downloads that data and processes it to meet its own needs (maintaining its own storage if necessary).

.-------------.        .-------------.        .---------+----------------.
| Test runner | -(1)-> | Storage:    | -(2)-> | wpt.fyi | Storage:       |
|             |        | raw data    | <-(3)- |         | formatted data |
'-------------'        '-------------' -(4)-> '---------+ and metrics    |
1. Upload raw unformatted results                       '................'
2. Notify (via Atom)
3 & 4. Fetch

This enables public consumption because anyone can subscribe to the syndication feed without our support.
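To make the subscription concrete, here is a minimal sketch of what a consumer polling the feed for new runs might do. The feed shape and URLs are made up for illustration; the real feed's schema would be up to us to define.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def new_result_urls(feed_xml, seen_ids):
    """Return (url, entry_id) pairs for feed entries not yet processed."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall(ATOM_NS + "entry"):
        entry_id = entry.findtext(ATOM_NS + "id")
        link = entry.find(ATOM_NS + "link")
        if entry_id not in seen_ids and link is not None:
            fresh.append((link.get("href"), entry_id))
    return fresh

# A made-up example feed; a real consumer would fetch this over HTTP
# on a schedule and remember which entry ids it has already handled.
sample = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>WPT raw results</title>
  <entry>
    <id>run-123</id>
    <link href="https://storage.example.com/raw/run-123.json"/>
  </entry>
</feed>"""
```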

The agreed-upon system reflects the careful planning of a bunch of people, and I don't mean to disrupt things with a late-stage alteration. If the idea holds water, we could make it a goal for some future iteration. I'm raising the issue now because it may actually be easier to implement that system today. Jumping directly from the first diagram to the third will require less change than passing through the second. We could make the change even less disruptive by preserving the "notify" POST request. The system would work just as it does today, and the transition path would be (1) implement an Atom feed, followed by (2) detect changes via the feed and remove the notification request.

Hexcles commented 6 years ago

Thanks for filing the issue, Mike.

I think we're in agreement on a lot here:

In my slides, the "simplest" design is similar to your second diagram. It's the "simplest" from the receiver's perspective, but you're right, it's not necessarily the simplest for existing runners (they must have stored the results somewhere already).

I'm also more inclined towards a resource-oriented approach, similar to your third diagram, which also avoids the request size limit problem we may have if we run on App Engine (either standard or flex).

I don't have any opinions yet regarding push (POST) vs. poll (Atom feed subscription), and I'm glad to discuss that further.

P.S. The use of TaskQueue would be an internal implementation detail, if we decide to go that route. And we probably won't need it, since I suspect one minute should be enough.

Hexcles commented 6 years ago

BTW, @jugglinmike which tool do you use to make the ASCII flow charts? :)

foolip commented 6 years ago

Because @foolip made the mistake of complimenting my ASCII "art" in a recent presentation

I don't regret it, still loving your ASCII art :)

The third diagram is similar to the final slide of @Hexcles's presentation, where results are uploaded "somewhere".

The main complication that introduces is how long-lived that "somewhere" storage needs to be. If it is not fetched before the initial request is finished, then how does the receiver notify the runner that it's done? Or, if the resource can't be fetched or we reject it because it's not valid JSON/UTF-8/wptreport/etc., then how is that communicated?

That last problem exists whenever not all of the work is done synchronously during the POST, though, and in practice it probably isn't a big deal. If we find that no results are showing up because all runs are being rejected, we'll notice before long without any fancy protocol around it.
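A minimal sketch of the kind of up-front checks mentioned above (UTF-8, JSON, wptreport shape). The required `results` key is an assumption about the wptreport format; the real receiver would check whatever schema we settle on:

```python
import json

def validate_report(raw_bytes):
    """Reject uploads that are not UTF-8, not JSON, or not wptreport-shaped.

    Returns (ok, reason). The required "results" key is an assumption
    based on the wptreport format; adjust to the agreed schema.
    """
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return False, "not valid UTF-8"
    try:
        report = json.loads(text)
    except ValueError:
        return False, "not valid JSON"
    if not isinstance(report, dict) or "results" not in report:
        return False, "missing 'results' key"
    return True, "ok"
```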

Hexcles commented 6 years ago

The idea is to set a long enough expiration (e.g. a week or month) for the results on the runner side, so that premature deletion won't be a problem in practice. I believe running infrastructures would have such a result storage & cleanup mechanism anyways (for debugging, etc.). @jugglinmike does your BuildBot infra have this?

The second question, i.e. returning the validation result to the runner, is definitely harder. Yet I think it might not be worthwhile/necessary from two perspectives:

jugglinmike commented 6 years ago

The main complication that introduces is how long-lived that "somewhere" storage needs to be.

I've been considering that storage to be the canonical location for WPT results data. In that case, the data should be persisted indefinitely, and the reporter may remove its copy immediately following successful upload.

The idea is to set a long enough expiration (e.g. a week or month) for the results on the runner side, so that premature deletion won't be a problem in practice. I believe running infrastructures would have such a result storage & cleanup mechanism anyways (for debugging, etc.). @jugglinmike does your BuildBot infra have this?

Yes, we could define a new scheduled task to remove files in short order. I have been thinking about this problem slightly differently, though, so I'm not sure if it will be necessary (see above).

Hexcles commented 6 years ago

One thing I feel a bit uneasy about is to give runners write access to the canonical storage. We could set permissions properly to prevent deletion and modification, but we still have to write a scanner to remove invalid/orphaned results periodically.

On second thought, we could provide a "canonical" storage bucket for runners to write to directly, while wpt.fyi doesn't serve from that bucket. Instead, we copy verified results (after some transformation) to the private bucket that backs wpt.fyi.
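A rough sketch of that promotion step, with the buckets modeled as plain dicts for illustration. The real thing would use the Cloud Storage client, and `is_valid`/`transform` stand in for whatever verification and marshalling we decide on:

```python
def promote_verified(public_bucket, private_bucket, is_valid, transform):
    """Copy verified (and transformed) results from the world-writable
    bucket into the private bucket that backs wpt.fyi.

    Buckets are modeled as plain dicts here; with real GCS they would be
    google-cloud-storage Bucket objects. `is_valid` and `transform` are
    placeholders for the verification and marshalling steps.
    """
    promoted = []
    for name, blob in public_bucket.items():
        if name in private_bucket:
            continue  # already verified and served by wpt.fyi
        if is_valid(blob):
            private_bucket[name] = transform(blob)
            promoted.append(name)
    return promoted
```

A periodic scanner could run this and separately delete public-bucket objects that fail validation or are never promoted.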

jugglinmike commented 6 years ago

That sounds good to me! Today, we only have one data source (the test runner in the run/ subdirectory) and one consumer (wpt.fyi). When we have more sources of data, we could move the validation to wpt.fyi. The problem there is that anyone else who wishes to consume the data will also have to perform the same validation. It might be better to upgrade the storage bucket to a true web application.

If you agree, then this may be a good opportunity to be strategic with how we prioritize work. Instead of re-locating the validation logic today (and expecting to re-locate it again in the future), maybe wpt.fyi can continue to trust all uploaded data since we know it's being validated prior to being uploaded.

Hexcles commented 6 years ago

It's more about the transformation/marshalling for the frontend to consume than validation. We already have some internal/third-party experimental running infra that's not using run.py and we would like to make their lives easier by accepting one single JSON blob. This would also enable us to explore more efficient data models than the current summary + per-test JSON stubs.

I'm thinking the runners can either upload directly to our public bucket and post the resource URL to our API, or embed the gzipped JSON file in a multipart POST and have the API store the file to the bucket for them. I already wrote an App Engine prototype for this.

In both approaches, the API triggers an internal service asynchronously to process the JSON (runners don't need to know about this).
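For the multipart variant, preparing the gzipped JSON blob on the runner side might look like this. The endpoint and form field name in the comment are guesses for illustration, not the real API:

```python
import gzip
import json

def gzip_report(report):
    """Gzip-compress a wptreport dict for embedding in a multipart POST."""
    return gzip.compress(json.dumps(report).encode("utf-8"))

# The POST itself might then look like (endpoint, field name, and auth
# scheme are assumptions, not the real API):
#   requests.post("https://wpt.fyi/api/results/upload",
#                 files={"result_file": ("report.json.gz", gzip_report(r))},
#                 auth=(uploader, token))
```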
