The endpoint is https://caesar.zooniverse.org/graphql (or caesar-staging.zooniverse.org).
The contents of the aggregations vary wildly per reducer type, and I don't see how we could make those part of the GraphQL schema, sadly. So the query would be:
```graphql
query Aggregation($workflowId: ID!, $subjectId: ID!) {
  workflow(id: $workflowId) {
    reductions(subjectId: $subjectId, reducerKey: "transcriptions") {
      data
    }
  }
}
```
(In GraphQL you can send query templates plus a JSON object that assigns variables to values, which is what I'm doing here. This means you don't have to do string concatenation to build queries, which is a common security hole.)
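For instance, here's a minimal sketch of that request using plain `fetch`; the endpoint and example IDs are the ones mentioned in this thread, and any GraphQL client (graphql-request, lokka, etc.) posts the same `{ query, variables }` payload:

```js
// Sketch only: POST the query plus a JSON variables object to Caesar's
// GraphQL endpoint. No string concatenation needed to build the query.
const query = `
  query Aggregation($workflowId: ID!, $subjectId: ID!) {
    workflow(id: $workflowId) {
      reductions(subjectId: $subjectId, reducerKey: "transcriptions") {
        data
      }
    }
  }
`

fetch('https://caesar.zooniverse.org/graphql', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query,
    variables: { workflowId: '205', subjectId: '1276468' }
  })
})
  .then(response => response.json())
  .then(({ data }) => console.log(data.workflow.reductions))
```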
This would return the data you've linked as "example response data" under the "data" key. Something like (possibly I've pulled an outdated sample from staging):
https://gist.github.com/marten/20a26ddf7edfc5a25c54d8873625a9a6
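For orientation, the GraphQL envelope around that sample should nest the reducer output roughly like this (a sketch; the inner object is whatever the transcriptions reducer produced, i.e. the contents of the gist):

```js
// Sketch of the response envelope implied by the query above. The inner
// `data` value is the reducer output, which isn't part of the GraphQL schema.
const exampleResponse = {
  data: {
    workflow: {
      reductions: [
        { data: { /* transcriptions reducer output, e.g. clustered lines */ } }
      ]
    }
  }
}
```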
One other note: I'll need to set the workflow to have public reductions in Caesar before it works without authentication.
@marten, I've added the client, but I'm only getting empty arrays for the reductions. Does my query look okay, or does the workflow still need public reductions enabled?
I think that should work. Can you give me an example of a workflow ID and subject ID for which you don't get any data back?
Oh, and note that we haven't run it over all old subjects yet, just subjects that have recently gotten classifications.
I'm looking at:
subjectId: "1278528"
workflowId: "205"
It may be that it hasn't been recently classified - have you got a subject ID with some reductions I can test against?
Try 1276468.
This would fetch data for aggregated points generated by @CKrawczyk's new aggregation engine, delivered by Caesar. This replaces the existing aggregation data provided via the API aggregations endpoint.
Proposed Implementation
- The minimum number of views and the consensus score threshold get set on the main SW workflow in the `configuration` property.
- Add a GraphQL client to SW (@marten suggested graphql-request or lokka).
- The request is made with the subject and workflow IDs, as it is now.
- The response consists of an array of objects, each representing a line. Those with `number_views` and `consensus_score` below the values set in the workflow configuration get chucked.
- Of the remainder, the points are derived from the `clusters_x` and `clusters_y` properties, as they are now (see the sketch after this list).
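A rough sketch of the filtering and point derivation described above, assuming the reduction's `data` has already been fetched; the two threshold property names on `configuration` are hypothetical placeholders:

```js
// Sketch only: filter aggregated lines by the thresholds stored in the
// workflow configuration, then derive point pairs from clusters_x/clusters_y.
// `minimum_views` and `consensus_score_threshold` are hypothetical names.
function pointsFromReduction(lines, configuration) {
  const minViews = configuration.minimum_views
  const minScore = configuration.consensus_score_threshold

  return lines
    // Chuck lines that fall below the configured thresholds
    .filter(line => line.number_views >= minViews && line.consensus_score >= minScore)
    // Pair up the clusters_x / clusters_y arrays into point objects
    .map(line => line.clusters_x.map((x, i) => ({ x, y: line.clusters_y[i] })))
}
```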
Notes
The relevant existing functions are `_getAggregations` and `_formatAggregations`.
Questions
What does the query look like? I'm guessing something like:
Links
Estimated time
I reckon this shouldn't take more than a day, so I'm saying two days.
cc @marten, @chrislintott