The endpoint is https://caesar.zooniverse.org/graphql (or caesar-staging.zooniverse.org).
The contents of the aggregations vary wildly per reducer type, and I don't see how we could make those part of the GraphQL schema, sadly. So the query would be:
```graphql
query Aggregation($workflowId: ID!, $subjectId: ID!) {
  workflow(id: $workflowId) {
    reductions(subjectId: $subjectId, reducerKey: "transcriptions") {
      data
    }
  }
}
```
(In GraphQL you can send query templates plus a JSON object that assigns variables to values, which is what I'm doing here. This means you don't have to do string concatenation to build queries, which is a common security hole.)
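For instance, here's a minimal sketch of that request using plain `fetch`; the endpoint and example IDs are the ones mentioned in this thread, and any GraphQL client (graphql-request, lokka, etc.) posts the same `{ query, variables }` payload:

```js
// Sketch only: POST the query plus a JSON variables object to Caesar's
// GraphQL endpoint. No string concatenation needed to build the query.
const query = `
  query Aggregation($workflowId: ID!, $subjectId: ID!) {
    workflow(id: $workflowId) {
      reductions(subjectId: $subjectId, reducerKey: "transcriptions") {
        data
      }
    }
  }
`

fetch('https://caesar.zooniverse.org/graphql', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query,
    variables: { workflowId: '205', subjectId: '1276468' }
  })
})
  .then(response => response.json())
  .then(({ data }) => console.log(data.workflow.reductions))
```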
This would return the data you've linked as "example response data" under the "data" key. Something like (possibly I've pulled an outdated sample from staging):
https://gist.github.com/marten/20a26ddf7edfc5a25c54d8873625a9a6
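For orientation, the GraphQL envelope around that sample should nest the reducer output roughly like this (a sketch; the inner object is whatever the transcriptions reducer produced, i.e. the contents of the gist):

```js
// Sketch of the response envelope implied by the query above. The inner
// `data` value is the reducer output, which isn't part of the GraphQL schema.
const exampleResponse = {
  data: {
    workflow: {
      reductions: [
        { data: { /* transcriptions reducer output, e.g. clustered lines */ } }
      ]
    }
  }
}
```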
One other note: I'll need to set the workflow to have public reductions in Caesar before it works without authentication.
@marten, I've added the client, but I'm only getting empty arrays for the reductions. Does my query look okay, or does the workflow still need public reductions enabled?
I think that should work. Can you give me an example of a workflow ID and subject ID for which you don't get any data back?
Oh, and note that we haven't run it over all old subjects yet, just subjects that have recently gotten classifications.
I'm looking at:
subjectId: "1278528"
workflowId: "205"
It may be that it hasn't been recently classified - have you got a subject ID with some reductions I can test against?
Try 1276468.
This would fetch data for aggregated points generated by @CKrawczyk's new aggregation engine, delivered by Caesar. This replaces the existing aggregation data provided via the API aggregations endpoint.
Proposed Implementation
- The minimum number of views and the consensus score threshold get set on the main SW workflow in the `configuration` property.
- Add a GraphQL client to SW (@marten suggested graphql-request or lokka).
- The request is made with the subject and workflow IDs, as it is now.
- The response consists of an array of objects, each representing a line. Those with `number_views` and `consensus_score` below the values set in the workflow configuration get chucked.
- Of the remainder, the points are derived from the `clusters_x` and `clusters_y` properties, as they are now (see the sketch after this list).
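A rough sketch of the filtering and point derivation described above, assuming the reduction's `data` has already been fetched; the two threshold property names on `configuration` are hypothetical placeholders:

```js
// Sketch only: filter aggregated lines by the thresholds stored in the
// workflow configuration, then derive point pairs from clusters_x/clusters_y.
// `minimum_views` and `consensus_score_threshold` are hypothetical names.
function pointsFromReduction(lines, configuration) {
  const minViews = configuration.minimum_views
  const minScore = configuration.consensus_score_threshold

  return lines
    // Chuck lines that fall below the configured thresholds
    .filter(line => line.number_views >= minViews && line.consensus_score >= minScore)
    // Pair up the clusters_x / clusters_y arrays into point objects
    .map(line => line.clusters_x.map((x, i) => ({ x, y: line.clusters_y[i] })))
}
```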
Notes
The relevant existing functions are `_getAggregations` and `_formatAggregations`.
Questions
What does the query look like? I'm guessing something like:
Links
Estimated time
I reckon this shouldn't take more than a day, so I'm saying two days.
cc @marten, @chrislintott