zooniverse / shakespeares_world

Full text transcription project for the Folger Shakespeare Library
https://www.shakespearesworld.org

Fetch aggregation data from Caesar #344

Closed rogerhutchings closed 6 years ago

rogerhutchings commented 7 years ago

This would fetch data for aggregated points generated by @CKrawczyk's new aggregation engine, delivered by Caesar. This replaces the existing aggregation data provided via the API aggregations endpoint.

Proposed Implementation

Notes

Questions

Links

Estimated time

I reckon this shouldn't take more than a day, so I'm saying two days.

cc @marten, @chrislintott

marten commented 7 years ago

The endpoint is https://caesar.zooniverse.org/graphql. (Or caesar-staging.zooniverse.org)

The contents of the aggregations vary wildly per reducer type, and I don't see how we could make those part of the GraphQL schema, sadly. So the query would be:

query Aggregation($workflowId: ID!, $subjectId: ID!) {
  workflow(id: $workflowId) {
    reductions(subjectId: $subjectId, reducerKey: "transcriptions") {
      data
    }
  }
}

(In GraphQL you can send a query template plus a JSON object that assigns values to its variables, which is what I'm doing here. This means you don't have to build queries by string concatenation, which is a common source of injection holes.)
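As a rough sketch of that template-plus-variables pattern in Python: this assumes the request body is a JSON object with `query` and `variables` keys, which is the usual GraphQL-over-HTTP convention but has not been checked against Caesar itself. The `build_payload` helper and the ID values are hypothetical placeholders.

```python
import json

# Query template from the comment above. The IDs are never spliced into this string.
AGGREGATION_QUERY = """
query Aggregation($workflowId: ID!, $subjectId: ID!) {
  workflow(id: $workflowId) {
    reductions(subjectId: $subjectId, reducerKey: "transcriptions") {
      data
    }
  }
}
"""


def build_payload(workflow_id: str, subject_id: str) -> str:
    """Return a JSON request body: the fixed template plus a separate variables object.

    Hypothetical helper; the "query"/"variables" keys are an assumed convention.
    """
    return json.dumps({
        "query": AGGREGATION_QUERY,
        "variables": {"workflowId": workflow_id, "subjectId": subject_id},
    })


# Placeholder IDs, for illustration only.
body = json.loads(build_payload("123", "456"))
print(body["variables"])
```

Because the IDs travel in `variables`, an odd or hostile ID value cannot change the shape of the query; it is only ever treated as a value.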

This would return the data you've linked as "example response data" under the "data" key. Something like this (though I may have pulled an outdated sample from staging):

https://gist.github.com/marten/20a26ddf7edfc5a25c54d8873625a9a6
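A minimal sketch of pulling the reduction payloads out of a response, in Python. It assumes the standard GraphQL envelope (a top-level `data` key, with an optional `errors` list) wrapped around the `workflow` / `reductions` / `data` fields named in the query. The `extract_reductions` helper and the sample dictionary are hypothetical and do not reproduce the gist's contents.

```python
from typing import Any


def extract_reductions(response: dict[str, Any]) -> list[Any]:
    """Return the `data` field of each reduction, or [] when there are none.

    Hypothetical helper; the response shape is an assumption, not Caesar's documented API.
    """
    if response.get("errors"):
        raise RuntimeError(f"GraphQL errors: {response['errors']}")
    # Tolerate a missing or null "data"/"workflow" rather than raising.
    workflow = (response.get("data") or {}).get("workflow") or {}
    return [reduction.get("data") for reduction in workflow.get("reductions") or []]


# Made-up sample in the assumed shape; real reducer output varies per reducer type.
sample = {"data": {"workflow": {"reductions": [{"data": {"placeholder": True}}]}}}
print(extract_reductions(sample))
```

An empty list here means the request succeeded but no reductions came back for that subject, which is a different situation from an `errors` entry.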

marten commented 7 years ago

One other note: I'll need to set the workflow to have public reductions in Caesar before it works without authentication.

rogerhutchings commented 7 years ago

@marten, I've added the client, but I'm only getting empty arrays for the reductions. Is my query looking okay, or does the workflow still need public reductions enabling?

marten commented 7 years ago

I think that should work. Can you give me an example of a workflow ID and subject ID for which you don't get any data back?

marten commented 7 years ago

Oh, and note that we haven't run it over all old subjects yet, just subjects that have recently gotten classifications.

rogerhutchings commented 7 years ago

I'm looking at:

subjectId: "1278528"
workflowId: "205"

It may be that it hasn't been recently classified - have you got a subject ID with some reductions I can test against?

marten commented 7 years ago

Try 1276468.