Closed: shaunanoordin closed this issue 2 years ago
Tagging in @eatyourgreens for dev work, and @camallen for any graphql advice. I'll add additional dev notes on the code on the Aggregations Viewer in a minute or ten. Processing, processing...
Here were my thoughts going into this issue:
Here are some files of interest:
fetchAggregations()
const query = gql`{
  workflow(id: ${workflowId}) {
    reductions(subjectId: ${subjectId}) {
      data
    }
    extracts(subjectId: ${subjectId}) {
      data
    }
  }
}`
const reductionsData = reductions && reductions[selectedTaskIndex]?.data
Thoughts:
I don't know if this is relevant here, but the query that we use to get reductions for transcription tasks is slightly different. There's an extra reducerKey parameter.
https://github.com/zooniverse/front-end-monorepo/blob/8fbe0d6a5fda63ba28e1c0d400b702aa19019b84/packages/lib-classifier/src/store/SubjectStore/Subject/TranscriptionReductions/TranscriptionReductions.js#L92-L98
const query = `{
  workflow(id: ${workflowId}) {
    subject_reductions(subjectId: ${subjectId}, reducerKey:"${REDUCER_KEY}") {
      data
    }
  }
}`
In agreement with @eatyourgreens above, I think the standard assumption has been that anyone pulling extracts or reductions from Caesar will be requesting data from specific extractors or reducers by key -- example: the Caesar configs for transcription workflows all require a specific key=alice for extractor and reducer, and that is used to identify the data of interest (via workflowId + subjectId AND reducerKey selection).
There is no standard convention for extractor and reducer key names -- these are totally up to the discretion of the project team / configurer. The case of GZ (where each task has its own extractor and reducer with a key in form TT0) is useful here but should not be assumed to apply to projects generally.
Thanks for taking a look @lcjohnso. That's really useful.
To get GZ for Classrooms working, I've got no problem with setting up queries that use specific keys just for that project. @shaunanoordin @camallen do you have any thoughts?
I think the non-breaking solution would be to add the reducer key as a URL parameter, e.g. https://zoo-notes.zooniverse.org/view/workflow/40/subject/475202?reducer=T4.
When the reducer parameter is present, we query just for that key, which should give us back results for only one task.
EDIT: we probably want to pass a key with the extracts query too.
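A minimal sketch of that idea, for discussion only. The helper names `buildAggregationsQuery` and `keysFromUrl` are hypothetical, and the `extractor` URL parameter is made up to mirror `reducer`. It also assumes that `reductions` and `extracts` accept `reducerKey` / `extractorKey` arguments; the only keyed query shown in this thread is `subject_reductions(..., reducerKey)`, so that would need checking against the Caesar schema.

```javascript
// Hypothetical helper: builds the Caesar GraphQL query string.
// ASSUMPTION: `reductions` / `extracts` accept optional `reducerKey` /
// `extractorKey` arguments. This is not confirmed anywhere in this thread.
function buildAggregationsQuery ({ workflowId, subjectId, reducerKey, extractorKey }) {
  // JSON.stringify adds the double quotes and escapes the key safely.
  const reducerArg = reducerKey ? `, reducerKey: ${JSON.stringify(reducerKey)}` : ''
  const extractorArg = extractorKey ? `, extractorKey: ${JSON.stringify(extractorKey)}` : ''
  return `{
  workflow(id: ${Number(workflowId)}) {
    reductions(subjectId: ${Number(subjectId)}${reducerArg}) {
      data
    }
    extracts(subjectId: ${Number(subjectId)}${extractorArg}) {
      data
    }
  }
}`
}

// Hypothetical helper: reads ?reducer=...&extractor=... from a page URL.
// Missing parameters come back as undefined, so the unkeyed query is kept.
function keysFromUrl (url) {
  const params = new URL(url).searchParams
  return {
    reducerKey: params.get('reducer') || undefined,
    extractorKey: params.get('extractor') || undefined
  }
}

// Usage with the example URL from this comment.
const exampleUrl = 'https://zoo-notes.zooniverse.org/view/workflow/40/subject/475202?reducer=T4'
const exampleQuery = buildAggregationsQuery({
  workflowId: 40,
  subjectId: 475202,
  ...keysFromUrl(exampleUrl)
})
```

Because both keys are optional, existing links without the parameters would keep producing the current query, which is what makes this approach non-breaking.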
(Copy-pasting a response I wrote in Slack:)
Instead of asking the app to accept ?reducer=T1&extractor=T1, I think a better solution is to standardise the Caesar Extraction & Reduction rules, so we manually enforce the "reduction/extractions keys MUST match Task keys" at the Caesar config level.
This will place responsibility on the dev/techs to ensure uniformity, instead of asking the educators to learn which extractor/reducer keys apply to which workflow.
This one is on me - when I set up the Virus Picker and Virus Classifier reducers/extractors, I should have made it a point to use the matching WF Task keys instead of blindly copying a template. When I set up the reducers/extractors for Galaxy Zoo, I made it a point to explicitly match reducer/extractor keys with the task keys, and I think it works much better (see PR #89)
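If we go with the "keys MUST match Task keys" rule, a small check could help whoever configures Caesar catch mismatches early. This is only a sketch: `findKeyMismatches` is a hypothetical helper, and it assumes the task, reducer, and extractor keys have already been collected into plain arrays of strings (it does not call Panoptes or Caesar).

```javascript
// Hypothetical check for the "reducer/extractor keys must match Task keys" rule.
// ASSUMPTION: all three inputs are plain arrays of key strings gathered by
// hand from the workflow and the Caesar config; no real API is called here.
function findKeyMismatches (taskKeys, reducerKeys, extractorKeys) {
  const tasks = new Set(taskKeys)
  const reducers = new Set(reducerKeys)
  const extractors = new Set(extractorKeys)
  return {
    // Task keys with no same-named reducer / extractor.
    missingReducers: taskKeys.filter(key => !reducers.has(key)),
    missingExtractors: taskKeys.filter(key => !extractors.has(key)),
    // Caesar keys that correspond to no Task key.
    unmatchedReducers: reducerKeys.filter(key => !tasks.has(key)),
    unmatchedExtractors: extractorKeys.filter(key => !tasks.has(key))
  }
}

// Usage with made-up keys: a template key was copied for the second reducer.
const exampleReport = findKeyMismatches(['T0', 'T1'], ['T0', 'alice'], ['T0', 'T1'])
```

A config follows the rule when all four arrays in the report are empty.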
Functionality Issue
⚠️ Warning: issue is incredibly hard to spot unless you're aware of the expected data in advance. (Scope: we're only worried about workflows with multiple Single Answer Question Tasks - specifically the Galaxy Zoo for Schools project, workflow ID 40)
If a Workflow has multiple Tasks, then the Aggregations Viewer may not show the correct aggregations for the selected Task.
Problem is best illustrated with an example:
In aggregations.data.workflow.reductions[index 2], there are six aggregated answers: {0: 1, 1: 1, 2: 2, 3: 5, 4: 10, 5: 4}
This indicates that the workflow's index IDs and the aggregation's index IDs aren't synced to the same tasks.
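One way to make this kind of mismatch easier to spot is a rough sanity check that compares the answer indices in a reduction against the number of answers on the selected Task. This is a sketch under assumptions: `reductionFitsTask` is a hypothetical helper, the Task is assumed to carry an `answers` array (as single answer question tasks do), and the reduction data is assumed to map answer index to count, as in the example above.

```javascript
// Hypothetical sanity check: can this reduction belong to this Task?
// ASSUMPTION: `task.answers` is an array of answer options, and
// `reductionData` maps answer index -> count, e.g. {0: 1, 1: 1, 2: 2}.
function reductionFitsTask (task, reductionData) {
  const answerCount = Array.isArray(task && task.answers) ? task.answers.length : 0
  const indices = Object.keys(reductionData || {}).map(Number)
  // Every index in the reduction must point at an existing answer.
  // An empty reduction passes trivially.
  return indices.every(index =>
    Number.isInteger(index) && index >= 0 && index < answerCount
  )
}

// Usage with the six-answer reduction from the example and a made-up
// three-answer Task.
const exampleReduction = { 0: 1, 1: 1, 2: 2, 3: 5, 4: 10, 5: 4 }
const threeAnswerTask = { answers: [{ label: 'A' }, { label: 'B' }, { label: 'C' }] }
const fits = reductionFitsTask(threeAnswerTask, exampleReduction)
```

This only catches reductions that cannot fit the Task. Two Tasks with the same number of answers would still pass, so it is a diagnostic aid rather than a fix.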
The problem could be any of the following:
Dev Notes
Relevant Resources
🔑 indicates URLs that require Zooniverse logins
Baseline Success States (i.e. examples of what's working)
The Aggregations Viewer works fine if a Workflow has only one Task. For example:
https://caesar.zooniverse.org/graphql with the very simple query { workflow(id: 17096) { reductions(subjectId: 53160411) { data } extracts(subjectId: 53160411) { data } } } (Get all extracts & reductions for WF 17096, subject 53160411)
The aggregations.data.workflow.reductions[] array has only one index entry, so there's no possible confusion as to which aggregation data maps to which Task.
Status
This is a major issue preventing educational classrooms from using a large swathe of workflow types. I'm aware that Kat wishes to use Galaxy Zoo for Classrooms with Zoo Notes some time in mid-May, so a fix is ideal, though this is not a do-or-die priority. (The fallback is to use Galaxy Zoo for Classrooms without Zoo Notes.)