zooniverse / zoo-notes

A web app that lets users view extracts and reductions for Zooniverse Subjects, intended for use in classrooms.
https://zoo-notes.zooniverse.org/

Aggregations Viewer mixes up data for multi-Task workflows #78

Closed shaunanoordin closed 2 years ago

shaunanoordin commented 2 years ago

Functionality Issue

⚠️ Warning: this issue is incredibly hard to spot unless you're aware of the expected data in advance. (Scope: we're only worried about workflows with multiple Single Answer Question Tasks, specifically the Galaxy Zoo for Schools project, workflow ID 40.)

If a Workflow has multiple Tasks, then the Aggregations Viewer may not show the correct aggregations for the selected Task.

Problem is best illustrated with an example:

image

This indicates that the workflow's index IDs and the aggregation's index IDs aren't synced to the same tasks.

The problem could be any of the following:

Dev Notes

Relevant Resources

🔑 indicates URLs that require a Zooniverse login

Baseline Success States (i.e. examples of what's working)

The Aggregations Viewer works fine if a Workflow has only one Task. For example:

image

Status

This is a major issue preventing educational classrooms from using a large swathe of workflow types. I'm aware that Kat wishes to use Galaxy Zoo for Classrooms with Zoo Notes some time in mid-May, so a fix by then is ideal, though this is not a do-or-die priority. (The fallback is to use Galaxy Zoo for Classrooms without Zoo Notes.)

shaunanoordin commented 2 years ago

Tagging in @eatyourgreens for dev work, and @camallen for any graphql advice. I'll add additional dev notes on the code on the Aggregations Viewer in a minute or ten. Processing, processing...

shaunanoordin commented 2 years ago

Additional Dev Notes

Here were my thoughts going into this issue:

Here are some files of interest:

Thoughts:

eatyourgreens commented 2 years ago

I don't know if this is relevant here, but the query that we use to get reductions for transcription tasks is slightly different. There's an extra reducerKey parameter. https://github.com/zooniverse/front-end-monorepo/blob/8fbe0d6a5fda63ba28e1c0d400b702aa19019b84/packages/lib-classifier/src/store/SubjectStore/Subject/TranscriptionReductions/TranscriptionReductions.js#L92-L98

const query = `{
  workflow(id: ${workflowId}) {
    subject_reductions(subjectId: ${subjectId}, reducerKey:"${REDUCER_KEY}")
    {
      data
    }
  }
}`

lcjohnso commented 2 years ago

In agreement with @eatyourgreens above, I think the standard assumption has been that anyone pulling extracts or reductions from Caesar will request data from specific extractors or reducers by key. For example, the Caesar configs for transcription workflows all require a specific key=alice for the extractor and reducer, and that key is used to identify the data of interest (via workflowId + subjectId AND reducerKey selection).

There is no standard convention for extractor and reducer key names -- these are totally up to the discretion of the project team / configurer. The case of GZ, where each task has its own extractor and reducer with a key of the form T plus the task number (e.g., T0), is useful here, but it should not be assumed that projects generally follow it.
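The selection described above (workflowId + subjectId, narrowed by reducerKey) could be sketched as a small query builder. This is only an illustration: `buildReductionsQuery` is a hypothetical helper, not code from Zoo Notes, and the query shape is copied from the transcription query quoted earlier in this thread.

```javascript
// Hypothetical helper: builds a Caesar GraphQL query string for subject reductions.
// The query shape mirrors the transcription query quoted earlier in this thread.
function buildReductionsQuery({ workflowId, subjectId, reducerKey }) {
  // Only add the reducerKey argument when a key was supplied;
  // JSON.stringify wraps the key in double quotes.
  const keyArg = reducerKey ? `, reducerKey: ${JSON.stringify(reducerKey)}` : ''
  return `{
  workflow(id: ${workflowId}) {
    subject_reductions(subjectId: ${subjectId}${keyArg}) {
      data
    }
  }
}`
}
```

Called without a `reducerKey`, the sketch falls back to the unfiltered query, which returns reductions from every reducer on the workflow and is where the multi-Task mix-up can appear.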

eatyourgreens commented 2 years ago

Thanks for taking a look @lcjohnso. That's really useful.

To get GZ for Classrooms working, I've got no problem with setting up queries that use specific keys just for that project. @shaunanoordin @camallen do you have any thoughts?

eatyourgreens commented 2 years ago

I think the non-breaking solution would be to add the reducer key as a URL parameter, e.g. https://zoo-notes.zooniverse.org/view/workflow/40/subject/475202?reducer=T4.

When the reducer parameter is present, we query just for that key, which should give us back results for only one task.

EDIT: we probably want to pass a key with the extracts query too.
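A minimal sketch of reading those optional keys from the page URL, using the standard `URL` API. The helper name `getCaesarKeys` is hypothetical, and the `extractor` parameter name is an assumption made by analogy with `reducer`; only `reducer` appears in the proposal above.

```javascript
// Hypothetical helper: reads optional reducer/extractor keys from a Zoo Notes URL.
// "reducer" follows the proposal above; "extractor" is an assumed parameter name.
function getCaesarKeys(href) {
  const params = new URL(href).searchParams
  return {
    reducerKey: params.get('reducer'),     // null when the parameter is absent
    extractorKey: params.get('extractor')  // null when the parameter is absent
  }
}
```

Because both values are `null` when the parameters are missing, existing URLs without a `reducer` parameter would keep their current behaviour, which is what makes this approach non-breaking.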

shaunanoordin commented 2 years ago

(Copy-pasting a response I wrote in Slack:)

Instead of asking the app to accept ?reducer=T1&extractor=T1, I think a better solution is to standardise the Caesar Extraction & Reduction rules, so we manually enforce the rule "reducer/extractor keys MUST match Task keys" at the Caesar config level.

This will place responsibility on the dev/techs to ensure uniformity, instead of asking the educators to learn which extractor/reducer keys apply to which workflow.

This one is on me - when I set up the Virus Picker and Virus Classifier reducers/extractors, I should have made it a point to use the matching WF Task keys instead of blindly copying a template. When I set up the reducers/extractors for Galaxy Zoo, I made it a point to explicitly match reducer/extractor keys with the task keys, and I think it works much better (see PR #89)
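With the "keys must match Task keys" convention in place, the viewer could pair each Task with its reductions by key instead of by position. A rough sketch follows, assuming each reduction record carries a `reducerKey` field. The function name, the field name, and the record shape are assumptions for illustration, not the confirmed Caesar response format.

```javascript
// Hypothetical: pairs workflow Task keys with reductions whose reducer key matches.
// Assumes each reduction looks like { reducerKey: 'T0', data: { ... } }.
function matchReductionsToTasks(taskKeys, reductions) {
  const byTask = {}
  for (const taskKey of taskKeys) {
    byTask[taskKey] = reductions.filter(r => r.reducerKey === taskKey)
  }
  // Task keys with no matching reducer suggest a Caesar config
  // that does not follow the key-matching convention.
  const unmatched = taskKeys.filter(key => byTask[key].length === 0)
  return { byTask, unmatched }
}
```

The `unmatched` list gives a quick way to spot a workflow whose reducers were set up with non-matching keys, as with the Virus Picker and Virus Classifier setup described above.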