zooniverse / front-end-monorepo

A rebuild of the front-end for zooniverse.org
https://www.zooniverse.org
Apache License 2.0

Engaging Crowds: duplicate pages being served for HMS NHS #2818

Closed: eatyourgreens closed this issue 2 years ago

eatyourgreens commented 2 years ago

Package

lib-classifier

Describe the bug

Volunteers are reporting on Talk that when they reach the end of a batch of 10 subjects, the last 3 subjects are shown again: https://www.zooniverse.org/projects/msalmon/hms-nhs-the-nautical-health-service/talk/4936/2331577?page=1&scrollToLastComment=true

Expected behavior

New subjects added to the queue shouldn't duplicate subjects that are already in it.

Additional context

We request new subjects from Designator when there are still 2 or 3 subjects left in the queue, so that volunteers can keep classifying while new subjects load in the background. Those last 2 or 3 subjects haven't been classified yet when the request goes out, so when subjects are shown in order, the first 2 or 3 unclassified subjects returned by Designator could be duplicates of the last 2 or 3 subjects in the existing queue.
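To make the mechanism concrete, here is a minimal TypeScript sketch under stated assumptions: subjects are modelled as plain objects with only an id and a priority, a hypothetical fetchNext function stands in for Designator and returns the first unclassified subjects in priority order, and the refill threshold of 3 and batch size of 10 are taken from the numbers in this thread. None of these names are the lib-classifier API.

```typescript
// Hypothetical subject shape: real Panoptes subjects carry many more fields.
interface Subject {
  id: string
  priority: number
}

// Hypothetical stand-in for Designator: returns the first `count`
// unclassified subjects, in priority order.
function fetchNext(all: Subject[], classified: Set<string>, count: number): Subject[] {
  return all.filter(s => !classified.has(s.id)).slice(0, count)
}

const PREFETCH_THRESHOLD = 3 // assumed: request more when 3 subjects remain
const BATCH_SIZE = 10 // assumed: 10 subjects per request

const all: Subject[] = Array.from({ length: 30 }, (_, i) => ({ id: String(i + 1), priority: i + 1 }))
const classified = new Set<string>()

// First batch: subjects 1..10.
let queue = fetchNext(all, classified, BATCH_SIZE)

// The volunteer classifies until only PREFETCH_THRESHOLD subjects remain.
while (queue.length > PREFETCH_THRESHOLD) {
  classified.add(queue.shift()!.id)
}

// Prefetch while 8, 9 and 10 are still unclassified, then append naively.
const incoming = fetchNext(all, classified, BATCH_SIZE)
queue = queue.concat(incoming)

console.log(queue.map(s => s.priority).join(","))
```

Run as written, it logs 8,9,10,8,9,10,11,12,13,14,15,16,17: the last three subjects of the first batch reappear at the start of the new one, which matches what volunteers describe on Talk.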

mrniaboc commented 2 years ago

Reopening this issue as we are still getting reports of this bug on HMS NHS - https://www.zooniverse.org/projects/msalmon/hms-nhs-the-nautical-health-service/talk/4936/2331577?comment=3862908

eatyourgreens commented 2 years ago

@mrniaboc there's an open issue on Panoptes which might be causing problems on prioritised workflows: https://github.com/zooniverse/panoptes/issues/3789

eatyourgreens commented 2 years ago

With #2819, the classifier ignores any incoming subject that is already in the queue. That would imply that any duplicates volunteers are seeing now are being served by Designator, i.e. the classifier is receiving seen subjects from Panoptes and showing them because they aren't already in the local queue.

We could filter Already Seen subjects out of the classifier completely, but then that would break Scarlets & Blues and the RBGE Herbarium project, where you're allowed to pick any subject, regardless of seen status.
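A sketch of the id check described here, assuming subjects are identified by id. appendUnique and the Subject shape are hypothetical illustrations, not the code merged in #2819.

```typescript
// Hypothetical subject shape, reduced to the fields this sketch needs.
interface Subject {
  id: string
  priority: number
}

// Append only those incoming subjects whose id is not already in the local queue.
function appendUnique(queue: Subject[], incoming: Subject[]): Subject[] {
  const queuedIds = new Set(queue.map(s => s.id))
  const fresh = incoming.filter(s => !queuedIds.has(s.id))
  return [...queue, ...fresh]
}

const queue: Subject[] = [8, 9, 10].map(p => ({ id: String(p), priority: p }))
const incoming: Subject[] = [8, 9, 10, 11, 12].map(p => ({ id: String(p), priority: p }))
console.log(appendUnique(queue, incoming).map(s => s.priority).join(","))
// → 8,9,10,11,12
```

Because the check only consults the local queue, a subject that has already been classified and shifted off the front of the queue would pass it if Designator served it again, which is why any remaining duplicates point at the API response.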

mrniaboc commented 2 years ago

OK, thanks Jim. Time to call the backend team...

eatyourgreens commented 2 years ago

I guess one additional optimisation we can make for prioritised workflows, in the browser, is to strip the API response of any subject whose priority is lower than the highest priority in the current queue.
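A sketch of that optimisation, assuming that a lower priority value means a subject is shown earlier, as in the page examples in this thread. dropLowerPriority is a hypothetical name.

```typescript
// Hypothetical subject shape, reduced to the fields this sketch needs.
interface Subject {
  id: string
  priority: number
}

// Drop incoming subjects whose priority is lower than the highest priority
// already in the local queue. Assumes lower values are shown first.
function dropLowerPriority(queue: Subject[], incoming: Subject[]): Subject[] {
  if (queue.length === 0) return incoming
  const highest = Math.max(...queue.map(s => s.priority))
  return incoming.filter(s => s.priority >= highest)
}

const toSubject = (p: number): Subject => ({ id: String(p), priority: p })
const queue = [11, 12, 13].map(toSubject)
const incoming = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19].map(toSubject)
console.log(dropLowerPriority(queue, incoming).map(s => s.priority).join(", "))
// → 13, 14, 15, 16, 17, 18, 19
```

Subject 13 survives because its priority equals the queue's highest priority rather than falling below it, so an id check against the local queue is still needed alongside this filter.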

eatyourgreens commented 2 years ago

Having one queue model that handles the different logic for randomised queued subjects (PH-TESS, Beyond Borders etc.), prioritised queued subjects (Davy Notebooks, HMS NHS) and indexed subjects (Scarlets & Blues, RBGE) is starting to feel a little unwieldy to me.

eatyourgreens commented 2 years ago

#2888 adds a filter that checks subject priority before appending new subjects to the queue.

eatyourgreens commented 2 years ago

This bug seems to trigger when the response from Designator contains subjects that come before the first subject in the local queue, in priority order.

So, if upcoming subject priorities in the browser look like this: [11, 12, 13], and the response from Designator looks like this: [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], then the browser queue looks like this after removing duplicates and appending new subjects: [11, 12, 13, 10, 14, 15, 16, 17, 18, 19].

The volunteer classifies pages 11, 12 and 13, then finds themselves back at page 10 and unable to move forward.
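The ordering problem can be reproduced with a dedupe-only append, using the page priorities from the example as ids. The function below is a hypothetical sketch that ignores any incoming subject already queued and appends the rest in response order.

```typescript
// Hypothetical dedupe-only append: subjects are represented by their priority.
function appendUnique(queue: number[], incoming: number[]): number[] {
  const queued = new Set(queue)
  return [...queue, ...incoming.filter(p => !queued.has(p))]
}

const local = [11, 12, 13]
const fromDesignator = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
const merged = appendUnique(local, fromDesignator)
console.log(merged.join(", "))
// → 11, 12, 13, 10, 14, 15, 16, 17, 18, 19

// Find the first position where the queue stops being in priority order.
const outOfOrder = merged.findIndex((p, i) => i > 0 && p < merged[i - 1])
console.log(outOfOrder) // → 3 (page 10 follows page 13)
```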

eatyourgreens commented 2 years ago

One more detail that should be recorded here: when workflow.prioritized is true but a volunteer has classified every available subject, the subject selector switches back from ordered to randomised subjects. I'm pretty sure we only see this bug when the incoming queue is in priority order.

Code that works with prioritised queues should check for workflow.prioritized and also subject.user_has_finished_workflow on each incoming subject.
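A sketch of that check, treating a subject with user_has_finished_workflow set as part of a randomised batch that should bypass priority filtering. Only the field names prioritized and user_has_finished_workflow come from this thread; the Workflow and Subject types and the priorityFilterApplies and filterIncoming functions are hypothetical.

```typescript
// Hypothetical shapes: only the fields discussed in this thread are modelled.
interface Workflow {
  prioritized: boolean
}

interface Subject {
  id: string
  priority: number
  user_has_finished_workflow: boolean
}

// Priority ordering only applies when the workflow is prioritised and the
// volunteer hasn't finished it (otherwise subjects arrive randomised).
function priorityFilterApplies(workflow: Workflow, subject: Subject): boolean {
  return workflow.prioritized && !subject.user_has_finished_workflow
}

// Keep a subject if the priority check doesn't apply to it, or if it passes.
function filterIncoming(workflow: Workflow, highestQueued: number, incoming: Subject[]): Subject[] {
  return incoming.filter(s => !priorityFilterApplies(workflow, s) || s.priority >= highestQueued)
}

const workflow: Workflow = { prioritized: true }
const ordered: Subject[] = [10, 14].map(p => ({ id: String(p), priority: p, user_has_finished_workflow: false }))
const randomised: Subject[] = [10, 14].map(p => ({ id: String(p), priority: p, user_has_finished_workflow: true }))
console.log(filterIncoming(workflow, 13, ordered).map(s => s.id).join(",")) // → 14
console.log(filterIncoming(workflow, 13, randomised).map(s => s.id).join(",")) // → 10,14
```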

eatyourgreens commented 2 years ago

This has come up again, also happening every 10 subjects or so. https://www.zooniverse.org/projects/msalmon/hms-nhs-the-nautical-health-service/talk/subjects/44585701

I think 10 subjects is the length of the local queue that we store in the browser. #2888 should filter any incoming subjects that are already in the queue, or which have a lower priority than the end of the local queue.
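Putting both conditions together, a minimal sketch of the filtering described for #2888 might look like this. It assumes lower priority values are shown first and compares against the last subject in the local queue; filterAndAppend and the Subject shape are hypothetical, not the merged implementation.

```typescript
// Hypothetical subject shape, reduced to the fields this sketch needs.
interface Subject {
  id: string
  priority: number
}

// Drop incoming subjects that are already queued, or whose priority is lower
// than the priority of the last subject in the local queue, then append the rest.
function filterAndAppend(queue: Subject[], incoming: Subject[]): Subject[] {
  const queuedIds = new Set(queue.map(s => s.id))
  const lastPriority = queue.length > 0 ? queue[queue.length - 1].priority : -Infinity
  const fresh = incoming.filter(s => !queuedIds.has(s.id) && s.priority >= lastPriority)
  return [...queue, ...fresh]
}

const toSubject = (p: number): Subject => ({ id: String(p), priority: p })
const local = [11, 12, 13].map(toSubject)
const fromDesignator = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19].map(toSubject)
console.log(filterAndAppend(local, fromDesignator).map(s => s.priority).join(", "))
// → 11, 12, 13, 14, 15, 16, 17, 18, 19
```

On the page example from earlier in the thread, page 10 is dropped instead of being appended after page 13, so the queue stays in priority order.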

lcjohnso commented 2 years ago

@eatyourgreens Is this ready to close or still open?

eatyourgreens commented 2 years ago

I think we’ve done everything we can to stop it from happening.