zooniverse / front-end-monorepo

A rebuild of the front-end for zooniverse.org
https://www.zooniverse.org
Apache License 2.0

Errors in project classification counts #6329

Open eatyourgreens opened 1 week ago

eatyourgreens commented 1 week ago

Describe the bug

The new user classification counts, published last night, don't agree with the counts shown on the old home page. Some volunteers are reporting that the totals in their new user stats differ by thousands of classifications.

Here are a couple of examples from my account.

New user stats page stats (all time):

Old home page stats:

To Reproduce

Logged in as a Zooniverse volunteer, go to More Stats and select All Time for the time range. Project classification counts are shown under Top Projects. Compare the new counts with the counts shown on https://pr-7177.pfe-preview.zooniverse.org/#projects. Differences show up for both Ouroboros and Panoptes projects, with older projects (pre-2019 maybe) more likely to show large differences from the activity_count stored on your project preferences.

Expected behavior

Classification history shouldn't have changed during the move to a new stats API. Total classifications for a given project should match user_project_preferences.preferences.activity_count for that project.

Additional context

This seems like a problem that should have been caught by spot-checking some volunteer accounts prior to launch, or by snapshot testing against the old counts: generate project classification snapshots for a few thousand sample accounts on the old API, generate the same snapshots with the new API, then assert that the snapshots match.

FWIW I use this technique in one of my current software projects to check that backend changes can be deployed to production without breaking research models in the production database. Before a release, the release branch is tested against snapshots generated on the main branch, using real data. It's a very useful technique.
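
To make the snapshot idea concrete, here's a minimal sketch in Python. It assumes you can already export per-user, per-project counts from each API as a nested dict; `make_snapshot`, `diff_snapshots` and the sample IDs are hypothetical names for illustration, not part of Panoptes or ERAS.

```python
import json


def make_snapshot(counts_by_user):
    """Serialise {user_id: {project_id: count}} deterministically.

    Sorting keys means two snapshots of identical data are
    byte-for-byte identical, so they can also be diffed as files.
    """
    return json.dumps(counts_by_user, sort_keys=True, indent=2)


def diff_snapshots(old_snapshot, new_snapshot):
    """Return (user_id, project_id, old_count, new_count) for every mismatch.

    A count of None means the project is missing from that snapshot.
    """
    old = json.loads(old_snapshot)
    new = json.loads(new_snapshot)
    mismatches = []
    for user_id in sorted(set(old) | set(new)):
        old_projects = old.get(user_id, {})
        new_projects = new.get(user_id, {})
        for project_id in sorted(set(old_projects) | set(new_projects)):
            before = old_projects.get(project_id)
            after = new_projects.get(project_id)
            if before != after:
                mismatches.append((user_id, project_id, before, after))
    return mismatches


# Made-up sample data standing in for exports from the old and new APIs.
old_counts = {"user-1": {"project-a": 120, "project-b": 45}}
new_counts = {"user-1": {"project-a": 120, "project-b": 61, "project-c": 3}}

mismatches = diff_snapshots(make_snapshot(old_counts), make_snapshot(new_counts))
for user_id, project_id, before, after in mismatches:
    print(f"{user_id} {project_id}: old={before} new={after}")
```

In a pre-launch test you would assert that `mismatches` is empty, and the printed rows tell you which accounts and projects to look at when it isn't.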

eatyourgreens commented 1 week ago

The old home page uses preferences.activity_count, which ignores any classification where either:

If the ERAS count includes duplicates and retired subjects, that might explain the sudden increase in classification counts for PFE projects. https://www.zooniverse.org/talk/2354/3435274?comment=5657245&page=7
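
A rough illustration of how much the two counting rules can diverge, written in Python. This is a sketch under assumptions: the `Classification` fields below are hypothetical flags, and I'm assuming the exclusions are "subject already seen by this volunteer" and "subject already retired", as suggested by the duplicates and retired subjects mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    subject_id: str
    already_seen: bool     # hypothetical: volunteer classified this subject before
    subject_retired: bool  # hypothetical: subject was retired when classified


def raw_count(classifications):
    """Count every classification, duplicates and retired subjects included."""
    return len(classifications)


def filtered_count(classifications):
    """Count only first-time classifications of subjects that were still active."""
    return sum(
        1
        for c in classifications
        if not (c.already_seen or c.subject_retired)
    )


# Made-up history for one volunteer on one project.
history = [
    Classification("s1", already_seen=False, subject_retired=False),
    Classification("s1", already_seen=True, subject_retired=False),   # duplicate
    Classification("s2", already_seen=False, subject_retired=True),   # retired
    Classification("s3", already_seen=False, subject_retired=False),
    Classification("s4", already_seen=False, subject_retired=False),
]

print(raw_count(history), filtered_count(history))
```

With this made-up history, the raw count is 5 and the filtered count is 3. On older projects with a lot of repeat or post-retirement classifications, a gap like that could plausibly add up to thousands.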

eatyourgreens commented 1 week ago

The new code uses preference.activity_count here: https://github.com/zooniverse/front-end-monorepo/blob/a4fba3d2143940998b746356961fa7243b419932/packages/lib-user/src/components/UserHome/components/RecentProjects/RecentProjects.js#L54-L61

but ERAS here: https://github.com/zooniverse/front-end-monorepo/blob/a4fba3d2143940998b746356961fa7243b419932/packages/lib-user/src/components/shared/TopProjects/TopProjects.js#L115-L122

That leads to inconsistent and confusing UX in the new code, where the same project can show two different numbers for the same volunteer.

eatyourgreens commented 1 week ago

After chatting with @yshish and others, another difference we're seeing is that ERAS reports a different total number of projects worked on.

For my account:

For @yshish:

The problem with having launched a new home page, and also changed how projects and classifications are counted, is that people are confused by the change in numbers, and consequently unsure whether they can trust the new stats.