zooniverse / panoptes

Zooniverse API to support user defined volunteer research projects
Apache License 2.0

subject_set_id filter doesn't retrieve all the subjects for a subject set #4274

Closed eatyourgreens closed 4 months ago

eatyourgreens commented 10 months ago

I'm downloading the subjects for a given subject set by using the subject_set_id filter on the /subjects endpoint:

and so on. The subject set has 1006 subjects. https://www.zooniverse.org/lab/7929/subject-sets/117755

There are 1,006 subjects in the paged results, but ~100 are duplicates (by ID), so my code only retrieves ~900 unique subjects. It looks like subject IDs are unique within a single page of results, but not unique from page to page.
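A minimal sketch of how a client could detect this. It assumes a paged endpoint that returns a list of subject records, each with an `id` field, and an empty list past the last page. `collect_unique_subjects`, `fake_fetch`, and `FAKE_PAGES` are hypothetical names for illustration, and the fake pages stand in for real HTTP responses. This is not the code from the original report.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Subject = Dict[str, str]


def collect_unique_subjects(
    fetch_page: Callable[[int], List[Subject]],
) -> Tuple[List[Subject], List[str]]:
    """Page through results until an empty page is returned.

    Returns the unique subjects (first occurrence wins) and the sorted
    list of IDs that appeared more than once across all pages.
    """
    seen: Dict[str, Subject] = {}
    counts: Counter = Counter()
    page = 1
    while True:
        subjects = fetch_page(page)
        if not subjects:
            break
        for subject in subjects:
            counts[subject["id"]] += 1
            seen.setdefault(subject["id"], subject)
        page += 1
    duplicates = sorted(sid for sid, n in counts.items() if n > 1)
    return list(seen.values()), duplicates


# Hypothetical stand-in for the HTTP call: IDs are unique within each
# page, but page 2 repeats two IDs that were already served on page 1.
FAKE_PAGES = {
    1: [{"id": "1"}, {"id": "2"}, {"id": "3"}],
    2: [{"id": "3"}, {"id": "4"}, {"id": "1"}],
}


def fake_fetch(page: int) -> List[Subject]:
    return FAKE_PAGES.get(page, [])


unique, duplicates = collect_unique_subjects(fake_fetch)
print(len(unique), duplicates)  # prints: 4 ['1', '3']
```

Six records come back in total, but only four are unique, which mirrors the gap between the 1,006 paged results and the ~900 unique subjects.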

eatyourgreens commented 10 months ago

The subject set search API uses the same query, so it will run into this bug for sets with more than one page of subjects.

https://github.com/zooniverse/subject-set-search-api/blob/967e78a6832a0f8377f911f657a2e6843639c804/src/subject-set.js#L21-L37

yuenmichelle1 commented 4 months ago

Our mobile dev noticed the same issue with workflows. The simplest fix there was to add a sort by ID. See: https://github.com/zooniverse/panoptes/pull/4327/files

I don't know subject-set-search-api well enough to know whether it is OK for subjects to be pulled in sorted order. @eatyourgreens, is it OK if the subjects are sorted (in this case, by subject ID)? Otherwise, the underlying fix will have to go into our panoptes-api-version of restpack-serializer.

cc @lcjohnso

eatyourgreens commented 4 months ago

Sounds good to me. I'll open a PR.

yuenmichelle1 commented 4 months ago

I should note that sort does not work just yet. @Tooyosi will implement the fix on the Panoptes side (updating the subject serializer to allow sorting by ID). After that, this should hopefully be in a good state.

eatyourgreens commented 4 months ago

It would be great if ID was the default sort, so that API consumers don't have to manually specify it.

yuenmichelle1 commented 4 months ago

> It would be great if ID was the default sort, so that API consumers don't have to manually specify it.

Tooyosi is checking with Cliff to see whether there is any reason not to set a default sort by ID, at least for subjects. I can’t think of any, so I’m optimistic we can implement it.

lcjohnso commented 4 months ago

Closed by #4359 -- we added ID sort by default for subjects in order to provide better consistency over paginated results.
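A rough illustration of why a default ID sort gives consistent pages. It assumes offset-based pagination and a backing query that, without an explicit ordering, may return rows in a different order on each request. The thread does not confirm this as the root cause, so it is an assumption here. `paginate` and the sample orderings are hypothetical.

```python
from typing import List


def paginate(rows: List[int], page: int, page_size: int) -> List[int]:
    """Return one page of rows using offset-based slicing (1-indexed pages)."""
    start = (page - 1) * page_size
    return rows[start:start + page_size]


# The same six subject IDs, as two separate requests might see them when
# the query has no explicit ordering (hypothetical orderings).
order_seen_by_request_1 = [5, 3, 1, 4, 2, 6]
order_seen_by_request_2 = [4, 2, 6, 5, 3, 1]

# Unsorted: page 1 and page 2 are cut from different orderings.
unsorted_pages = (
    paginate(order_seen_by_request_1, page=1, page_size=3)
    + paginate(order_seen_by_request_2, page=2, page_size=3)
)
print(unsorted_pages)  # prints: [5, 3, 1, 5, 3, 1]

# Sorted by ID: every request sees the same ordering, so pages are disjoint.
sorted_pages = (
    paginate(sorted(order_seen_by_request_1), page=1, page_size=3)
    + paginate(sorted(order_seen_by_request_2), page=2, page_size=3)
)
print(sorted_pages)  # prints: [1, 2, 3, 4, 5, 6]
```

Each unsorted page is internally unique but the two pages overlap, matching the behavior reported above. With a deterministic sort key, every ID lands on exactly one page.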