inaturalist / iNaturalistAndroid

Android app for iNaturalist.org
https://market.android.com/details?id=org.inaturalist.android
MIT License

Observers list in Explore - some people are listed twice #1313

Open tiwane opened 1 year ago

tiwane commented 1 year ago

(version 1.29.18 - 592; user id - tiwane; Android API = 33)

  1. Go to Explore and search for fungi in Australia
  2. Tap on Observers and scroll down
  3. Some people are listed twice, e.g.:

image

In this screenshot and in one sent to me by a user, annbentley is listed twice at the same rank, so it seems consistent across devices.

budowski commented 1 year ago

Seems like an issue with the API. For example, retrieving page 5 of the observers list for Australia + Fungi: https://api.inaturalist.org/v1/observations/observers?page=5&per_page=30&taxon_id=47170&place_id=6744&quality_grade=needs_id,research&order_by=created_at&order=desc

Same call, just with page number 6: https://api.inaturalist.org/v1/observations/observers?page=6&per_page=30&taxon_id=47170&place_id=6744&quality_grade=needs_id,research&order_by=created_at&order=desc

Notice that the user named "wazzza" (with observation_count = 315) appears in both responses.

@pleary - what do you think?

pleary commented 1 year ago

Observer counts will unfortunately always be approximate, and may vary from request to request (see this section and the following sections for more information from the Elasticsearch documentation).

We can set some parameters like shard_size to help counts be more accurate. I've updated the shard_size we use when the request is for results up to 500 deep in the result set, and that appears to have helped this case: user wazzza is no longer appearing in that second URL. When making multiple requests for pagination, there is unfortunately always a chance that some results might get repeated in different result sets. One way to avoid repeats is to request the maximum number of observers (500) in a single request. Another would be to code around it on the client side, suppressing potential duplicates in later pages.
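
The client-side workaround could look roughly like the sketch below. It is only an illustration in Python, not the app's actual code: the function name `merge_observer_pages` is hypothetical, and the assumption that each result entry carries a `user_id` field is a simplification of the response shape and should be checked against the real API output.

```python
def merge_observer_pages(pages, key="user_id"):
    """Concatenate pages of observer results, dropping repeated users.

    `pages` is an iterable of lists of dicts, one list per fetched page.
    `key` names the field that identifies a user (assumed to be "user_id").
    The first occurrence of each user is kept, so earlier pages win.
    """
    seen = set()
    merged = []
    for page in pages:
        for entry in page:
            user_key = entry[key]
            if user_key in seen:
                # Already shown on an earlier page: suppress the duplicate.
                continue
            seen.add(user_key)
            merged.append(entry)
    return merged


# Made-up sample data: the same user shows up at the end of one page
# and at the start of the next.
page_5 = [
    {"user_id": 1, "login": "observer_a", "observation_count": 320},
    {"user_id": 2, "login": "wazzza", "observation_count": 315},
]
page_6 = [
    {"user_id": 2, "login": "wazzza", "observation_count": 315},
    {"user_id": 3, "login": "observer_c", "observation_count": 310},
]

observers = merge_observer_pages([page_5, page_6])
```

Here `observers` contains three entries, with wazzza listed once. A page that loses entries this way will show fewer rows than `per_page`, so the list may need to fetch one more page to fill the screen. Requesting up to 500 observers in a single request avoids the overlap between pages entirely.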