inaturalist / iNaturalistAPI

Node.js API for iNaturalist.org
https://api.inaturalist.org/

Set default preferred_place based on locale strings with localities #89

Open pleary opened 7 years ago

pleary commented 7 years ago

The locale param can include a locality code in addition to the language code (e.g. en-NZ or es-MX). We could look up the iNaturalist place equivalent of the locality portion of the code and use that as a default preferred_place (e.g. en-NZ sets a default preferred_place_id of 6803, New Zealand's iNat place_id).
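
As a rough illustration of the idea, the locale string could be split into its language and locality parts and the locality mapped to a place ID. This is only a sketch: parseLocale, defaultPreferredPlaceId and the PLACE_ID_BY_REGION table are hypothetical names, and the only mapping taken from this thread is NZ to 6803. Locales carrying a script subtag (e.g. zh-Hant-TW) are not handled here.

```javascript
// Split a locale string such as "en-NZ" or "es-MX" into language and region.
// Hypothetical helper, not part of the iNaturalistAPI codebase.
function parseLocale(locale) {
  if (typeof locale !== "string") return { language: null, region: null };
  const [language, region] = locale.trim().split(/[-_]/);
  return {
    language: language ? language.toLowerCase() : null,
    // Only accept a two-letter region; anything else is treated as absent.
    region: region && /^[A-Za-z]{2}$/.test(region) ? region.toUpperCase() : null
  };
}

// Hypothetical lookup table; only the NZ entry comes from the issue text.
const PLACE_ID_BY_REGION = { NZ: 6803 };

// Return a default preferred_place_id for a locale, or null if none is known.
function defaultPreferredPlaceId(locale) {
  const { region } = parseLocale(locale);
  if (region && PLACE_ID_BY_REGION[region] !== undefined) {
    return PLACE_ID_BY_REGION[region];
  }
  return null;
}
```

With this sketch, a locale of en-NZ yields 6803, while a bare en or a region missing from the table yields null, so an explicitly supplied preferred_place would still be free to take precedence.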

I suggest the order of precedence of preferred_place from least to most important:

kueda commented 4 years ago

So I guess the idea here would be to modify InaturalistAPI.lookupPreferredPlaceMiddleware to do something like look up places in the database by their admin_level and code attributes.

viatrix commented 4 years ago

I started investigating this issue and I have a few concerns:

  1. the code field is absent from the elasticsearch indexes for places;
  2. in the test db, the code field is filled only for US states and is empty for countries.


Is code field used for countries on non-test environments? Which approach is better for searching by code: by adding it to elasticsearch indexes or by selecting from the db?

kueda commented 4 years ago


Is code field used for countries on non-test environments?

Generally yes, especially for countries. You can see this using the old Rails-based JSON endpoints, e.g. https://www.inaturalist.org/places/russia.json, where the code field is set to RU.

Which approach is better for searching by code: by adding it to elasticsearch indexes or by selecting from the db?

IMO, since this is a pretty quick lookup and we're not planning on using it for search, I would fetch it out of the database. If that becomes a performance problem, we could add it to elasticsearch later. @pleary do you have an opinion on this?
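
A database lookup along these lines might look like the sketch below. It assumes a node-postgres style client whose query(text, values) resolves to an object with a rows array, and it assumes countries are stored with admin_level 0; both are assumptions to verify against the real schema. findCountryPlaceByCode is a hypothetical name, not an existing function in the API.

```javascript
// Assumption: countries use admin_level = 0 in the places table.
const COUNTRY_ADMIN_LEVEL = 0;

// Look up a country-level place by its code (e.g. "RU").
// `db` is assumed to expose a node-postgres style query(text, values)
// method that resolves to { rows: [...] }.
async function findCountryPlaceByCode(db, code) {
  if (!code) return null;
  const { rows } = await db.query(
    "SELECT id, name FROM places WHERE admin_level = $1 AND code = $2 LIMIT 1",
    [COUNTRY_ADMIN_LEVEL, code.toUpperCase()]
  );
  return rows.length > 0 ? rows[0] : null;
}
```

Because the query is keyed on two columns and returns at most one row, it should stay cheap enough to run per request, and the result could be cached per code if it ever shows up in profiling.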

viatrix commented 4 years ago

I’ve noticed that the ancestry field in the places table looks inconsistent: in the Node.js test seeds for postgres (fixtures.js) it includes the id of the current item:

{
  "id": 222,
  "name": "California",
  "ancestry": "111/222"
},

In the test db and in the rails code it doesn’t contain the id of the current record, only the ids of its ancestors; the id of the current record is pushed to ancestor_place_ids during processing:

 id  |    name    | ancestry 
-----+------------+----------
 297 | California | 17

Should I assume that the ancestry field contains only parent ids, without the current id?

kueda commented 4 years ago

Weird, that's probably a problem with the fixture, so yes, assume the ancestry field only contains ancestor IDs, not the ID of the record itself.
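
To make the convention concrete, here is a minimal sketch of deriving ancestor_place_ids from a place row, assuming ancestry is a slash-separated string of ancestor IDs. The ancestorPlaceIds helper is hypothetical. It also drops the record's own ID if it appears in ancestry, which is a defensive assumption to tolerate fixtures like the one quoted above.

```javascript
// Build the full ID chain for a place: its ancestors followed by its own id.
// Hypothetical helper; `place` is assumed to look like { id, ancestry }.
function ancestorPlaceIds(place) {
  const ancestors = (place.ancestry || "")
    .split("/")
    .filter((part) => part !== "")
    .map(Number)
    // Tolerate rows whose ancestry wrongly includes the record itself.
    .filter((id) => id !== place.id);
  return [...ancestors, place.id];
}
```

For the test db row above (id 297, ancestry "17") this gives [17, 297], and for the fixture row (id 222, ancestry "111/222") it gives [111, 222] rather than repeating 222.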

kueda commented 3 years ago

One unexpected consequence of this that we need to figure out: because we prioritize names in a place over names matching a locale without a place, people requesting names in en-HK are getting Chinese names in Hong Kong when an English name exists but lacks a place association. For example, when you request https://api.inaturalist.org/v1/taxa/627207?locale=en-HK, the preferred_common_name is "大頭茶." I'm going to temporarily disable this until we figure that out. IMO, the right solution is to change the way we prioritize the names, but I think that's going to have some other unexpected and maybe more widespread consequences, which we should probably just deal with... when we have the bandwidth. Alternatively, we could give lower weight to the places we extract from the locale code.
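
The effect can be sketched with a simplified picker. Everything here is illustrative: the record shape { name, language, placeId }, the pickCommonName function and the placeFromLocale flag are hypothetical and do not reflect the API's real schema or ranking code. The flag models one reading of the "lower weight" alternative, where a language match wins whenever the place was only inferred from the locale.

```javascript
// Pick a common name from simplified, hypothetical name records.
function pickCommonName(names, { language, placeId, placeFromLocale = false }) {
  // A name associated with the requested place, regardless of its language.
  const inPlace =
    placeId != null ? names.find((n) => n.placeId === placeId) : undefined;
  // A name in the requested language that has no place association.
  const inLanguage = names.find(
    (n) => n.language === language && n.placeId == null
  );
  // Described behavior: the place-associated name wins outright.
  // Sketched alternative: if the place came from the locale code,
  // prefer the language match instead.
  const ordered = placeFromLocale ? [inLanguage, inPlace] : [inPlace, inLanguage];
  return ordered.find(Boolean) || null;
}
```

Given one Chinese name tied to the Hong Kong place and one English name with no place, the default ordering returns the Chinese name for an English request, which mirrors the en-HK behavior described above; setting placeFromLocale to true returns the English name instead.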

Some backstory regarding our current name priority is at https://groups.google.com/u/1/g/inaturalist/c/P8iNMY0WYNM/discussion