iDigBio / idigbio-search-api

Server-side code driving iDigBio's search functionality.
GNU General Public License v3.0

Country Code "RU" #20

Open kevinlove opened 8 years ago

kevinlove commented 8 years ago

I'm not getting back any summary results for records that contain the country code "RU" when using the field "countrycode".

The recordset of interest: https://www.idigbio.org/portal/recordsets/b3d38693-5f9b-484d-aca3-6cefcc5b08e0

It has 1,850 records from the country "russia".

Summary Endpoints

Recordset: http://search.idigbio.org/v2/summary/top/records/?rq={%22recordset%22:%22b3d38693-5f9b-484d-aca3-6cefcc5b08e0%22}&top_fields=[%22countrycode%22]&count=100

Specimen record: http://search.idigbio.org/v2/summary/top/records/?rq={%22uuid%22:[%223f9f0c49-7b30-42f1-a42c-7f7721f4e88d%22]}&top_fields=[%22countrycode%22]
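
For anyone reproducing this, the two URLs above can be built programmatically so that the JSON in `rq` and `top_fields` is encoded consistently. This is a minimal sketch that only constructs the URLs; the helper name `summary_url` is made up for illustration, and the endpoint path and parameter names are copied from the links above.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the links in this issue.
BASE = "http://search.idigbio.org/v2/summary/top/records/"


def summary_url(rq, top_fields, count=None):
    """Build a summary URL; `summary_url` is a hypothetical helper, not part of any iDigBio client."""
    params = {
        "rq": json.dumps(rq),                  # record query, JSON-encoded
        "top_fields": json.dumps(top_fields),  # fields to summarize, JSON-encoded
    }
    if count is not None:
        params["count"] = count
    return BASE + "?" + urlencode(params)


recordset_url = summary_url(
    {"recordset": "b3d38693-5f9b-484d-aca3-6cefcc5b08e0"},
    ["countrycode"],
    count=100,
)
record_url = summary_url(
    {"uuid": ["3f9f0c49-7b30-42f1-a42c-7f7721f4e88d"]},
    ["countrycode"],
)

print(recordset_url)
print(record_url)
```

Fetching either URL with any HTTP client should show whether a "countrycode" bucket comes back for these records.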

danstoner commented 8 years ago

Here is an individual example of a record from that recordset that contains "dwc:country": "Russia" and also provides "dwc:countryCode": "RU".

https://www.idigbio.org/portal/records/3f9f0c49-7b30-42f1-a42c-7f7721f4e88d

We don't currently have a match for "russia" in the locality dictionary (in idb-backend/idb/data_tables/locality_data.py).

Despite the fact that "russia" could include other modern countries due to geopolitical changes, we probably ought to map "russia" to "russian federation".
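
To illustrate the kind of mapping being proposed, here is a rough sketch of an alias lookup that resolves "russia" to "russian federation" before deriving a country code. The dictionary names, the function names, and the use of the ISO 3166-1 alpha-2 code "RU" as the stored value are all assumptions for illustration; the actual layout of `locality_data.py` and the code format the index stores may differ.

```python
# Hypothetical stand-ins for the locality dictionary; the real structure
# in idb-backend/idb/data_tables/locality_data.py may look different.
COUNTRY_ALIASES = {
    "russia": "russian federation",  # the mapping proposed in this comment
}

# Assumed canonical-name -> ISO 3166-1 alpha-2 lookup (illustrative only).
ISO_ALPHA2 = {
    "russian federation": "RU",
}


def normalize_country(raw):
    """Lowercase and trim the supplied country, then apply any known alias."""
    key = raw.strip().lower()
    return COUNTRY_ALIASES.get(key, key)


def country_code(raw):
    """Return the code for a supplied country string, or None if unmatched."""
    return ISO_ALPHA2.get(normalize_country(raw))


print(normalize_country("Russia"))         # russian federation
print(country_code("Russia"))              # RU
print(country_code("Russian Federation"))  # RU
print(country_code("Atlantis"))            # None
```

Without the alias entry, "russia" falls through unmatched and yields no code, which would be consistent with the missing dictionary match described above.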

https://en.wikipedia.org/wiki/ISO_3166-1

For some reason the RU countryCode is not getting indexed properly for this record (there is no index term "countrycode" on this record / these records).