outbreak-info / outbreak.info

During outbreaks of emerging diseases such as COVID-19, efficiently collecting, sharing, and integrating data is critical to scientific research. outbreak.info is a resource to aggregate all this information into a single location.
https://outbreak.info/
GNU General Public License v3.0

GENOMIC DATA: Make sure Admin 0 / country names are normalized to Natural Earth names #242

Closed: flaneuse closed this issue 3 years ago

flaneuse commented 3 years ago

Natural Earth names: https://github.com/outbreak-info/outbreak.info/blob/master/web/src/assets/geo/countries.json

Related to #236, but for countries. Countries will use Natural Earth boundaries and divisions will use GADM (I think?), since those match better to the names already in GISAID.

flaneuse commented 3 years ago

Known issues:

Dominican Republic != Dominican Rep.
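
For illustration, a mismatch like this could be handled with a small alias table that is checked before a case-insensitive lookup against the Natural Earth names. This is only a sketch, not the project's actual code: `COUNTRY_ALIASES` and `normalize_country` are hypothetical names, and the direction of the mapping (GISAID spells it out, Natural Earth abbreviates) is an assumption.

```python
# Hypothetical alias table: submitted spelling (lowercased) -> Natural Earth spelling.
# Only the Dominican Republic pair comes from this thread; the direction of the
# mapping is an assumption.
COUNTRY_ALIASES = {
    "dominican republic": "Dominican Rep.",
}


def normalize_country(name, natural_earth_names, aliases=COUNTRY_ALIASES):
    """Return the matching Natural Earth name, or None if nothing matches."""
    # Collapse stray whitespace and compare case-insensitively.
    key = " ".join(name.split()).casefold()
    if key in aliases:
        return aliases[key]
    by_key = {n.casefold(): n for n in natural_earth_names}
    return by_key.get(key)
```

Returning `None` instead of the raw input keeps unmatched names visible, so they can be reviewed rather than silently passed through.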

AlaaALatif commented 3 years ago

Location normalization should now capture all country-level information submitted to GISAID except for the following: {'Caribbean', 'Crimea'}

These were not normalized to a GADM country name. In total, this accounts for 19 samples submitted to GISAID as of 2021-02-16.
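
A check along these lines could be sketched as follows: tally the samples whose country value does not match any known name. This is a rough sketch under assumptions, not the notebook's code. `unmatched_country_counts` is a hypothetical helper, and the sample list in the usage example is made up for illustration.

```python
from collections import Counter


def unmatched_country_counts(sample_countries, known_names):
    """Count samples per country value that matches no known name (case-insensitive)."""
    known = {n.casefold() for n in known_names}
    return Counter(c for c in sample_countries if c.casefold() not in known)


# Toy data for illustration only, not real GISAID counts.
samples = ["France", "Caribbean", "Crimea", "Caribbean", "france"]
counts = unmatched_country_counts(samples, ["France", "Dominican Rep."])
print(sorted(counts.items()))  # [('Caribbean', 2), ('Crimea', 1)]
print(sum(counts.values()))    # 3
```

Summing the counter gives the total number of affected samples, and its keys give the set of values that still need an alias or a manual decision.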

An updated notebook with location normalization can be found here

Going to close this once deployment is complete and sanity checks pass.

flaneuse commented 3 years ago

Closes with #272