semio / ddf--gapminder--gapminder_world


fsm double in UNPOP indicators #12

Open jheeffer opened 7 years ago

jheeffer commented 7 years ago
  1. Micronesia is a geographic region (like southern Europe)
  2. Federated States of Micronesia is a country in Micronesia, among other countries (like Italy is in Southern Europe)
  3. The Google Spreadsheets version of Gapminder World uses full country names
  4. The ddfcsv version of Gapminder World uses entity ids
  5. To translate country names to entity ids we use this file: https://github.com/Gapminder/waffle-server-importers-exporters/blob/world-legacy-with-data/data/synonym/country_synonyms.xlsx
  6. The file contains Micronesia as a synonym for Federated States of Micronesia and does not contain an entry for Micronesia itself.
  7. Indicators of UNPOP have data for both Micronesia and Federated States of Micronesia
  8. Translating the country names in these indicators to entity ids gives two datapoints for Federated States of Micronesia, because the automatic translation also maps the Micronesia (region) figures to Federated States of Micronesia.
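
The collision in steps 6 to 8 can be sketched in a few lines of Python. This is only an illustration, not the actual importer code: the `fsm` id is taken from the issue title, while the synonym excerpt, the `translate` and `duplicate_keys` helpers, and the datapoint values are placeholders made up for the example.

```python
from collections import Counter

# Hypothetical excerpt of the synonym table (country name -> entity id).
# "fsm" is the id from the issue title; the rows are illustrative only.
SYNONYMS = {
    "Federated States of Micronesia": "fsm",
    "Micronesia": "fsm",  # the problematic synonym from step 6
}

# Placeholder rows (name, year, value). The values are made up and are
# NOT real UNPOP figures.
rows = [
    ("Micronesia", 2000, 1.0),                      # the region
    ("Federated States of Micronesia", 2000, 2.0),  # the country
]


def translate(rows, synonyms):
    """Replace each country name with its entity id."""
    return [(synonyms[name], year, value) for name, year, value in rows]


def duplicate_keys(datapoints):
    """Return the (geo, year) keys that occur more than once."""
    counts = Counter((geo, year) for geo, year, _ in datapoints)
    return sorted(key for key, n in counts.items() if n > 1)


translated = translate(rows, SYNONYMS)
duplicates = duplicate_keys(translated)
print(duplicates)  # [('fsm', 2000)]
```

Both source rows end up under the same `(geo, year)` key, which is the double datapoint described in step 8.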

How to solve:

  1. Remove Micronesia as a synonym for Federated States of Micronesia and add an entity id for the Micronesia region (and for the other regions in the dataset that are not in geo-entities yet).
  2. Change the name of Micronesia in the source to something less ambiguous, e.g. Micronesia (Region).

I'd go for 1.
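
A minimal sketch of what option 1 would change, under the assumption that the region gets its own entity. The id `micronesia_region` is hypothetical (no id has been chosen here), and the rows are placeholder values rather than real UNPOP data.

```python
from collections import Counter

# Option 1: "Micronesia" no longer points at "fsm" but at its own
# region entity. "micronesia_region" is a hypothetical id.
FIXED_SYNONYMS = {
    "Federated States of Micronesia": "fsm",
    "Micronesia": "micronesia_region",
}

# Placeholder rows (name, year, value); the values are made up.
rows = [
    ("Micronesia", 2000, 1.0),
    ("Federated States of Micronesia", 2000, 2.0),
]

# Translate names to entity ids, then count datapoints per (geo, year).
fixed = [(FIXED_SYNONYMS[name], year, value) for name, year, value in rows]
key_counts = Counter((geo, year) for geo, year, _ in fixed)

# Every key now occurs exactly once, so there is no double datapoint.
assert all(n == 1 for n in key_counts.values())
```

With the separate id, the region and the country keep distinct keys, and the same uniqueness check could be run over the other regions that are added to geo-entities.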

semio commented 7 years ago

Agreed. Let's do 1.

jheeffer commented 7 years ago

Though, like #13, this is not high priority, since the overwrite from the new ddf data (unpop) in SG limits this problem to the gapminder_world dataset.