m-lab / etl-gardener

Gardener provides services for maintaining and reprocessing mlab data.
Apache License 2.0

Normalize maxmind geo1 and geo2 "region" data #281

Open stephen-soltesz opened 4 years ago

stephen-soltesz commented 4 years ago

Not sure this is the best place to put this issue.

The MaxMind geo1 and geo2 information will need to be reconciled. The geo1 formats provide a "region" field explicitly, whereas the geo2 formats provide ISO codes and names for subdivisions 1 and 2 (not all locations have both) and no "region" field. Typically subdivision1 == region, but anecdotally I have seen cases where it is not. For example, in geo1 an Australian region was a number or abbreviation (e.g. WA), whereas in geo2 it was the name of the state (e.g. "Western Australia").
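
To make the mismatch concrete, here is a minimal Go sketch of a check that compares a geo1 region against geo2 subdivision 1. The `Geo1Record` and `Geo2Record` types and the `RegionMatches` function are hypothetical, simplified shapes invented for illustration; they are not the structs used by the annotator or by any MaxMind library.

```go
package main

import "strings"

// Geo1Record is a hypothetical, simplified view of a geo1 row.
type Geo1Record struct {
	CountryCode string
	Region      string // number or abbreviation, e.g. "WA"
}

// Geo2Record is a hypothetical, simplified view of a geo2 row.
type Geo2Record struct {
	CountryCode         string
	Subdivision1ISOCode string
	Subdivision1Name    string
	Subdivision2ISOCode string // may be empty
	Subdivision2Name    string // may be empty
}

// RegionMatches reports whether the geo1 region agrees with geo2
// subdivision 1, comparing case-insensitively against both the ISO code
// and the name. An empty region never matches.
func RegionMatches(g1 Geo1Record, g2 Geo2Record) bool {
	r := strings.TrimSpace(g1.Region)
	if r == "" || !strings.EqualFold(g1.CountryCode, g2.CountryCode) {
		return false
	}
	return strings.EqualFold(r, g2.Subdivision1ISOCode) ||
		strings.EqualFold(r, g2.Subdivision1Name)
}
```

A check like this could be run over a sample of addresses present in both datasets to estimate how often subdivision1 and region actually disagree before deciding on a mapping.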

critzo commented 4 years ago

We used to annotate region with the FIPS 10-4 codes (https://en.wikipedia.org/wiki/FIPS_10-4). There is no clean mapping from FIPS 10-4 to ISO 3166. I would recommend that we add ISO 3166 subdivision 1 and subdivision 2 codes, and reprocess all data to ensure consistency.
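
As a rough sketch of what the normalized fields might look like, the Go snippet below builds full ISO 3166-2 style codes (country code, hyphen, subdivision code) from a country code and the two subdivision codes. The `Subdivisions` type and `NormalizeSubdivisions` function are hypothetical names for illustration only, and the sketch assumes the input subdivision codes are already the short part of the ISO code (e.g. "WA" for Western Australia).

```go
package main

import "strings"

// Subdivisions is a hypothetical normalized annotation holding full
// ISO 3166-2 style codes. Subdivision2 is empty when a location has no
// second-level subdivision.
type Subdivisions struct {
	Subdivision1 string // e.g. "AU-WA"
	Subdivision2 string
}

// isoCode joins a country code and a subdivision code into the
// "CC-SUB" form, upper-cased. It returns "" if either part is missing.
func isoCode(country, sub string) string {
	country = strings.ToUpper(strings.TrimSpace(country))
	sub = strings.ToUpper(strings.TrimSpace(sub))
	if country == "" || sub == "" {
		return ""
	}
	return country + "-" + sub
}

// NormalizeSubdivisions builds the normalized pair of codes. Missing
// subdivision codes yield empty fields rather than guessed values.
func NormalizeSubdivisions(country, sub1, sub2 string) Subdivisions {
	return Subdivisions{
		Subdivision1: isoCode(country, sub1),
		Subdivision2: isoCode(country, sub2),
	}
}
```

Leaving a field empty when the source has no code avoids inventing a mapping for older FIPS 10-4 annotations, which would still need reprocessing against data that carries ISO codes.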