ipfs-shipyard / ipfs-geoip

geoip lookup over DAG-CBOR dataset loaded from IPFS
MIT License
70 stars 21 forks source link

fix: update country names #76

Closed jessicaschilling closed 4 years ago

jessicaschilling commented 4 years ago

Summary

This PR updates country names for typographical consistency, simplicity (favoring shorter common names over full official ones), and political sensitivity. Full list of updates is below.

This PR addresses the need in https://github.com/ipfs-shipyard/ipfs-webui/issues/1603 , but that issue will need updating to the resulting ipfs-geoip release in order to close.

Done

To do

Updates made

Antigua And Barbuda -> Antigua and Barbuda Bosnia & Herzegovina -> Bosnia and Herzegovina "Bolivia, Plurinational State Of" -> Bolivia "Bonaire, Saint Eustatius And Saba" -> "Bonaire, Saint Eustatius and Saba" Democratic Republic Of Congo -> Democratic Republic of Congo Republic Of Congo -> Republic of Congo Czech Republic -> Czechia "Ceuta, Mulilla" -> Ceuta and Mulilla "Micronesia, Federated States Of" -> Micronesia "France, Metropolitan" -> Metropolitan France South Georgia And The South Sandwich Islands -> South Georgia and the South Sandwich Islands Guinea-bissau -> Guinea-Bissau Heard Island And McDonald Islands -> Heard Island and McDonald Islands Isle Of Man -> Isle of Man "Iran, Islamic Republic Of" -> Iran Saint Kitts And Nevis -> Saint Kitts and Nevis "Korea, Democratic People's Republic Of" -> DPR Korea "Korea, Republic Of" -> Republic of Korea Lao People's Democratic Republic -> Laos "Macedonia, The Former Yugoslav Republic Of" -> North Macedonia Saint Pierre And Miquelon -> Saint Pierre and Miquelon "Palestinian Territory, Occupied" -> Palestine "Saint Helena, Ascension And Tristan Da Cunha" -> "Saint Helena, Ascension and Tristan da Cunha" Svalbard And Jan Mayen -> Svalbard and Jan Mayen Syrian Arab Republic -> Syria Turks And Caicos Islands -> Turks and Caicos Islands Tristan de Cunha -> Tristan da Cunha Trinidad And Tobago -> Trinidad and Tobago "Taiwan, Province Of China" -> Taiwan "Tanzania, United Republic Of" -> Tanzania United States -> USA Vatican City State -> Vatican City Saint Vincent And The Grenadines -> Saint Vincent and the Grenadines "Venezuela, Bolivarian Republic Of" -> Venezuela Virgin Islands (British) -> British Virgin Islands Virgin Islands (US) -> US Virgin Islands Viet Nam -> Vietnam Wallis And Futuna -> Wallis and Futuna

jessicaschilling commented 4 years ago

@hacdias - are you able to help me regenerate this? Thank you!

cc @lidel for continuity.

lidel commented 4 years ago

@jessicaschilling the error looked familiar, so I fixed it, but side effects turned out to be yet another :hole: :rabbit2:

Changes:

I've started npm run generate but it takes long time on my machine (dying disk?), so was unable to confirm if output is correct. EOD for me, but I've left it running, will get back to this when done tomorrow/monday (or you can try running it locally).

ps. CI fails because commit titles in this repo need to follow type: title convention, and this PR has one commit which breaks that (we should rebase and fix the title at some point)

ps2. I worry those manual changes are not future proof, and will be pain to re-apply on updates. Perhaps we could have a normalization function normalizedName that modifies known problematic country names on the fly? I think

-acc[row.alpha2] = row.name
+acc[row.alpha2] = normalizedName(row.name)

would do the trick here: https://github.com/ipfs-shipyard/ipfs-geoip/blob/d8cd0f8a31ec7b22d454ac6094231b47327e7a6e/src/generate/index.js#L39

This way we would not have to touch the source data.

lidel commented 4 years ago
> bin/generate

Finished with root hash QmVp6myP4AzFAPuP7rWS7oa4kPu76BP9dd7ahvtyEyfEnM

@jessicaschilling try fetching my fixes and running npm run generate – lets see if you get the same CID (you need IPFS node running while you do it)

jessicaschilling commented 4 years ago

Unfortunately npm run generate fails on Error: Cannot find module 'it-concat', and npm install then Failed at the bcrypto@5.1.0 install script when trying to install it-concat. Can't install brcypto directly, either. I'm afraid I'm a bit stuck.

lidel commented 4 years ago

Tested but does not pass. Seems that a lot of stuff changed since geoip was created, APIs changed all over the place breaking random parts. Needs additional investigation. Will block some time today to figure out if it can be fixed, or if its better to rewrite entire thing as part of #63

hacdias commented 4 years ago

Thanks for stepping into this @lidel!

lidel commented 4 years ago

Quick update:

jessicaschilling commented 4 years ago

Thanks so much for the detour for this. LMK if there's any way I can be more useful.

jessicaschilling commented 4 years ago

@lidel Thanks so much for all this unexpected work!

The overrides look perfect, with one question: Some original items included double quotes to escape out commas in country names. In the overrides, some remove the comma and thus the need for double quotes (e.g. "Bolivia, Plurinational State Of" -> Bolivia) but some don't (e.g. "Bonaire, Saint Eustatius And Saba" -> "Bonaire, Saint Eustatius and Saba"). I believe your overrides.js may leave us in the situations of double quotes when we don't really need them -- do we care?

I don't have the permissions to cut a new release or update to npm, but if you do, I can take care of the remaining steps in https://github.com/ipfs-shipyard/ipfs-webui/issues/1603. Let me know if there's anything else I can do. 🙏

lidel commented 4 years ago

The double quotes story is pretty fun, and probably there is a commentary on the state of our civilization somewhere in there:

The original double quotes were part of .csv, source file format which uses comma as a field separator. Double quotes around strings are necessary in names which include comma (,), to keep those additional commas from being interpreted as field separators. Overrides execute after data is read from CSV, and at that stage of processing the double quotes around the name are already gone. So no issue here.

You may notice that some names in overrides.js have single quotes, and some use double ones, and those are different ones than in .csv. This is for similar-but-different reason: in JS we use single quotes by default, but when a string includes ' (and some names do), then use of double ones makes it work and is easier on the eyes than escaping '.


Anyway, I'll merge this and release later today, when done will comment on the ipfs-webui issue.