Closed joeroe closed 2 years ago
Thank you very much for pointing out these issues!
fix_database_country_name()
uses two methods to determine "correct" country names:
countrycode::codelist[["country.name.en"]]
The following issues were wrong entries in the thesaurus and just fixed in b31ef977a28def0dd55b49246fa17bae8cb79001:
Corsica Crete Czechia/Czech Republic Lybia (Libya?) Tunesia (Tunisia?)
The other mentioned entities (Channel Islands, Sardinia, Yugoslavia) do exist in countrycode::codelist[["country.name.en"]]
and as this is our major point of reference I think it's better to keep them as they are.
Yugoslavia is a clear case, because this former country covers multiple modern countries and it would not be clear to which one the respective date should be attributed. I think for the purpose of this function we should keep sunken empires alive :smile:. I'm less sure about Sardinia and Channel Islands, though.
Thanks! Definitely agree with leaving Yugoslavia. The Channel Islands are weird, and Sardinia has been part of Italy for a long time...
I spotted these when trying to join the full database to Natural Earth (which uses ISO/FIPS names) based on country names. Had I read the fix_database_country_name()
documentation properly, I would have spotted that I could just have standardised them with countrycode::countryname()
, so yes consistency is definitely the winner!
Hm - we could omit some rows of countrycode::codelist
for our lookup to get rid of Sardinia and other less useful entries. What do you think, @dirkseidensticker ?
I had a shower thought regarding this issue: We could very well get rid of countrycode::codelist
and instead extent our own country thesaurus. We would probably not loose much, but gain full control over the country list. Would also make our code more simple.
We decided to indeed implement this suggested change. @yesdavid offered to implement a first draft :+1:
This is not relevant any more, as we deprecated fix_database_country_name()
in v.3.0 without a replacement.
I noticed that fix_database_country_name() currently returns some country names that aren't actually (or currently) countries:
This might not be a bug exactly—these labels do contain useful geographic information—but I wouldn't say it's expected behaviour since the name of the function implies a sovereign country will be returned. So if they remain in the thesaurus, maybe that should be documented somewhere?
I also spotted some inconsistencies in names for the same country:
And some mistakes in the corrected names: