If you look closely, you can see that the East Timor and Timor-Leste rows never both have a value greater than 0 for the same year.
Somewhere between 2003 and 2007, emdat.be must have changed the name.
Enter Timor-Leste here and you can see that the combined numbers are correct: http://www.emdat.be/country_profile/index.html
Both names are translated to tls when converting Gapminder World to DDFcsv. As a result, there are tls datapoints for both East Timor and Timor-Leste, which produces duplicate keys.
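To illustrate how the duplicate keys arise, here is a minimal sketch in Python. The mapping, the row layout, and the numbers are made up for illustration; only the two country names and the tls id come from this issue, and the real conversion script may work differently.

```python
from collections import Counter

# Hypothetical name-to-id mapping; only these two entries are taken from the issue.
NAME_TO_ID = {"East Timor": "tls", "Timor-Leste": "tls"}

# Made-up rows of (name, year, value); these are not real emdat.be numbers.
rows = [
    ("East Timor", 2003, 4),
    ("East Timor", 2007, 0),
    ("Timor-Leste", 2003, 0),
    ("Timor-Leste", 2007, 9),
]


def find_duplicate_keys(rows, name_to_id):
    """Return the (entity id, year) keys that occur more than once."""
    counts = Counter((name_to_id[name], year) for name, year, _ in rows)
    return sorted(key for key, n in counts.items() if n > 1)


duplicates = find_duplicate_keys(rows, NAME_TO_ID)
print(duplicates)  # [('tls', 2003), ('tls', 2007)]
```

Every year shows up twice under tls, once per source row, which is exactly the duplicate-key problem.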
How to solve:
1) Update the source, combining the East Timor and Timor-Leste data into one row
2) Make the script smart so it merges the two
3) Keep the script dumb and make an exact copy of the data as it is: make sure there are two separate entity ids for East Timor and Timor-Leste. This keeps the error in the dataset, though, so it would not be sound.
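Option 2 could look roughly like the sketch below. It assumes the same hypothetical row layout of (name, year, value) and made-up numbers, and it relies on the observation that the two rows never both have a value greater than 0 for the same year. If that assumption is ever violated, it raises an error instead of guessing.

```python
# Hypothetical name-to-id mapping; only these two entries are taken from the issue.
NAME_TO_ID = {"East Timor": "tls", "Timor-Leste": "tls"}

# Made-up rows of (name, year, value); these are not real emdat.be numbers.
rows = [
    ("East Timor", 2003, 4),
    ("East Timor", 2007, 0),
    ("Timor-Leste", 2003, 0),
    ("Timor-Leste", 2007, 9),
]


def merge_rows(rows, name_to_id):
    """Merge rows that share an (entity id, year) key into one datapoint.

    Keeps the single non-zero value per key and raises ValueError
    when two rows both carry a value > 0 for the same key.
    """
    merged = {}
    for name, year, value in rows:
        key = (name_to_id[name], year)
        previous = merged.get(key, 0)
        if previous > 0 and value > 0:
            raise ValueError(f"conflicting values for {key}: {previous} and {value}")
        merged[key] = max(previous, value)
    return merged


merged = merge_rows(rows, NAME_TO_ID)
print(merged)  # {('tls', 2003): 4, ('tls', 2007): 9}
```

Failing loudly on overlap is a deliberate choice here: summing or picking one value silently would hide a real conflict in the source.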
This is not a priority, as ddf--cred--em_dat should overwrite this data correctly in SG.
Fixing it would be purely to make this historic dataset valid and sound.
It seems that East Timor was renamed to Timor-Leste at some point in the http://www.emdat.be dataset.
That is why the Gapminder World Google spreadsheets with emdat.be data list it under both names, on separate rows: https://docs.google.com/spreadsheets/u/1/d/1EMSP8rthB6yAxj3GtPAcssfP0HHPfujRS0YDPmD1NRY/pub https://docs.google.com/spreadsheets/u/1/d/1_UEhuCQeH5MySwuOKmawjRNeQkwP2vJx0rZb7Wgq2wE/pub#
More should be added as we find them.