owid / covid-19-data

Data on COVID-19 (coronavirus) cases, deaths, hospitalizations, tests • All countries • Updated daily by Our World in Data
https://ourworldindata.org/coronavirus

2021 population estimates used #2041

Closed fibke closed 2 years ago

fibke commented 2 years ago

Hi @edomt,

I've been trying to replicate the OWID results for the global vaccination rate and I arrive at a similar number. But there is still a difference and since I am using your vaccine data, I suspect it must be in the calculation of the population numbers.
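A minimal sketch of that replication, assuming you already have, per location code, the latest count of people vaccinated and a population estimate (the function and argument names are hypothetical, and OWID's own world aggregate may be computed differently). The global rate is the sum of numerators over the sum of denominators, so any location whose population is missing, double-counted, or defined over a different territory than its vaccine data shifts the result.

```python
from typing import Mapping


def global_vaccination_rate(
    people_vaccinated: Mapping[str, float],
    population: Mapping[str, float],
) -> float:
    """Share of the population vaccinated, aggregated across locations.

    Both mappings are keyed by location code (ISO3, or OWID_* codes for
    non-ISO entities such as OWID_CYN). Only locations present in both
    mappings are used, so numerator and denominator cover the same places.
    """
    shared = people_vaccinated.keys() & population.keys()
    if not shared:
        raise ValueError("no locations in common")
    numerator = sum(people_vaccinated[loc] for loc in shared)
    denominator = sum(population[loc] for loc in shared)
    return numerator / denominator
```

Restricting to shared locations is a design choice for this sketch; to track down discrepancies like the ones below, you may prefer to fail loudly when the two key sets differ.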

The following are among the likely culprits:

1) WPP mentions that the population estimate for the US excludes "ASM", "GUM", "MNP", "PRI", "VIR". Did you add the populations for these locations to the US population total so you can compare them with the vaccine data that includes them? The difference is around 3 million.

2) Are the vaccination numbers for Cyprus attributed to Greece and Turkey respectively? How is its population counted? The WPP mentions the number for Cyprus is for the country as a whole. Since OWID doesn't list Cyprus separately, do the population numbers for Greece and Turkey include it (WPP doesn't)?

3) There seems to be an issue with Kosovo and Serbia. Note that the UN counts Kosovo as part of Serbia. Since the COVID data lists Kosovo separately alongside Serbia, you need to either use just Serbia's WPP number or take a Kosovo population estimate and subtract it from Serbia's WPP number. Could Kosovo have been counted twice?

4) Greenland does not provide separate vaccine data, but it should be part of the population estimate for Denmark. WPP notes that the Denmark figure excludes GRL and FRO.

5) The number for Malta is different from the 2021 number in WPP.
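For point 3, the subtraction option could look like the sketch below. It assumes a standalone Kosovo estimate taken from some other source, since WPP folds Kosovo into Serbia. The function name is hypothetical.

```python
def serbia_excluding_kosovo(serbia_wpp: float, kosovo_estimate: float) -> float:
    """Serbia's population net of Kosovo.

    WPP counts Kosovo as part of Serbia, so when Kosovo's vaccinations are
    reported as a separate location, Serbia's denominator should exclude
    Kosovo to avoid counting it twice.
    """
    if not 0 <= kosovo_estimate < serbia_wpp:
        raise ValueError("Kosovo estimate must be non-negative and below Serbia's total")
    return serbia_wpp - kosovo_estimate
```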

Some of these produce only minor differences, but they add up in aggregate, and in any case it would be nice to be able to arrive at the same numbers. Would it be possible to have a look at this?

Thanks, Philip

edomt commented 2 years ago

Hi @fibke

I'll have to investigate points 2, 3, and 4 next week, but here are a few things already:

(1) When calculating per-capita metrics for the US, we sum up the UNWPP estimate for the United States with the estimates of American Samoa, Micronesia, Guam, Marshall Islands, Northern Mariana Islands, Puerto Rico, Palau, and the United States Virgin Islands. (See #1983)

(2) Can you confirm that you mean Northern Cyprus? Cyprus itself has its own data.

(5) That's deliberate, see b82a7cb189924a8eb4f6ca9a9c0728a6442f98f5
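The US aggregation described in (1) can be sketched as follows. This assumes a mapping from ISO3 code to WPP population. The codes are the standard ISO 3166-1 alpha-3 codes for the locations listed in (1), while the constant and function names are hypothetical.

```python
from typing import Mapping

# ISO3 codes for the locations summed into the US per-capita denominator,
# following the list in (1).
US_DENOMINATOR_CODES = (
    "USA",  # United States
    "ASM",  # American Samoa
    "FSM",  # Micronesia
    "GUM",  # Guam
    "MHL",  # Marshall Islands
    "MNP",  # Northern Mariana Islands
    "PRI",  # Puerto Rico
    "PLW",  # Palau
    "VIR",  # United States Virgin Islands
)


def us_population_for_per_capita(wpp: Mapping[str, float]) -> float:
    """Sum WPP estimates for the US and the associated locations.

    Raises KeyError if any code is missing, rather than silently
    under-counting the denominator.
    """
    return sum(wpp[code] for code in US_DENOMINATOR_CODES)
```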

fibke commented 2 years ago

Thanks, @edomt.

On (2):

I hadn't initially noticed that you have OWID_CYN. It turns out the data for Northern Cyprus is missing after August 15, and there seems to be more recent information that could be incorporated.

Judging from the population variable in the OWID database, it seems that you treat CYP as just the Greek part. Normally, CYP covers the whole island, as in WPP and here. Did you adjust the population down so that it corresponds with the vaccine data, which may be reported separately for OWID_CYN and CYP?

Would be interested to know how you treat OWID_CYN and CYP in terms of both the numerator (vaccines) and denominator (population) for the vaccination rate.

edomt commented 2 years ago

(2) Data for the Republic of Cyprus (CYP) comes from https://www.moh.gov.cy/moh/moh.nsf/All/0EFA027144C9E54AC22586BE0032B2F5. As far as we know, this includes only vaccinations performed in CYP, and not those performed in Northern Cyprus (OWID_CYN).

Data for Northern Cyprus comes from https://asi.saglik.gov.ct.tr/. As far as we know, despite the ongoing territorial dispute, neither CYP nor Turkey includes the vaccinations performed in Northern Cyprus in its own data. Therefore, we count them separately as OWID_CYN, and all should be fine in terms of numerator data.

For denominators, the UNWPP data estimates the population of Cyprus at 1215588, by including Northern Cyprus, which we don't want given our numerator data. Instead, we use:

edomt commented 2 years ago

(4) I'm not sure why you wrote that "Greenland does not provide separate vaccine data": we've always collected data for it separately, as reported on https://corona.nun.gl/en/.

As far as we know, the data reported by the Danish government does not include those vaccinations.

Therefore, we consider them separately, and given that the UNWPP data also includes separate estimates for Greenland and Denmark, the situation should be okay.

(The same goes for the Faeroe Islands.)
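When replicating, mismatches of this kind can be caught by comparing the location codes in the vaccination data with those in the population table. Any code present on only one side (for example GRL or FRO, if a population table folded them into Denmark) points to a numerator/denominator mismatch. The following sketch assumes both inputs are iterables of location codes. The function name is hypothetical.

```python
from typing import Iterable


def coverage_mismatches(
    vaccine_locations: Iterable[str], population_locations: Iterable[str]
) -> tuple[set[str], set[str]]:
    """Return (codes with vaccine data but no population,
    codes with a population but no vaccine data)."""
    vax = set(vaccine_locations)
    pop = set(population_locations)
    return vax - pop, pop - vax
```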

edomt commented 2 years ago

(3) This is already what we do. You can see our population data here: https://github.com/owid/covid-19-data/blob/master/scripts/input/un/population_2020.csv

The total population for Serbia in UNWPP is 8697547, but instead we use a population of:

Both estimates come from the World Bank data.

fibke commented 2 years ago

Many thanks for the clarifications, @edomt