owid / covid-19-data

Data on COVID-19 (coronavirus) cases, deaths, hospitalizations, tests • All countries • Updated daily by Our World in Data
https://ourworldindata.org/coronavirus

Calculation of people_vaccinated and people_fully_vaccinated in the absence of official data #391

Closed zizwiz closed 3 years ago

zizwiz commented 3 years ago

We were previously getting data for England and others in this form (showing just the affected line; note the zero at the end):

England,2020-12-20,Pfizer/BioNTech,https://coronavirus.data.gov.uk/,568044,568044,0

Today the data comes back incomplete (note the missing zero at the end):

England,2020-12-20,Pfizer/BioNTech,https://coronavirus.data.gov.uk/,568044,568044,

Is this an error or have you changed it?

kokes commented 3 years ago

This seems to be a slight change in methodology - it's no longer evident how to estimate people (fully) vaccinated if we don't have the data for it - so instead of estimating zero, it's N/A now. This will probably require downloading more detailed data for these countries (I noticed this elsewhere as well).

One could argue that we could still estimate these values to be zero for the early phase. If we don't have dose info at all, we'll end up with 21 zeroes and then NAs (weird, but passable, also exceedingly rare as more countries publish data), otherwise it just gets padded with zeroes like it used to.

https://github.com/owid/covid-19-data/commit/aacb5d5ff2ef9b02b2cf48132522c2a90441fa0c

edomt commented 3 years ago

Hi @zizwiz & @kokes

There have been a few changes and some back-and-forth feedback on this issue between our team and some of the organizations that rely on our data.

We've now decided (as of https://github.com/owid/covid-19-data/commit/012ccc4fa830bd23b01e3d678d0185040f9f1a2e) that we'll make as few assumptions as possible about people_vaccinated and people_fully_vaccinated. This means that until a country clearly reports how many people have received a first/second dose, we'll only report data in total_vaccinations, and leave people_vaccinated and people_fully_vaccinated fully blank.
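For anyone parsing the per-country files, the practical effect is that a blank field now means "not reported" rather than zero. Here is a minimal sketch of reading the two England rows quoted earlier in this thread while keeping that distinction. The column names are an assumption inferred from the field order of those rows, and `parse_count` and `read_rows` are hypothetical helpers, not part of our pipeline.

```python
import csv
import io
from typing import Optional

# Column names are an assumption based on the field order of the quoted rows.
HEADER = (
    "location,date,vaccine,source_url,"
    "total_vaccinations,people_vaccinated,people_fully_vaccinated"
)
OLD_ROW = "England,2020-12-20,Pfizer/BioNTech,https://coronavirus.data.gov.uk/,568044,568044,0"
NEW_ROW = "England,2020-12-20,Pfizer/BioNTech,https://coronavirus.data.gov.uk/,568044,568044,"

COUNT_COLUMNS = ("total_vaccinations", "people_vaccinated", "people_fully_vaccinated")


def parse_count(field: str) -> Optional[int]:
    """Return an int for a reported value, or None when the field is blank."""
    field = field.strip()
    return int(field) if field else None


def read_rows(text: str) -> list:
    """Parse CSV text into dicts, converting the count columns to int or None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        for col in COUNT_COLUMNS:
            raw[col] = parse_count(raw[col])
        rows.append(raw)
    return rows


old, new = read_rows("\n".join([HEADER, OLD_ROW, NEW_ROW]))
print(old["people_fully_vaccinated"])  # 0: an explicitly reported zero
print(new["people_fully_vaccinated"])  # None: not reported
```

Keeping `None` separate from `0` lets downstream code decide for itself whether to skip, flag, or estimate the missing value.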

kokes commented 3 years ago

Cool, that does sound rational. One of the most popular Czech media outlets used the people_vaccinated metric even before we had data for it (and thus reported doses instead, overestimating it), so this change makes the data more realistic.

This presented an issue earlier today, when their dashboard stopped showing data for ~8 EU countries, because they suddenly didn't have people_vaccinated.

I think the best course of action is to be mindful of that missing data and see if we can complete it. I already found out that one of the countries started publishing more detailed data (Austria, reported in a different issue), so this issue will get resolved over time.
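One way a dashboard could stay mindful of the missing data, rather than dropping a country or silently presenting doses as people, is to fall back to total_vaccinations and label it as such. This is only a sketch of that idea: `pick_metric` is a hypothetical helper, the labels are made up, and the second example's figure is purely illustrative.

```python
from typing import Optional, Tuple


def pick_metric(
    total_vaccinations: Optional[int],
    people_vaccinated: Optional[int],
) -> Tuple[Optional[int], str]:
    """Choose the value to display and a label saying what it measures."""
    if people_vaccinated is not None:
        return people_vaccinated, "people vaccinated"
    if total_vaccinations is not None:
        # Doses can exceed people once second doses start, so label them as doses.
        return total_vaccinations, "doses administered"
    return None, "no data"


print(pick_metric(568044, 568044))  # (568044, 'people vaccinated')
print(pick_metric(120000, None))    # (120000, 'doses administered')
print(pick_metric(None, None))      # (None, 'no data')
```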

FYI, the list of countries with no people info is: Austria, Bahrain, China, Estonia, Ireland, Kuwait, Latvia, Malta, Netherlands, Russia, Saudi Arabia, Sweden, Switzerland

```
>>> import pandas as pd
>>> df = pd.read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv")
>>> byc = df.groupby("location").agg({"people_vaccinated": "max", "people_fully_vaccinated": "max"})
>>> byc.loc[byc.isna().all(axis=1)]
              people_vaccinated  people_fully_vaccinated
location
Austria                     NaN                      NaN
Bahrain                     NaN                      NaN
China                       NaN                      NaN
Estonia                     NaN                      NaN
Ireland                     NaN                      NaN
Kuwait                      NaN                      NaN
Latvia                      NaN                      NaN
Malta                       NaN                      NaN
Netherlands                 NaN                      NaN
Russia                      NaN                      NaN
Saudi Arabia                NaN                      NaN
Sweden                      NaN                      NaN
Switzerland                 NaN                      NaN
```

edomt commented 3 years ago

Thanks @kokes, very useful. Austria and Sweden will re-appear in our dataset based on the new data just released today.

zizwiz commented 3 years ago

Thanks. I know this is not easy, and I am just using your data in my code. Is there somewhere you announce this type of change, so I can get a heads-up when things are changing? I know the data is not easy: in one dataset I get from another site we sometimes get "undead" people, where today's total deaths is lower than yesterday's.

I will need to write a catch for 0, N/A and False etc. Thanks for providing data.
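A rough sketch of what such a catch might look like in Python, assuming the field can arrive as a string, a number, a boolean, or nothing at all. The helper name `to_count` and the set of "missing" markers are assumptions for illustration, not anything defined by the dataset. A real `0` is kept as zero, while everything that carries no number becomes `None`.

```python
import math
from typing import Optional

# Strings treated as "missing"; this set is an assumption, extend as needed.
MISSING_MARKERS = {"", "na", "n/a", "nan", "null", "none", "false"}


def to_count(value) -> Optional[int]:
    """Normalize a raw field to an int, or None when it carries no number."""
    if value is None or isinstance(value, bool):
        # True/False are not counts, even though bool is a subclass of int.
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        # NaN and infinities carry no usable count.
        return int(value) if math.isfinite(value) else None
    text = str(value).strip()
    if text.lower() in MISSING_MARKERS:
        return None
    try:
        return int(float(text))
    except (ValueError, OverflowError):
        # Unparseable text or an infinite value: treat as missing.
        return None


for raw in ["0", "", "N/A", False, float("nan"), "568044"]:
    print(repr(raw), "->", to_count(raw))
```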

edomt commented 3 years ago

There is a changelog here but we only use it for major updates and format changes, otherwise it would get way too long. But I try as much as possible to warn of technical changes here in the issues.

zizwiz commented 3 years ago

Thanks