owid / covid-19-data

Data on COVID-19 (coronavirus) cases, deaths, hospitalizations, tests • All countries • Updated daily by Our World in Data
https://ourworldindata.org/coronavirus

data(vax): boosters data missing #2430

Closed: fibke closed this issue 2 years ago

fibke commented 2 years ago

It would be good to double-check against national health authority databases whether booster data are indeed NA, as reported by OWID. Many countries appear to have started boosting.

Currently, OWID reports NA for 85 countries.
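
A count like the one above could be reproduced along these lines. This is only a sketch: it assumes a CSV with `location` and `total_boosters` columns (as in OWID's vaccination dataset), and the inline sample rows and country names are made up for illustration.

```python
import csv
import io

# Tiny inline sample standing in for the real vaccinations CSV.
# The rows below are invented and are not real figures.
SAMPLE_CSV = """location,date,total_vaccinations,total_boosters
Country A,2022-02-01,1000,
Country A,2022-02-02,1100,
Country B,2022-02-01,5000,
Country B,2022-02-02,5200,300
Country C,2022-02-01,800,0
"""


def locations_without_boosters(csv_text):
    """Return the sorted locations whose total_boosters is empty on every row."""
    has_boosters = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        reported = row["total_boosters"].strip() != ""
        # A location counts as "reporting" if any of its rows has a value,
        # including an explicit 0.
        has_boosters[row["location"]] = has_boosters.get(row["location"], False) or reported
    return sorted(loc for loc, seen in has_boosters.items() if not seen)


missing = locations_without_boosters(SAMPLE_CSV)
print(len(missing), missing)
```

Note that an explicit 0 is treated as reported data, so only locations with no value at all end up in the list.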

DONE

How to use this thread

Comment with the country you have examined and your conclusions. If applicable, state if you are taking the task to complete it. @lucasrodes will update this list to reflect your changes/updates.


Edit based on an original comment by @Rouein

fibke commented 2 years ago

Note that for the missing countries in LAC, the PAHO website contains much of the booster data that is missing from OWID:

https://ais.paho.org/imm/IM_DosisAdmin-Vacunacion.asp

lucasrodes commented 2 years ago

@Rouein Thanks for editing the original post. I am editing @fibke's comment based on your edit (and will therefore remove your comment).

lucasrodes commented 2 years ago

Regarding the data for most African countries:

Therefore, I propose checking again the source Africa CDC to see if it can be used for certain countries. The benefit of Africa CDC is that it has data on boosters. A preliminary analysis shows that some countries may have started administering booster doses:

df.loc[df['Booster'] != 0, ["CtryUPPERC", "Booster"]]
               CtryUPPERC  Booster
8   SAO TOME AND PRINCIPE     2945
13                  KENYA   198066
15              MAURITIUS   380398
16                 RWANDA  1175885
17             SEYCHELLES    27744
23                ALGERIA   361690
24                  EGYPT   664713
25                  LIBYA    36661
26             MAURITANIA    27309
27                MOROCCO  4862819
29                TUNISIA  1058234
30                 ANGOLA   138204
31               BOTSWANA    39907
35             MOZAMBIQUE    67239
37           SOUTH AFRICA   704148
38                 ZAMBIA    18974
39               ZIMBABWE    72829
45                  GHANA    67104

Again, to use this source we should check that the issues mentioned in this commit no longer occur.

lucasrodes commented 2 years ago

@Rouein, I can't find data for Kyrgyzstan in the source that you cite.

lucasrodes commented 2 years ago

Since they are only using two-dose vaccines: total_boosters = total_vaccinations - people_vaccinated - people_fully_vaccinated

I would lean towards adding booster doses only once they are explicitly reported. Otherwise, we may run into issues if the country were to add single-dose vaccines. Deriving boosters this way may not scale well if we do it for all countries, and it might be difficult to track.
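
To make the caveat concrete, here is a minimal sketch of the derivation with guards. The function name and the `single_dose_in_use` flag are hypothetical, not part of the OWID codebase, and the figures are made up.

```python
def derive_total_boosters(total_vaccinations, people_vaccinated,
                          people_fully_vaccinated, single_dose_in_use=False):
    """Derive boosters from aggregate counts.

    Only valid when every vaccine in use follows a two-dose protocol, so that
    total_vaccinations = first doses + second doses + boosters.
    Returns None when the derivation cannot be trusted.
    """
    if single_dose_in_use:
        # One single-dose shot counts in both people_vaccinated and
        # people_fully_vaccinated, so the subtraction would undercount.
        return None
    boosters = total_vaccinations - people_vaccinated - people_fully_vaccinated
    if boosters < 0:
        # A negative result signals that the two-dose assumption is violated.
        return None
    return boosters


# Two-dose protocol only: 1200 first doses, 1000 second doses, 300 boosters.
print(derive_total_boosters(2500, 1200, 1000))

# 100 people who received only a single-dose vaccine: the formula would give -100.
print(derive_total_boosters(100, 100, 100))
```

The second call shows why the shortcut is fragile: as soon as single-dose vaccines enter the mix, the subtraction no longer isolates boosters.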

BTW, what is the difference between people_vaccinated and people_partly_vaccinated?

people_partly_vaccinated refers to the number of people with just one dose, while people_vaccinated refers to people with at least one dose.
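
Under those definitions, and assuming a two-dose protocol, the two metrics are related by a simple difference. The helper below is a hypothetical illustration with made-up numbers.

```python
def partly_vaccinated(people_vaccinated, people_fully_vaccinated):
    """People with at least one dose minus those who completed the protocol."""
    return people_vaccinated - people_fully_vaccinated


# 1200 people with at least one dose, 1000 of them fully vaccinated.
print(partly_vaccinated(1200, 1000))
```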

fibke commented 2 years ago

Re your comment on Vietnam: "Vietnam: Boosters available but (abdala being a 3 dose vaccine) booster numbers can not be extracted (@Rouein)"

The 3rd doses of Abdala are actually available. Not easy to find but here it is:

Page 4, footnote 3: 1,439,428 in the pdf file shown here: https://luatvietnam.vn/y-te/bao-cao-191-bc-byt-2022-tinh-hinh-dich-va-cong-tac-phong-chong-dich-covid-19-ngay-14-02-2022-216856-d6.html

You may be able to find this document in English somewhere.

fibke commented 2 years ago

Regarding the data for most African countries:

  • We used to source the data from Africa CDC. Specifically from this file. You can easily get this file as a DataFrame using our module:
    from cowidev.vax.incremental.africacdc import AfricaCDC
    df = AfricaCDC().read()
  • We no longer rely on Africa CDC because of some issues in how they report the data (more details on the reasons here). Instead, we switched to WHO, which does not report boosters.

It is important that these latest booster data for Africa are incorporated. As of now, OWID makes it look like much of Africa hasn't started boosting yet, which is clearly not the case and overstates the already dire picture of vaccine inequity.

fibke commented 2 years ago

It takes manual effort to find the data, but scraping it should not be too hard. I have implemented this for other data points myself.

Vietnam is a big country, so it would be good to get the info shown correctly.

On Feb 16, 2022, at 11:57, Rouein @.***> wrote:

Hi @fibke, this requires manual effort and is not something you can automate with a scraper. In that case, just acquiring the data from WHO is more feasible.


fibke commented 2 years ago

I see. I thought you were someone actually working at OWID.

On Feb 16, 2022, at 12:18, Rouein @.***> wrote:

 @.*** , I am just a contributor like you, so I can try to write a script and make a PR but the final decision is with the maintainers. So in any case don't count on my answers and wait for one of the maintainers to reply to you.


fibke commented 2 years ago

That’s too bad. Thanks for checking.

On Feb 16, 2022, at 13:50, Rouein @.***> wrote:

BTW @fibke, I just ran a test script on two of the PDF files from here, and it seems that the numbers and some of the text are not "convertible to text"; they need OCR to be machine-readable. Thus, I wouldn't be able to scrape any data from them.


fibke commented 2 years ago

Splendid!

On Feb 17, 2022, at 13:51, Rouein @.***> wrote:

I just checked the latest report for Vietnam, and it seems they have fixed the PDFs, which are now machine-readable. I will wait to check whether the next one or two reports are "readable" as well. If so, I will modify the current script to get the boosters and fully vaccinated data.


fibke commented 2 years ago

Booster entries at OWID for Venezuela are out of date. The PAHO website contains the most current information. Boosters can be calculated as 1st additional dose + 2nd additional dose (currently 0): https://ais.paho.org/imm/IM_DosisAdmin-Vacunacion.asp

Note: additional doses include boosters but also doses that extend the primary series for immunocompromised people.
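
As a rough sketch of that calculation: the dictionary keys below are hypothetical stand-ins for the PAHO columns, and the figures are made up. Per the note above, the result also includes doses that extend the primary series for immunocompromised people.

```python
def total_boosters_from_paho(row):
    """Sum the additional-dose fields of one PAHO row (hypothetical keys).

    Returns None when neither field is reported, so that missing data stays
    NA instead of being recorded as 0.
    """
    doses = [row.get("first_additional_dose"), row.get("second_additional_dose")]
    reported = [d for d in doses if d is not None]
    if not reported:
        return None
    return sum(reported)


# Made-up example: second additional doses are reported as 0.
print(total_boosters_from_paho({"first_additional_dose": 150000,
                                "second_additional_dose": 0}))

# Nothing reported at all: stays NA.
print(total_boosters_from_paho({}))
```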

lucasrodes commented 2 years ago

Hi @fibke, just fixed that; the data should be live within the next 24 hours!

lucasrodes commented 2 years ago

I got an email back from the SPC saying that they will be incorporating booster data in the upcoming weeks.

lucasrodes commented 2 years ago

All tasks reviewed and done.

Major changes may occur once the WHO and the SPC release booster data (expected in the upcoming weeks).