govex / COVID-19

Data analysis and visualizations of daily COVID cases report
MIT License

Vaccinations Data Report Inconsistency #173

Open pedrohforli opened 3 years ago

pedrohforli commented 3 years ago

First, I would just like to thank and congratulate everyone in this repository for the hard work in pulling and consolidating all these different data sources regarding the pandemic.

That being said, like many users right now, most of my interest in this data has been in consuming the Vaccinations information (https://github.com/govex/COVID-19/blob/master/data_tables/vaccine_data/raw_data/vaccine_data_us_state_timeline.csv). With that in mind, I've found a number of data issues, verified the data against the provided sources, and compiled a list of problematic regions (I did not go through all states).

Alaska: Doses administered do not match the dashboard information (https://www.arcgis.com/apps/opsdashboard/index.html#/84691dc5b0184827af0fd8e4c20034d9)

Alabama: Doses administered over time do not match any of the historical data points in the dashboard report.

Arizona: The 2021-01-20 administered numbers are far from the reported number https://www.azdhs.gov/documents/preparedness/epidemiology-disease-control/infectious-disease-epidemiology/novel-coronavirus/vaccine-phases.pdf. Also, I couldn't find in the provided sources how the second-dose separation was calculated.

Colorado: Assuming you are using the CDC data, it doesn't match. Also, for some dates doses_admin_total = people_total + people_total_2nd_dose, and for others doses_admin_total = people_total + 2 * people_total_2nd_dose

Delaware: Doses administered do not follow the values reported on https://myhealthycommunity.dhss.delaware.gov/locations/state/vaccine-tracker#vaccine_tracker

DC: I think the CSV is updating only the cumulative numbers, but not the daily numbers that are displayed (https://coronavirus.dc.gov/data/vaccination)

Georgia: Numbers do not match dashboard https://dph.georgia.gov/covid-vaccine nor CDC

Hawaii: Historical numbers do not match dashboard report (https://health.hawaii.gov/coronavirusdisease2019/what-you-should-know/current-situation-in-hawaii/#vaccine)

Idaho: Dates of data points lag the reported data by two days https://public.tableau.com/profile/idaho.division.of.public.health#!/vizhome/COVID-19VaccineDataDashboard/Residence

Iowa: Numbers do not match report https://idph.iowa.gov/Portals/1/userfiles/61/COVID19%20Vaccine%20Administration.pdf

Illinois: Historical values lag the report in the dashboard by two days http://www.dph.illinois.gov/covid19/vaccinedata?county=Illinois

Indiana: Most recent number does not match the displayed value - could be a date issue https://www.coronavirus.in.gov/2680.htm

Kansas: Data does not match the dashboard historicals https://www.kansasvaccine.gov/158/Data

Kentucky: Numbers do not match the dashboard https://govstatus.egov.com/ky-covid-vaccine

Massachusetts: Numbers do not match the weekly report values https://www.mass.gov/doc/weekly-covid-19-vaccination-report-january-14-2021/download

Michigan: Data does not match historicals presented on dashboard https://www.michigan.gov/coronavirus/0,9753,7-406-98178_103214_103272-547150--,00.html

Maryland: Historicals do not match numbers on https://coronavirus.maryland.gov/#Vaccine

Maine: Most recent data point does not match https://www.maine.gov/covid19/vaccines

Minnesota: Data does not match dashboard historicals https://mn.gov/covid19/vaccine/data/index.jsp

Missouri: Numbers do not match CDC report
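For anyone who wants to reproduce the Colorado observation, here is a minimal sketch of the check I mean. It only uses the three column names mentioned above (doses_admin_total, people_total, people_total_2nd_dose); the sample rows and the helper names (classify_row, summarize) are made up for illustration and are not taken from the CSV or from the repository's code.

```python
from collections import Counter


def _to_int(value):
    # CSV fields arrive as strings; accept "150" as well as "150.0".
    return int(float(value))


def classify_row(row):
    """Report which dose identity (if any) a single row satisfies."""
    try:
        doses = _to_int(row["doses_admin_total"])
        first = _to_int(row["people_total"])
        second = _to_int(row["people_total_2nd_dose"])
    except (KeyError, TypeError, ValueError):
        return "missing"  # blank or absent field, cannot be checked

    plus_second = doses == first + second
    plus_twice = doses == first + 2 * second
    if plus_second and plus_twice:
        return "both"  # only possible when the second-dose count is 0
    if plus_second:
        return "first_plus_second"
    if plus_twice:
        return "first_plus_twice_second"
    return "neither"


def summarize(rows):
    """Count how many rows fall under each identity."""
    return Counter(classify_row(row) for row in rows)


# Hypothetical rows, NOT real values from the timeline CSV.
sample_rows = [
    {"date": "2021-01-18", "doses_admin_total": "150",
     "people_total": "100", "people_total_2nd_dose": "50"},
    {"date": "2021-01-19", "doses_admin_total": "220",
     "people_total": "120", "people_total_2nd_dose": "50"},
    {"date": "2021-01-20", "doses_admin_total": "300",
     "people_total": "130", "people_total_2nd_dose": "60"},
    {"date": "2021-01-21", "doses_admin_total": "",
     "people_total": "140", "people_total_2nd_dose": "70"},
]

if __name__ == "__main__":
    for label, count in summarize(sample_rows).items():
        print(label, count)
```

To run this against the real file, the rows from csv.DictReader can be passed to summarize after filtering to one state. If a single state shows a mix of first_plus_second and first_plus_twice_second, the meaning of people_total is probably changing between dates.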

sarabertrandelis commented 3 years ago

Dear @pedrohforli, thank you very much for your comments and for consuming our collected data. I will try to go through potential sources of differences:

I hope this information helps in understanding the data collection logic behind this dataset. Also, we will be switching to automated data collection next Tuesday 23rd, with a change in the path and data structure. We hope this improves consistency and reliability and avoids manual typos.