Open pedrohforli opened 3 years ago
Dear @pedrohforli, Thank you very much for your comments and for consuming our collected data. I will try to go through potential sources of differences:
I hope this information helps explain the data collection logic for this dataset. Also, we will be switching to automated data collection next Tuesday the 23rd, with a change in the path and data structure. We hope this improves consistency and reliability and avoids manual typos.
First, I would just like to thank and congratulate everyone in this repository for the hard work in pulling and consolidating all these different data sources regarding the pandemic.
That being said, like many users right now, I am mostly interested in consuming the vaccination information (https://github.com/govex/COVID-19/blob/master/data_tables/vaccine_data/raw_data/vaccine_data_us_state_timeline.csv). With that in mind, I found a number of data issues while verifying the data against the provided sources, and have compiled a list of problematic regions (I did not go through all states):
Alaska: Doses administered do not match the dashboard information (https://www.arcgis.com/apps/opsdashboard/index.html#/84691dc5b0184827af0fd8e4c20034d9)
Alabama: Doses administered over time do not match any of the historical data points in the dashboard report.
Arizona: The doses administered for 2021-01-20 are far from the reported number https://www.azdhs.gov/documents/preparedness/epidemiology-disease-control/infectious-disease-epidemiology/novel-coronavirus/vaccine-phases.pdf. Also, I couldn't find in the provided sources how the second-dose split was calculated.
Colorado: Assuming you are using the CDC data, it doesn't match. Also, for some dates doses_admin_total = people_total + people_total_2nd_dose, and for others doses_admin_total = people_total + 2 * people_total_2nd_dose
Delaware: Doses administered do not follow the values reported on https://myhealthycommunity.dhss.delaware.gov/locations/state/vaccine-tracker#vaccine_tracker
DC: I think the CSV is updating only the cumulative numbers, but not the daily numbers that are displayed (https://coronavirus.dc.gov/data/vaccination)
Georgia: Numbers match neither the dashboard https://dph.georgia.gov/covid-vaccine nor the CDC
Hawaii: Historical numbers do not match dashboard report (https://health.hawaii.gov/coronavirusdisease2019/what-you-should-know/current-situation-in-hawaii/#vaccine)
Idaho: Dates of data points lag the reported data by 2 days https://public.tableau.com/profile/idaho.division.of.public.health#!/vizhome/COVID-19VaccineDataDashboard/Residence
Iowa: Numbers do not match report https://idph.iowa.gov/Portals/1/userfiles/61/COVID19%20Vaccine%20Administration.pdf
Illinois: Historicals lag the dashboard report by two days http://www.dph.illinois.gov/covid19/vaccinedata?county=Illinois
Indiana: Most recent number does not match the displayed value - could be a date issue https://www.coronavirus.in.gov/2680.htm
Kansas: Data does not match the dashboard historicals https://www.kansasvaccine.gov/158/Data
Kentucky: Numbers do not match the dashboard https://govstatus.egov.com/ky-covid-vaccine
Massachusetts: Numbers do not match the weekly report values https://www.mass.gov/doc/weekly-covid-19-vaccination-report-january-14-2021/download
Michigan: Data does not match historicals presented on dashboard https://www.michigan.gov/coronavirus/0,9753,7-406-98178_103214_103272-547150--,00.html
Maryland: Historicals do not match numbers on https://coronavirus.maryland.gov/#Vaccine
Maine: Most recent data point does not match https://www.maine.gov/covid19/vaccines
Minnesota: Data does not match dashboard historicals https://mn.gov/covid19/vaccine/data/index.jsp
Missouri: Numbers do not match CDC report