govex / COVID-19

Data analysis and visualizations of daily COVID cases report
MIT License

question on vaccine time series #12

Open wellsangels opened 3 years ago

wellsangels commented 3 years ago

I have a question about the vaccine time series. For states with data from multiple days, some metrics are not repeated for all days. For example, Texas had doses_alloc_total listed for 12/14 but not for 12/17 or 12/18. Going forward, I'm curious if each cell will be included only if it's updated, or if you'll start filling in the dates with the most recent figure.

date Province_State doses_alloc_total doses_shipped_total people_total
12/17/2020 Texas 91650 4187
12/18/2020 Texas 91650 4187
12/14/2020 Texas 1400000 19500

Thanks very much for providing this excellent data.

StateCenterKid commented 3 years ago

Hello, and thank you for compiling the vaccine data. This site was mentioned in the Johns Hopkins repository today, so you may be getting a lot of new traffic.

If I may make one suggestion it would be to reconfigure your table from "wide" to "long" by placing each vaccine type on its own row. Doing so now will make it easier to expand the dataset as new vaccines are approved. Taking it one step further, it would also be good to build in a Country Code column to allow for easy integration of regions beyond the US. In short, each record would be identified by 'date', 'country_code', 'Province_State', and 'vaccine name'.
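A minimal sketch of the wide-to-long reshaping suggested above, in plain Python. The per-vaccine column names (`doses_shipped_vaccine_a`, `doses_shipped_vaccine_b`), the sample values, and the `wide_to_long` helper are hypothetical and chosen only for illustration; they are not taken from the repository's actual schema.

```python
# Hypothetical wide-format rows: one column per vaccine.
# Column names and values are illustrative assumptions, not the real schema.
WIDE_ROWS = [
    {
        "date": "2020-12-18",
        "Province_State": "Texas",
        "doses_shipped_vaccine_a": 100,
        "doses_shipped_vaccine_b": 40,
    },
]

# Assumed naming convention: metric prefix followed by the vaccine name.
PREFIX = "doses_shipped_"


def wide_to_long(rows, country_code="US"):
    """Emit one record per (date, country_code, Province_State, vaccine_name)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if not column.startswith(PREFIX):
                continue  # skip the key columns (date, Province_State)
            long_rows.append(
                {
                    "date": row["date"],
                    "country_code": country_code,
                    "Province_State": row["Province_State"],
                    "vaccine_name": column[len(PREFIX):],
                    "doses_shipped": value,
                }
            )
    return long_rows


for record in wide_to_long(WIDE_ROWS):
    print(record)
```

With this layout, a newly approved vaccine adds rows rather than columns, so downstream code keyed on the four identifying fields would not need to change.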

Thank you again for making this data available. I'm excited to see how this develops.

https://github.com/CSSEGISandData/COVID-19/issues/3475#issue-771044136

isabeste commented 3 years ago

Hello, and thank you for compiling this data! As mentioned above, I also found this data source via the JHU repository. In addition to the comments above, is there a plan to add anything more granular than state-level information for the COVID-19 distribution? As @wellsangels pointed out, it would be great to know how this data will be formatted moving forward. Will the most recent row include total allocations? Will each row be defined by date + location + vaccine type?

tatornator12 commented 3 years ago

Thanks for compiling this data! It's extremely helpful.

I do have the same concerns as @wellsangels regarding how the data is being updated. I just ran into this same predicament today with Guam. It would be helpful if the metrics were repeated for each day even if some of those metrics do not change.
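Until the published file repeats unchanged metrics, one possible workaround on the consumer side is to carry the last reported value forward per state. The sketch below is an assumption about how a user might do that, not something the maintainers provide: the `forward_fill` helper is hypothetical, dates are assumed to be ISO-formatted so they sort correctly as strings, and the sample rows only mimic the Texas example from this thread.

```python
# Illustrative rows: doses_alloc_total reported on 12/14 but missing afterwards.
ROWS = [
    {"date": "2020-12-17", "Province_State": "Texas", "doses_alloc_total": None},
    {"date": "2020-12-14", "Province_State": "Texas", "doses_alloc_total": 1400000},
    {"date": "2020-12-18", "Province_State": "Texas", "doses_alloc_total": None},
]


def forward_fill(rows, metrics):
    """Fill missing metrics with the most recent earlier value for the same state."""
    last_seen = {}  # (state, metric) -> most recent non-missing value
    filled = []
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    for row in sorted(rows, key=lambda r: (r["Province_State"], r["date"])):
        new_row = dict(row)  # copy so the input rows are left untouched
        for metric in metrics:
            key = (row["Province_State"], metric)
            if new_row.get(metric) is None:
                new_row[metric] = last_seen.get(key)  # stays None if never reported
            else:
                last_seen[key] = new_row[metric]
        filled.append(new_row)
    return filled


for row in forward_fill(ROWS, ["doses_alloc_total"]):
    print(row["date"], row["doses_alloc_total"])
```

A carried-forward value only says that no newer figure was published, so it may be worth keeping a flag that marks which cells were filled.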

sarabertrandelis commented 3 years ago

Hello @wellsangels and @tatornator12. Some data come from state dashboards, and other data from press releases. At the beginning of the data collection, we pulled most of the information from press releases, but many states now have a dashboard, which may contain different information than the press releases did. We prefer to record all dates even if no information is available for a certain data point. If we have confirmation that a number is not changing over time, we will definitely repeat it, but sometimes the information is just not there anymore.

sarabertrandelis commented 3 years ago

Hello @StateCenterKid. The data collection is currently done manually, and the current table format makes that easier. We are in the process of automating the data collection, and once that is done, we will very probably move to a format similar to the cases/deaths data in the CSSE repository. We are also planning to add new information, such as a breakdown of doses administered by demographics, and international data. Thanks for your comment!

tatornator12 commented 3 years ago

Hey @sarabertrandelis thanks for the info! That certainly does make things a bit complicated if the information is no longer available. I would assume that if there is no reported change for total doses administered from the previous day that it would at least be repeated; at least that's what my main focus is on. In your other comment, you mentioned moving towards automating the process. Does that mean the current schema is here to stay with maybe the additional fields mentioned?

xunhuang commented 3 years ago

Hi, thank you for the awesome work. I saw that California's data hasn't been updated for a while. Here is a link I found that contains allocations (per county) and doses administered.

https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/COVID-19/VaccineDoses.aspx

Is there any way I can help with this? I run a site, https://covid-19.direct, and have integrated data from your site.

sarabertrandelis commented 3 years ago

@tatornator12 regarding the doses administered, there are two different cases. When the source is updated and the number has not changed, we record the same number again for the new date (not common). However, if the source is not updated, that does not mean vaccinations stopped; in that case we hold data updates until a new release of data (the most common case). Regarding the transition to automated data collection, we will definitely aim for a smooth and minimal change to the current schema, with a transition period during which we maintain both if there is any difference.

sarabertrandelis commented 3 years ago

@xunhuang thank you so much! We added the source. You will be able to see the update here and here within the next hour. And thank you too for using JHU as your trusted data source.

ghost commented 3 years ago

@sarabertrandelis Thank you for the info! I have a quick question about the data. I assume all the numbers in this data are cumulative instead of incremental. Is my understanding correct? I couldn't find it in the data dictionary.

sarabertrandelis commented 3 years ago

@YiXu-Takeda it is indeed cumulative. I will add it to the data dictionary to make things clearer. Thank you for the input!
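Since the figures are cumulative, anyone who needs increments has to difference consecutive reports. A minimal sketch, assuming a single state's series is available as a date-sorted list of `(date, total)` pairs; the `increments` helper and the sample numbers are hypothetical.

```python
# Illustrative cumulative totals for one state, sorted by date.
CUMULATIVE = [
    ("2021-01-03", 100),
    ("2021-01-04", 100),
    ("2021-01-05", 130),
]


def increments(cumulative):
    """Return (date, change since the previous report) for each report after the first."""
    result = []
    previous_total = None
    for date, total in cumulative:
        if previous_total is not None:
            result.append((date, total - previous_total))
        previous_total = total
    return result


print(increments(CUMULATIVE))
```

Each difference covers the whole gap since the previous report, so when a source skips days the value is not a single-day figure.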

xunhuang commented 3 years ago

Have you guys seen the data Bloomberg collects? The website looks legit. I'm not sure if they publish the data source directly, but the data is embedded in the website in a straightforward JSON format.

sarabertrandelis commented 3 years ago

Heads up that tonight we are introducing a small adjustment to how dates are reported. Until today, we reported the date as given by each State. However, due to the increasing number of States reporting and the variety of ways they report dates, we are switching to reporting the date of data collection. That also means that when data is stale, for example because a State does not report during weekends, we will still report the figure visible in the dashboard even if it is the same as the previous day's, providing a number for all days for all States with publicly available data. We hope this change helps increase consistency in our data and makes it easier to understand and analyze.

twkreykes commented 3 years ago

Greetings - Does anyone know if there will be a county-level source for the vaccine data, similar to the case data that has been published?

reclusivestar commented 3 years ago

Hi All,

I made a time series depicting vaccination data from JHU on a US map here: https://reclusivestar.github.io/vaccine_tracker/

Let me know what you think! @sarabertrandelis

Thanks, Tanmay Kumar