nytimes / covid-19-data

A repository of data on coronavirus cases and deaths in the U.S.
https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html

Question: Is it possible to include college/university data? #445

Closed: beckchel closed this issue 3 years ago

beckchel commented 4 years ago

Hello!

Would it be possible to include the data set used to build this article?

https://www.nytimes.com/interactive/2020/us/covid-college-cases-tracker.html

Tracking college cases internally would be hugely helpful to my institution.

Thank you for the consideration!

albertsun commented 4 years ago

Hi @beckchel,

Basically all the data we have and are confident in is already on that page directly.

We're open to the idea of publishing it in some other format, but we're curious to hear more about the specific use cases you or others would have for the data.

beckchel commented 4 years ago

Thanks for getting back to me! I work for Michigan State University, and we're trying to compare ourselves to other similar universities in our own internal dashboards. A CSV or other tabular format would be super helpful (with citation of course!).

tiffehr commented 4 years ago

We do expose the data under the hood for that interactive page, if you want to grab a globally exposed JSON object and filter it to your specifications. That implies scraping our page source, however.

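[image: image] https://user-images.githubusercontent.com/60173/91501002-a89dbb00-e892-11ea-8541-68539938dcd4.png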
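
A minimal sketch of that approach, assuming the page assigns the data to a global variable inside an inline script. The variable name `EXAMPLE_COLLEGE_DATA`, the field names, and the sample rows are all placeholders, not the page's actual structure; the real name would have to be read from the page source.

```python
import json
import re


def extract_global_json(html: str, var_name: str):
    """Return the JSON value assigned to `var_name` in an inline script."""
    # Match `var NAME = ` or `window.NAME = `; the name is supplied by the caller.
    pattern = re.compile(r"(?:var\s+|window\.)" + re.escape(var_name) + r"\s*=\s*")
    match = pattern.search(html)
    if match is None:
        raise ValueError(f"no assignment to {var_name!r} found")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the trailing `;` and the rest of the page).
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


# Placeholder page source; the variable name and fields are hypothetical.
SAMPLE_HTML = """
<script>var EXAMPLE_COLLEGE_DATA = [
  {"college": "Example State University", "state": "MI", "cases": 120},
  {"college": "Sample College", "state": "CO", "cases": 45}
];</script>
"""

rows = extract_global_json(SAMPLE_HTML, "EXAMPLE_COLLEGE_DATA")
michigan = [row for row in rows if row["state"] == "MI"]
print(michigan)
```

This only works when the assigned value is strict JSON; a JavaScript object literal with unquoted keys would need a different parser.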

beckchel commented 4 years ago

Thank you! I'll send this to my data science counterparts.

EllieLockhart commented 4 years ago

I have pretty extensive academic experience and am proficient in scraping. If this is consistent with the goals of the project, I would be happy to start looking into building some sort of data harvesting framework for universities.

austinvhuang commented 4 years ago

Seconding this request. @albertsun, regarding use cases: the data would be useful for modeling and researching local characteristics of outbreaks in conjunction with other data.

jimfenton commented 4 years ago

Also seconding this request. My daughter, a student at CU Boulder, would like to be able to compare her school to others on a per capita basis. Having the data here rather than scraping it would also facilitate analysis of growth rates. It might also be useful to relate this to the date that the school opened for the fall (semester schools and quarter schools are considerably different in this regard).
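
As a rough illustration of the per capita and growth rate comparisons, assuming cumulative case counts and an enrollment figure obtained separately (enrollment is not mentioned as part of the tracker data, and the numbers below are made up):

```python
def cases_per_1000(cases: int, enrollment: int) -> float:
    """Cumulative cases per 1,000 enrolled students."""
    if enrollment <= 0:
        raise ValueError("enrollment must be positive")
    return 1000 * cases / enrollment


def growth_rate(previous: int, current: int) -> float:
    """Fractional change in cumulative cases between two reports."""
    if previous <= 0:
        raise ValueError("previous count must be positive")
    return (current - previous) / previous


# Hypothetical figures for one school, for illustration only.
print(cases_per_1000(150, 30000))
print(growth_rate(100, 150))
```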

PatrickMartinK commented 4 years ago

Bumping this request. I'm the director of a student-run think tank at Florida State University, and this data would be absolutely crucial to our data analysis efforts. We're trying to compare FSU stats to our peer institutions and state university system, and using this database is the best way to do that. If anyone has figured out how to scrape it, or if a collaborator has compiled it into a CSV, I would greatly appreciate a response.

tiffehr commented 4 years ago

Yep, thank you, all. We are very aware of the interest and we are working on it. There are more challenges to the decision than you might imagine. I'll update this thread once we have a decision either way.

briancpark commented 4 years ago

I am following this thread and also bumping this request. I'm a university student, and it would be beneficial to share well-formatted datasets among peers and professors to raise awareness. Some professors at my university have been assigning relevant COVID-19 datasets for data science projects and assignments. I would be happy to pass this along to my university community if this dataset does become publicly available. I know that building a new time series dataset for a specific group is more work, but I really appreciate what NYTimes has been doing so far with other COVID-19 data, such as mask usage and county-specific US data.

tiffehr commented 4 years ago

Thank you. No need to bump the thread. We are very aware of the interest and inquiries, here on GitHub, via email, Tweets, reader comments, etc.

lwaananenjones commented 3 years ago

Hi everyone, raw data from the college tracker is now available with the most recent update: https://github.com/nytimes/covid-19-data/tree/master/colleges
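
For the comparison use cases raised earlier in the thread, a minimal sketch of filtering such a file once it is downloaded. The column names (`state`, `college`, `cases`) and the sample rows are assumptions for illustration; check the actual header of the published file before relying on them.

```python
import csv
import io

# Placeholder rows standing in for a downloaded file; the column names
# are assumptions, not confirmed against the published data.
SAMPLE_CSV = """state,college,cases
Michigan,Example State University,120
Colorado,Sample College,45
Michigan,Another Example College,30
"""


def load_rows(handle):
    """Read CSV rows into dicts, converting the case count to an integer."""
    rows = []
    for row in csv.DictReader(handle):
        row["cases"] = int(row["cases"])
        rows.append(row)
    return rows


def colleges_in_state(rows, state):
    """Return the rows for one state, highest case count first."""
    matches = [row for row in rows if row["state"] == state]
    return sorted(matches, key=lambda row: row["cases"], reverse=True)


rows = load_rows(io.StringIO(SAMPLE_CSV))
michigan_rows = colleges_in_state(rows, "Michigan")
for row in michigan_rows:
    print(row["college"], row["cases"])
```

To read a real file instead of the sample string, pass an open file handle, for example `open(path, newline="")`, to `load_rows`.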