nytimes / covid-19-data

A repository of data on coronavirus cases and deaths in the U.S.
https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html

CSV vs. Endpoint #44

kanyesthaker closed this issue 1 year ago.

kanyesthaker commented 4 years ago

Hello! Thank you so much for making this resource available; getting this data in the hands of the public is super important so that we can all collectively contribute in our own ways.

I was wondering whether it would be possible to get an HTTP endpoint for this data for use in live applications. Right now, I believe the GitHub Contents REST API v3 lets us fetch file information from a repo, but it would be significantly more convenient to have an API to which we could pass individual parameters (e.g., county name, state name) and get back specific, filtered data with a last-updated timestamp.
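The kind of filtered lookup requested above can be sketched over the county-level CSV. This is a minimal sketch, not an existing endpoint: it assumes the header `date,county,state,fips,cases,deaths`, and the name `filter_counties` is hypothetical. Because the CSV carries no file timestamp, it uses the most recent `date` among the matching rows as a stand-in for "last updated".

```python
import csv
import io
from typing import Optional


def _to_int(value: str) -> Optional[int]:
    # Some rows may leave a count blank; keep that as None rather than 0.
    return int(value) if value else None


def filter_counties(csv_text: str, state: Optional[str] = None,
                    county: Optional[str] = None) -> dict:
    """Filter county rows by state and/or county name (hypothetical helper).

    Assumes the header is: date,county,state,fips,cases,deaths
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if state is not None and row["state"] != state:
            continue
        if county is not None and row["county"] != county:
            continue
        rows.append({
            "date": row["date"],
            "county": row["county"],
            "state": row["state"],
            "fips": row["fips"],
            "cases": _to_int(row["cases"]),
            "deaths": _to_int(row["deaths"]),
        })
    # ISO dates (YYYY-MM-DD) sort correctly as strings, so max() is the latest one.
    last_updated = max((r["date"] for r in rows), default=None)
    return {"last_updated": last_updated, "rows": rows}
```

An HTTP layer would then only need to map query parameters such as `?state=...&county=...` onto the two keyword arguments.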

Thank you!

samjgorman commented 4 years ago

Hey, I definitely agree with that. An endpoint could enable more compelling applications of this data, especially applications that rely on regularly checking for and pulling in updates. Right now it's difficult to use this data in those cases, since there's no way to routinely check for changes and reliably update the data being served.

digiphilo commented 4 years ago

In the interim, I've set up https://covapi.herokuapp.com/. Let me know if there are any usage issues or licensing requirements. It currently fetches the latest CSV from master on container startup; the source is here: https://github.com/digiphilo/covid/tree/master/cv_api

TomHAnderson commented 4 years ago

I thought about this too, but I would preprocess the data into a more normalized form, with a states DB and a counties DB that references the state, each with its own statistics, and then serve it via HAL with full query ability (e.g. https://github.com/zfcampus/zf-doctrine-querybuilder#orm-and-odm), exposed as both HAL JSON and GraphQL.
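A minimal sketch of that normalized layout, using Python's built-in `sqlite3`. The table and column names (`states`, `counties`, `state_stats`, `county_stats`) and the helper functions are hypothetical, and the sketch covers only storage plus a sample join, not the HAL JSON or GraphQL serving layer. Surrogate integer ids are used because some county rows in the source data may lack a FIPS code.

```python
import sqlite3

# Hypothetical schema: states and counties each carry their own daily statistics.
SCHEMA = """
CREATE TABLE states (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE counties (
    id       INTEGER PRIMARY KEY,
    state_id INTEGER NOT NULL REFERENCES states(id),
    name     TEXT NOT NULL,
    fips     TEXT,                -- NULL when the source row has no FIPS code
    UNIQUE (state_id, name)
);
CREATE TABLE state_stats (
    state_id INTEGER NOT NULL REFERENCES states(id),
    date     TEXT NOT NULL,
    cases    INTEGER NOT NULL,
    deaths   INTEGER,
    PRIMARY KEY (state_id, date)
);
CREATE TABLE county_stats (
    county_id INTEGER NOT NULL REFERENCES counties(id),
    date      TEXT NOT NULL,
    cases     INTEGER NOT NULL,
    deaths    INTEGER,
    PRIMARY KEY (county_id, date)
);
"""


def state_id(conn: sqlite3.Connection, name: str) -> int:
    """Return the id for a state, inserting it the first time it is seen."""
    conn.execute("INSERT OR IGNORE INTO states(name) VALUES (?)", (name,))
    return conn.execute("SELECT id FROM states WHERE name = ?", (name,)).fetchone()[0]


def add_state_stat(conn, date, state, cases, deaths):
    sid = state_id(conn, state)
    conn.execute("INSERT INTO state_stats VALUES (?, ?, ?, ?)", (sid, date, cases, deaths))


def add_county_stat(conn, date, county, state, fips, cases, deaths):
    sid = state_id(conn, state)
    conn.execute("INSERT OR IGNORE INTO counties(state_id, name, fips) VALUES (?, ?, ?)",
                 (sid, county, fips or None))
    cid = conn.execute("SELECT id FROM counties WHERE state_id = ? AND name = ?",
                       (sid, county)).fetchone()[0]
    conn.execute("INSERT INTO county_stats VALUES (?, ?, ?, ?)", (cid, date, cases, deaths))


def county_series(conn, state, county):
    """Cases and deaths over time for one county, joined through its state."""
    return conn.execute(
        """SELECT cs.date, cs.cases, cs.deaths
           FROM county_stats cs
           JOIN counties c ON c.id = cs.county_id
           JOIN states s ON s.id = c.state_id
           WHERE s.name = ? AND c.name = ?
           ORDER BY cs.date""",
        (state, county),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

A query builder or GraphQL resolver would then sit on top of joins like the one in `county_series`.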

I suppose there's no way to measure demand except to create it.

daveroush commented 4 years ago

This would be great. The case counts in the dataset lag noticeably behind what state governments are reporting, and we need more timely data. For example, New Mexico reports 191 total cases through 3/26/20, while the dataset reports 136.

eventamplify commented 4 years ago


@digiphilo How often are you refreshing the data? I would love to use this API, if that's OK.

digiphilo commented 4 years ago

@eventamplify The data is rehydrated hourly, so please make use of it. Let me know if you run into any issues, and we can add more dynos.
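The fetch-on-startup, refresh-hourly behavior described in this thread amounts to a time-based cache. A minimal sketch, assuming the state-level CSV lives at the raw GitHub URL shown (an assumption; the covapi source may do this differently). `RefreshingCache` is hypothetical and not thread-safe, so a multi-worker server would need a lock or a background refresh job instead.

```python
import time
import urllib.request
from typing import Callable, Optional

# Assumed location of the state-level CSV on the master branch.
STATES_CSV_URL = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv"


def fetch_states_csv() -> str:
    """Download the CSV text (makes a network call)."""
    with urllib.request.urlopen(STATES_CSV_URL, timeout=30) as resp:
        return resp.read().decode("utf-8")


class RefreshingCache:
    """Keeps the last fetched CSV and re-fetches it once it is max_age seconds old."""

    def __init__(self, fetch: Callable[[], str], max_age: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._data: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Fetch on first use (the "startup" load) and whenever the copy is stale.
        if self._data is None or now - self._fetched_at >= self._max_age:
            self._data = self._fetch()
            self._fetched_at = now
        return self._data
```

In use, `cache = RefreshingCache(fetch_states_csv)` would be created once and `cache.get()` called per request, so at most one download happens per hour. Injecting `clock` keeps the refresh logic testable without waiting in real time.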

Li357 commented 4 years ago

@TomHAnderson I just whipped up a GraphQL API for the data, which also joins it with US Census population data for counties and states: https://github.com/Li357/covid-nyt-api. It's live at https://covid-nyt-api.now.sh/graphql

I'm still working on handling the geographic exceptions mentioned by the NYT, but it already supports some basic queries against the data.
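Joining case counts with census population is essentially a lookup keyed on FIPS code. This sketches only that step, not the linked GraphQL API: the `join_population` name, the per-100,000 rate, and the shape of `population_by_fips` are illustrative assumptions. Rows without a FIPS match keep `None` rather than being dropped, which is one way to surface geographic exceptions instead of silently losing them.

```python
from typing import Dict, List, Optional


def join_population(rows: List[dict], population_by_fips: Dict[str, int]) -> List[dict]:
    """Attach population and a cases-per-100k rate to each row (hypothetical helper).

    Each row is assumed to have "fips" (possibly empty) and "cases" keys.
    """
    joined = []
    for row in rows:
        pop: Optional[int] = population_by_fips.get(row.get("fips") or "")
        # No population (or zero) means no meaningful rate.
        rate = None if not pop else round(row["cases"] * 100_000 / pop, 1)
        # Build a new dict so the caller's rows are left unchanged.
        joined.append({**row, "population": pop, "cases_per_100k": rate})
    return joined
```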

simonw commented 4 years ago

I'm running a project which exposes this data (plus the Johns Hopkins and LA Times data) as a SQL-enabled JSON API over here: https://covid-19.datasettes.com/

Here's an example query showing the percentage of total deaths attributable to each state: https://covid-19.datasettes.com/covid?sql=select+rowid%2C+date%2C+state%2C+fips%2C+cases%2C+deaths%2C+sum(deaths)+over()+as+total_deaths%2C+round(100.0+*+deaths+%2F+sum(deaths)+over()%2C+2)+as+pct+from+ny_times_us_states+where+date+%3D+(select+max(date)+from+ny_times_us_states)+order+by+pct+desc
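Decoded, the query in that URL uses a window function: `sum(deaths) over ()` totals deaths across every row that survives the `WHERE` clause, so each percentage is relative to the latest date only. The sketch below runs the same SQL against a local SQLite table. It assumes a table named `ny_times_us_states` with the columns the query references, and it needs SQLite 3.25 or later for window function support.

```python
import sqlite3

# The SQL from the URL above, URL-decoded and reformatted.
PCT_SQL = """
select rowid, date, state, fips, cases, deaths,
       sum(deaths) over () as total_deaths,
       round(100.0 * deaths / sum(deaths) over (), 2) as pct
from ny_times_us_states
where date = (select max(date) from ny_times_us_states)
order by pct desc
"""


def death_share_by_state(conn: sqlite3.Connection) -> list:
    """Each state's share of total deaths on the most recent date."""
    return conn.execute(PCT_SQL).fetchall()


conn = sqlite3.connect(":memory:")
# Assumed local table layout mirroring the columns used in the query.
conn.execute(
    "create table ny_times_us_states "
    "(date text, state text, fips text, cases integer, deaths integer)"
)
```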

And here's that same data as JSON (with CORS headers): https://covid-19.datasettes.com/covid.json?sql=select+rowid%2C+date%2C+state%2C+fips%2C+cases%2C+deaths%2C+sum(deaths)+over()+as+total_deaths%2C+round(100.0+*+deaths+%2F+sum(deaths)+over()%2C+2)+as+pct+from+ny_times_us_states+where+date+%3D+(select+max(date)+from+ny_times_us_states)+order+by+pct+desc&_shape=array
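Consuming that JSON from code mostly comes down to URL-encoding the SQL. A sketch using only the Python standard library; the helper names are hypothetical, and whether the covid-19.datasettes.com instance is still online is not guaranteed. `_shape=array` asks Datasette to return a plain JSON list of row objects.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://covid-19.datasettes.com/covid.json"


def datasette_url(sql: str) -> str:
    """Build a query URL; urlencode handles spaces, commas, '=' and '/' in the SQL."""
    return BASE_URL + "?" + urlencode({"sql": sql, "_shape": "array"})


def run_query(sql: str) -> list:
    """Fetch the rows as a list of dicts (makes a network call)."""
    with urlopen(datasette_url(sql), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because the response is served with CORS headers, the same URL could also be fetched directly from browser code.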

I wrote a bit more about my project here: https://simonwillison.net/2020/Mar/11/covid-19/

It updates hourly from this repo (and others) using a scheduled GitHub Action: https://github.com/simonw/covid-19-datasette/blob/master/.github/workflows/scheduled.yml