ivan-aksamentov opened this issue 4 years ago
One way might be to crowdsource the search for data.
There are many COVID-19 and SARS-CoV-2-related projects on the web. Some of them may contain data, APIs or just interesting ideas that can help us to make our application better.
Here are some examples:
https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide seems like a good data source? Example data: https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide-2020-03-20.xlsx The data seems to be global and well structured. It only counts cases and deaths, though (no hospitalized, ICU or recovered counts).
Would this be enough? It's data that's refreshed daily at 9 am EST: https://www.tableau.com/covid-19-coronavirus-data-resources
https://coronadatascraper.com/ has a fair bit of data available as well.
For Spain, this is a good data source, containing national and regional counts of cases, deaths, ICU admissions and recoveries, updated daily: https://github.com/datadista/datasets/tree/master/COVID%2019
I have a finished pull request for the ECDC dataset pending now, replacing the WHO data and parser.
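Much of the work in parsing the ECDC sheet is turning per-day rows into running totals. Below is a minimal JavaScript sketch, assuming the sheet has one row per country and day carrying the number of new cases and deaths reported that day, and that dates have already been normalized to ISO YYYY-MM-DD strings. The field names and the toCumulative helper are hypothetical, not taken from the pending pull request:

```javascript
// Minimal sketch: convert daily new counts into cumulative totals.
// ASSUMPTION: each row is { date: "YYYY-MM-DD", cases: <new cases>, deaths: <new deaths> }
// for a single country; these field names are hypothetical.
function toCumulative(dailyRows) {
  // Copy before sorting so the caller's array is left untouched.
  // ISO date strings sort correctly as plain strings.
  const sorted = [...dailyRows].sort((a, b) => a.date.localeCompare(b.date));

  let cases = 0;
  let deaths = 0;
  return sorted.map((row) => {
    // Running sums: each output row holds the totals up to and including that day.
    cases += row.cases;
    deaths += row.deaths;
    return { date: row.date, cases, deaths };
  });
}
```

Sorting first matters because spreadsheet exports are often in reverse chronological order, and a running sum over unsorted rows would produce meaningless totals.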
Re: https://coronadatascraper.com/: there is an amazing amount of data on that API, but I guess it is not an official source. It should be easy to write a parser for it, if required.
Good point!
I have checked a few of their scrapers, and they all seem to be directed at government pages, e.g. https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/coronavirus-covid-19-current-situation-and-case-numbers for Australia, https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection.html for Canada, and so on, or at GitHub repos that are official sources, like https://github.com/opencovid19-fr/data for France.
If we were to go down this road, it shouldn't take too long to vet each source, I guess.
Hey all,
I have been working on https://coronadatascraper.com/ (aka https://github.com/lazd/coronadatascraper) in my own time, and I am also a Sanity user, professionally and personally.
We've been building scrapers over there only from official sources. No news, no aggregates, just governments directly (yes, this is a pain, since many governments like to publish free-text press releases, sometimes with useful numbers written out, like "thirty-five").
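Pulling a figure like "thirty-five" out of a press release needs a small word-to-number step. Below is a minimal JavaScript sketch of one way to do it, assuming the number phrase has already been isolated from the surrounding text; parseNumberWords is a hypothetical helper, not code from the coronadatascraper repo:

```javascript
// Minimal sketch: parse English number words such as "thirty-five" or
// "one hundred and twelve" into an integer. Covers 0 to 999,999 only.
const SMALL_NUMBERS = new Map(Object.entries({
  zero: 0, one: 1, two: 2, three: 3, four: 4, five: 5, six: 6, seven: 7,
  eight: 8, nine: 9, ten: 10, eleven: 11, twelve: 12, thirteen: 13,
  fourteen: 14, fifteen: 15, sixteen: 16, seventeen: 17, eighteen: 18,
  nineteen: 19, twenty: 20, thirty: 30, forty: 40, fifty: 50, sixty: 60,
  seventy: 70, eighty: 80, ninety: 90,
}));

function parseNumberWords(text) {
  // Split on whitespace and hyphens, dropping the filler word "and".
  const words = text
    .toLowerCase()
    .trim()
    .split(/[\s-]+/)
    .filter((w) => w !== "" && w !== "and");
  if (words.length === 0) return null;

  let total = 0;   // value of completed thousands
  let current = 0; // value below one thousand still being built
  for (const word of words) {
    if (SMALL_NUMBERS.has(word)) {
      current += SMALL_NUMBERS.get(word);
    } else if (word === "hundred") {
      current *= 100;
    } else if (word === "thousand") {
      total += current * 1000;
      current = 0;
    } else {
      // Unknown word: refuse to guess.
      return null;
    }
  }
  return total + current;
}
```

Returning null for unrecognized words, rather than guessing, lets a scraper flag that release for manual review.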
If there are any sources on there that aren't primary sources (government departments), please raise an issue on that GitHub repo and we'll work to sort it out.
There's also a Slack for that project if anyone wants to jump on and chat with those of us who are working on it.
I have now written a first parser for coronadatascraper.com (in my forked repo). The latest version should also contain correct entries for regions such as USA-OK-Love County. Everything is stored in a global .tsv file (and as JSON as well).
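Writing such a .tsv file can be sketched as follows. This is a minimal JavaScript example, assuming a header of time, cases, deaths, hospitalized, ICU and recovered; that column set and the toTsv helper are assumptions for illustration, so check the README in covid19_scenarios/data for the format the app actually expects:

```javascript
// Minimal sketch: serialize per-day records for one region into a TSV string.
// ASSUMPTION: this column set is illustrative, not confirmed from the repo.
const COLUMNS = ["time", "cases", "deaths", "hospitalized", "ICU", "recovered"];

function toTsv(records) {
  const lines = [COLUMNS.join("\t")];
  for (const record of records) {
    // Many sources report only some columns (e.g. cases and deaths);
    // missing values become empty fields instead of "undefined".
    const fields = COLUMNS.map((col) => String(record[col] ?? ""));
    lines.push(fields.join("\t"));
  }
  // End the file with a trailing newline.
  return lines.join("\n") + "\n";
}
```

Leaving unknown columns empty, rather than writing 0, keeps "not reported" distinguishable from "zero reported".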
Re: source quality of coronadatascraper: Germany's numbers are pulled from the app of a tabloid... I don't think it will be possible to vet the sources of such an API, as they can change things as they see fit. Edit: to be more precise, Germany's numbers are themselves aggregated from the official sources (RKI) and newspapers, at least one of which is more or less a tabloid (Morgenpost).
Hi, thank you all for the work. Here is my little contribution: I added data for France (https://github.com/neherlab/covid19_scenarios_data/pull/18). Take care!
I have collected a lot of data for Germany:
https://github.com/ManuelB/covid-19-vis/tree/gh-pages/germany
It is used to run a full simulation for 417 districts in Germany and runs on the command line.
Details of what I am doing are described here: https://youtu.be/lwUDvNfVeEo
If the data were integrated into the data repository, it would show more than 400 items in the select box, which I think is too much.
That's really cool @ManuelB. Thanks for sharing.
I can provide you with an API that gives data for all countries regarding COVID-19. It is also updated frequently:
fetch('https://corona.lmao.ninja/all').then((res) => res.json()).then((data) => console.log(data)); // parse the JSON body and log it
Hope it will help you guys.
For Brazil, I saw that the data available at https://brasil.io/dataset/covid19/ has been used. Great! However, some of the data is outdated. For example, as of today, the last record in "BRA-Distrito Federal.tsv" is for 2020-04-30. Who is working on the Brazilian data? I'm willing to help if needed. I would also like to take this opportunity to thank the project's team! We used Covid Scenarios in a publication [1] that had significant local repercussions.
Hi @pauloangelo, thanks for highlighting this. The data needs to be updated manually by the maintainers of this project, and that just hasn't been done in the last 3 days. I am sure they will do this soon!
Thank you @noleti. I'm available to help, if needed. Thank you all for this remarkable initiative!
Hey @pauloangelo, sorry for the delay. We will update the data now and re-release soon!
Thank you @nnoll !
If you compile population sizes for Brazilian regions and their hospital capacities, we can add them as presets.
Hi all,
The counts for "BRA-Distrito Federal" include cases from other regions that were detected in Distrito Federal. The Brasil.io dataset registers such external cases as "Importados/Indefinidos". I suggest counting only the local cases. For example, for 29-May-2020 there are 142 local deaths, while the TSV counts 154.
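The suggested correction amounts to dropping the "Importados/Indefinidos" bucket before summing. A minimal JavaScript sketch, assuming Brasil.io-style city-level rows with date, state, city, place_type and deaths fields; these field names and the localDeaths helper are assumptions to verify against the live dataset:

```javascript
// Minimal sketch: sum deaths for one state on one date, counting only
// local municipal rows.
// ASSUMPTION: rows look like { date, state, city, place_type, deaths },
// with external cases filed under the city name below.
const EXTERNAL_BUCKET = "Importados/Indefinidos";

function localDeaths(rows, state, date) {
  return rows
    // Keep city-level rows for the requested state and date; this skips
    // state-level aggregate rows, which would double count.
    .filter((r) => r.place_type === "city" && r.state === state && r.date === date)
    // Drop cases detected here but registered to other regions.
    .filter((r) => r.city !== EXTERNAL_BUCKET)
    // CSV exports may give numbers as strings, so coerce before adding.
    .reduce((sum, r) => sum + Number(r.deaths), 0);
}
```

Summing the city rows, rather than subtracting the bucket from a state total, avoids depending on how the state-level row was computed.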
Best regards,
PA
Hi @rneher, I will have a look at it. Unfortunately, we don't have reliable data for the hospital capacities; the government keeps changing this information. I believe we can provide the population sizes, R0, etc., at least for "BRA-Distrito Federal". Below is the link to the data that we have been using in our weekly report.
Weekly reports created by our observatory (the parameters are also justified there): https://www.prepidemia.org/boletins-quinzenais-prepidemia
@pauloangelo I created a separate issue for this, let's continue there https://github.com/neherlab/covid19_scenarios/issues/718
Currently we are looking for case count data and other statistical information from different countries, as well as for people who can maintain this data (add, curate, update).
The entire process should be automated as much as possible. The README in the directory covid19_scenarios/data contains some information on how to get started: https://github.com/neherlab/covid19_scenarios/data
It also contains the preprocessed data, ready for consumption by the app's build system.
If you think you may know where to find the relevant data for a country, please either let us know in this thread or open an issue. If you are ready to contribute, feel free to open a pull request.
Don't hesitate to ask if you have any questions or if you need something to get started!
cc @nnoll @rneher