neherlab / covid19_scenarios

Models of COVID-19 outbreak trajectories and hospital demand
https://covid19-scenarios.org
MIT License
🔍 WANTED: We are looking for data and data curators #69

Open ivan-aksamentov opened 4 years ago

ivan-aksamentov commented 4 years ago

We are currently looking for case count data and other statistical information from different countries, as well as for people who can maintain this data (add, curate, update).

The entire process should be automated as much as possible. The README in the directory covid19_scenarios/data contains some information on how to get started:

https://github.com/neherlab/covid19_scenarios/data

It also contains the preprocessed data, ready for consumption by the build system of the app.

If you think you may know where to find the relevant data for a country, please let us know either in this thread, or open an issue. If you are ready to contribute, feel free to open a pull request.

Don't hesitate to ask if you have any questions or if you need something to get started!

cc @nnoll @rneher

ivan-aksamentov commented 4 years ago

One way might be to crowdsource the search for data.

There are many COVID-19 and SARS-CoV-2-related projects on the web. Some of them may contain data, APIs or just interesting ideas that can help us to make our application better.

Here are some examples:

noleti commented 4 years ago

https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide seems like a good data source? Example data: https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide-2020-03-20.xlsx Data seems to be global and well structured. It only counts cases and deaths, though (no hospitalized, ICU, recovered)
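A parser for a source like this could be quite small. The following is only a sketch under stated assumptions: the column names `date`, `country`, `cases` and `deaths` are hypothetical (the real file's headers should be checked), the rows are assumed to hold daily new counts, and the sample values are purely illustrative. If the file already holds cumulative totals, the accumulation step should be dropped.

```python
import csv
import io
from collections import defaultdict
from itertools import accumulate


def cumulative_by_country(csv_text):
    """Turn daily new-count rows into cumulative per-country series.

    Assumes hypothetical columns: date (ISO format), country, cases, deaths.
    """
    daily = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        daily[row["country"]].append(
            (row["date"], int(row["cases"]), int(row["deaths"]))
        )

    series = {}
    for country, rows in daily.items():
        # ISO dates sort chronologically when compared as strings.
        rows.sort(key=lambda r: r[0])
        dates = [r[0] for r in rows]
        cases = accumulate(r[1] for r in rows)
        deaths = accumulate(r[2] for r in rows)
        series[country] = list(zip(dates, cases, deaths))
    return series


# Illustrative input only, not real ECDC numbers.
SAMPLE = """date,country,cases,deaths
2020-03-02,A,3,0
2020-03-01,A,2,1
2020-03-01,B,4,0
"""

result = cumulative_by_country(SAMPLE)
```

Sorting by date before accumulating matters because the rows are not guaranteed to arrive in chronological order.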

fazouane-marouane commented 4 years ago

Would this be enough? It's data that's refreshed daily at 9am EST: https://www.tableau.com/covid-19-coronavirus-data-resources

It is available as CSV and as Google Sheets.

nonotest commented 4 years ago

https://coronadatascraper.com/ there's a fair bit of data available there as well

mserranom commented 4 years ago

For Spain, this is a good data source, containing national and regional cases, deaths, ICU and recovered, updated on a daily basis: https://github.com/datadista/datasets/tree/master/COVID%2019

noleti commented 4 years ago

I have a finished pull request for the ECDC dataset pending now, replacing the WHO data and parser.

noleti commented 4 years ago

> https://coronadatascraper.com/ there's a fair bit of data available there as well

There is an amazing amount of data on that API, but I guess it is not an official source. Should be easy to write a parser for, if required.

nonotest commented 4 years ago

Good point!

I have checked a few of their scrapers, and they all seem to be directed at government pages, e.g. https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/coronavirus-covid-19-current-situation-and-case-numbers for Australia, https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection.html for Canada and so on, or at GitHub repos that are official sources, like https://github.com/opencovid19-fr/data for France.

If we were to go down this road, it shouldn't take too long to vet each source, I guess.

mserranom commented 4 years ago

Spain's data: https://github.com/neherlab/covid19_scenarios_data/pull/11

camjc commented 4 years ago

Hey all,

Have been working on https://coronadatascraper.com/ aka https://github.com/lazd/coronadatascraper in my own time, and also am a Sanity user professionally+personally.

We've been building scrapers over there only from official sources. No news, no aggregates, just governments directly (yes this is a pain since many governments like to have free-text press releases, sometimes with useful numbers written out like thirty-five).
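To illustrate the written-out-numbers problem, here is a minimal sketch of a converter for English number words. It is not taken from the coronadatascraper code base; the function name `words_to_int` is hypothetical, and it only handles simple English phrases up to the thousands.

```python
import re

# Words for 0..19 map to their index in this list.
UNITS = {word: value for value, word in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split()
)}
# Words for 20, 30, ..., 90.
TENS = {word: 10 * value for value, word in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2
)}


def words_to_int(text):
    """Convert a phrase such as 'thirty-five' to an integer."""
    total, current = 0, 0
    for word in re.split(r"[\s-]+", text.lower().strip()):
        if not word or word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


thirty_five = words_to_int("thirty-five")
```

A real scraper would still need to locate the number phrase inside the free text first, which is usually the harder part.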

If there are any sources on there that aren't primary sources (government depts), please raise an issue on that github and we'll work to sort it out.

There's a Slack for that project too, if anyone wants to jump on and chat with those of us who are working on it.

noleti commented 4 years ago

I have now written a first parser for coronadatascraper.com (in my forked repo). In the latest version, it should also contain correct entries for regions such as USA-OK-Love County. Everything is stored in a global .tsv (and as JSON as well).

Re: source quality of coronadatascraper: Germany's numbers are pulled from the app of a tabloid... I don't think it will be possible to vet the sources of such an API, as they can change things as they see fit. /edit: To be more precise: Germany's numbers are themselves aggregated from the official source (RKI) and from newspapers, at least one of which is more or less a tabloid (Morgenpost).

aschelch commented 4 years ago

Hi, thank you all for the work. Here is my small contribution: I added data for France (https://github.com/neherlab/covid19_scenarios_data/pull/18). Take care.

ManuelB commented 4 years ago

I have collected a lot of data for Germany:

https://github.com/ManuelB/covid-19-vis/tree/gh-pages/germany

It is used to run a full simulation for 417 districts in Germany and runs on the command line.

Details of what I am doing are described here: https://youtu.be/lwUDvNfVeEo

If the data were integrated into the data repository, the select box would show more than 400 items. I think that is too many.

nnoll commented 4 years ago

That's really cool @ManuelB. Thanks for sharing.

ShubhamPandey-Engineer commented 4 years ago

I can provide an API that gives COVID-19 data for all countries. It is also updated frequently.

fetch('https://corona.lmao.ninja/all')

Hope it will help you guys.

pauloangelo commented 4 years ago

For Brazil, I saw that the data available at https://brasil.io/dataset/covid19/ have been used. Great! However, some data is outdated. For example, today, the last record in "BRA-Distrito Federal.tsv" is for 2020-04-30. Who is working with the Brazilian data? I'm willing to help if needed. I would also like to take this opportunity to thank the project's team! We used Covid Scenarios in a publication [1] that had a significant local impact.

[1] https://1b9b1300-1a94-40d8-b9ca-402057f9520f.filesusr.com/ugd/c4c6aa_762877bf2fc54d1e94aa60dd8ea7a074.pdf

noleti commented 4 years ago

> For Brazil, I saw that the data available at https://brasil.io/dataset/covid19/ have been used. Great! However, some data is outdated. For example, today, the last record in "BRA-Distrito Federal.tsv" is for 2020-04-30. Who is working with the Brazilian data? I'm willing to help if needed.

Hi @pauloangelo, thanks for highlighting this. The data needs to be updated manually by the maintainers of this project, and that has just not been done in the last 3 days. I am sure they will do this soon!
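A small staleness check could flag this kind of gap automatically. The sketch below assumes the TSV has a date column named `time` holding ISO dates; that column name and the function `days_stale` are assumptions for illustration, not the project's actual tooling. The sample rows carry only dates, with the count columns left empty.

```python
import csv
import io
from datetime import date


def days_stale(tsv_text, today, date_column="time"):
    """Return the number of days between the last record and `today`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    last = max(date.fromisoformat(row[date_column]) for row in reader)
    return (today - last).days


# Dates follow the example in this thread; counts are intentionally empty.
SAMPLE = "time\tcases\tdeaths\n2020-04-29\t\t\n2020-04-30\t\t\n"

stale = days_stale(SAMPLE, today=date(2020, 5, 3))
```

Run over every region file, a check like this could list the regions whose last record is older than some chosen threshold.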

pauloangelo commented 4 years ago

Thank you @noleti. I'm available to help, if needed. Thank you all for this remarkable initiative!

nnoll commented 4 years ago

Hey @pauloangelo, sorry for the delay. We will update the data now and re-release soon!

pauloangelo commented 4 years ago

Thank you @nnoll !

rneher commented 4 years ago

If you compile population sizes for Brazilian regions and their hospital capacities, we can add them as presets.

pauloangelo commented 4 years ago

Hi all,

The counts for "BRA-Distrito Federal" include cases from other regions that were detected in Distrito Federal. The Brasil.io dataset records these external cases as "Importados/Indefinidos". I suggest counting only the local cases. For example, for 29-May-2020 there are 142 local deaths, while the TSV counts 154.
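As a rough sketch of the suggested fix, the parser could skip the rows labelled "Importados/Indefinidos" before summing. The row layout here (`city` and `deaths` fields) is an assumption about the Brasil.io export, and the two rows are illustrative: they are constructed only so that the totals match the 142 and 154 figures above, assuming the whole gap comes from imported rows.

```python
IMPORTED_LABEL = "Importados/Indefinidos"


def local_total(rows, field="deaths"):
    """Sum `field` over all rows except the imported/undefined bucket."""
    return sum(int(row[field]) for row in rows if row["city"] != IMPORTED_LABEL)


# Illustrative rows only; field names are hypothetical.
ROWS = [
    {"city": "Brasília", "deaths": 142},
    {"city": IMPORTED_LABEL, "deaths": 12},
]

local_deaths = local_total(ROWS)
all_deaths = sum(int(row["deaths"]) for row in ROWS)
```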

Best regards,

PA

pauloangelo commented 4 years ago

> If you compile population sizes for Brazilian regions and their hospital capacities, we can add them as presets.

Hi @rneher, I will have a look at it. For the hospital capacities, unfortunately, we don't have reliable data: the government keeps changing this information. For the population sizes, R0, etc., I believe we can provide them, at least for "BRA-Distrito Federal". Below are the links and data that we have been using in our weekly report.

Weekly reports created by our observatory (parameters are also motivated here) https://www.prepidemia.org/boletins-quinzenais-prepidemia .

Link/data for "BRA-Distrito Federal": https://covid19-scenarios.org?q=~%28ageDistributionData~%28data~%28~%28ageGroup~%270-9~population~389784%29~%28ageGroup~%2710-19~population~439454%29~%28ageGroup~%2720-29~population~514225%29~%28ageGroup~%2730-39~population~465517%29~%28ageGroup~%2740-49~population~344853%29~%28ageGroup~%2750-59~population~218714%29~%28ageGroup~%2760-69~population~118042%29~%28ageGroup~%2770-79~population~56949%29~%28ageGroup~%2780%2A2b~population~22622%29%29~name~%27Custom%29~scenarioData~%28data~%28epidemiological~%28hospitalStayDays~8~icuStayDays~10~infectiousPeriodDays~2.2~latencyDays~5.2~overflowSeverity~1~peakMonth~0~r0~%28begin~3.7~end~5.55%29~seasonalForcing~0%29~mitigation~%28mitigationIntervals~%28~%28color~%27%2A23c9f4e5~name~%27D40509~timeRange~%28begin~%272020-03-12T15%2A3a00%2A3a00.000Z~end~%272020-05-31T15%2A3a00%2A3a00.000Z%29~transmissionReduction~%28begin~10~end~10%29%29~%28color~%27%2A23b98d4d~name~%27D40539~timeRange~%28begin~%272020-03-19T15%2A3a00%2A3a00.000Z~end~%272020-04-15T15%2A3a00%2A3a00.000Z%29~transmissionReduction~%28begin~60~end~60%29%29~%28color~%27%2A2332cac5~name~%27H%2Ae1bitos%2A20de%2A20higiene%2A20e%2A20distanciamento~timeRange~%28begin~%272020-03-19T15%2A3a00%2A3a00.000Z~end~%272020-12-31T15%2A3a00%2A3a00.000Z%29~transmissionReduction~%28begin~40~end~40%29%29~%28color~%27%2A230a1ab5~name~%27Impacto%2A20equivalente%2A20ao%2A20atual~timeRange~%28begin~%272020-05-17T15%2A3a00%2A3a00.000Z~end~%272020-12-31T15%2A3a00%2A3a00.000Z%29~transmissionReduction~%28begin~50~end~50%29%29~%28color~%27%2A2339984a~name~%27Impacto%2A20equivalente%2A20ao%2A20D40509~timeRange~%28begin~%272020-05-31T15%2A3a00%2A3a00.000Z~end~%272020-12-31T15%2A3a00%2A3a00.000Z%29~transmissionReduction~%28begin~10~end~10%29%29~%28color~%27%2A2384c772~name~%27D40539%2A20com%2A20flexibiliza%2Ae7%2Af5es~timeRange~%28begin~%272020-04-15T15%2A3a00%2A3a00.000Z~end~%272020-05-10T15%2A3a00%2A3a00.000Z%29~transmissionReduction~%28begin~55~end~55%29%29~%28
color~%27%2A2346a750~name~%27D40539%2A20com%2A20mais%2A20flexibiliza%2Ae7%2Af5es~timeRange~%28begin~%272020-05-10T15%2A3a00%2A3a00.000Z~end~%272020-05-17T15%2A3a00%2A3a00.000Z%29~transmissionReduction~%28begin~50~end~50%29%29%29%29~population~%28ageDistributionName~%27Custom~caseCountsName~%27BRA-Distrito%2A20Federal~hospitalBeds~2570160~icuBeds~2570160~importsPerDay~0~initialNumberOfCases~20~populationServed~2570160%29~simulation~%28numberStochasticRuns~20~simulationTimeRange~%28begin~%272020-02-27T15%2A3a00%2A3a00.000Z~end~%272020-12-31T15%2A3a00%2A3a00.000Z%29%29%29~name~%27Distrito%2A20Federal%29~schemaVer~%272.0.0~severityDistributionData~%28data~%28~%28ageGroup~%270-9~confirmed~5~critical~5~fatal~30~isolated~0~severe~1%29~%28ageGroup~%2710-19~confirmed~5~critical~10~fatal~30~isolated~0~severe~3%29~%28ageGroup~%2720-29~confirmed~10~critical~10~fatal~30~isolated~0~severe~3%29~%28ageGroup~%2730-39~confirmed~15~critical~15~fatal~30~isolated~0~severe~3%29~%28ageGroup~%2740-49~confirmed~20~critical~20~fatal~30~isolated~0~severe~6%29~%28ageGroup~%2750-59~confirmed~25~critical~25~fatal~40~isolated~0~severe~10%29~%28ageGroup~%2760-69~confirmed~30~critical~35~fatal~40~isolated~0~severe~25%29~%28ageGroup~%2770-79~confirmed~40~critical~45~fatal~50~isolated~0~severe~35%29~%28ageGroup~%2780%2A2b~confirmed~50~critical~55~fatal~50~isolated~0~severe~50%29%29~name~%27China%2A20CDC%29%29&v=1

ivan-aksamentov commented 4 years ago

@pauloangelo I created a separate issue for this, let's continue there: https://github.com/neherlab/covid19_scenarios/issues/718