neherlab / covid19_scenarios_data

Data preprocessing scripts and preprocessed data storage for COVID-19 Scenarios project
https://github.com/neherlab/covid19_scenarios

NOTE: This repository has been moved directly into covid19-scenarios. Please continue the discussion there.

COVID-19 Scenarios Data

Data preprocessing scripts and preprocessed data storage for COVID-19 Scenarios project

Overview

This repository serves as the source of observational data for covid19_scenarios. It ingests data from a variety of sources listed in sources.json. For each source there is a parser, written in Python, in the directory parsers. The data is stored as tsv files (tab-separated values), one for each location or country. These tabular files are mainly meant for data curation and storage; the web application needs json files as input.

The following commands assume that you have cloned this repository as covid19_scenarios_data and that you run them from the directory that contains it, not from inside the repository. To run the parsers, call:

python3 covid19_scenarios_data/generate_data.py --fetch

This will update the tables in the directory case-counts. For each parser there is a separate directory which contains individual case counts for each location covered by the parser.
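To check what a fetch produced, the tables can be grouped by parser directory. The following is a minimal sketch, assuming the tsv files sit somewhere below one directory per parser inside case-counts; the helper name `list_case_count_tables` and the file names in the demonstration are hypothetical and not part of this repository.

```python
from pathlib import Path
import tempfile


def list_case_count_tables(root):
    """Group the tsv files found under `root` by their top-level directory."""
    root = Path(root)
    tables = {}
    for path in sorted(root.rglob("*.tsv")):
        relative = path.relative_to(root)
        if len(relative.parts) < 2:
            continue  # skip tsv files that sit directly in the root
        tables.setdefault(relative.parts[0], []).append(relative.as_posix())
    return tables


# Demonstration on a throwaway directory; the location file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    made_up_files = (
        "netherlands/LocationA.tsv",
        "switzerland/LocationB.tsv",
        "switzerland/LocationC.tsv",
    )
    for name in made_up_files:
        target = Path(tmp, "case-counts", name)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    demo_tables = list_case_count_tables(Path(tmp, "case-counts"))

for parser_name, files in demo_tables.items():
    print(parser_name, len(files), "table(s)")
```

On a real checkout, the same function could be pointed at covid19_scenarios_data/case-counts instead of the temporary directory.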

To run only specific parsers, use:

python3 covid19_scenarios_data/generate_data.py --fetch --parsers netherlands switzerland

To generate the json files for the app, specify the path to the target location. This can be done either together with updating the tsv files or separately, depending on whether the command is run with --fetch or not.

python3 covid19_scenarios_data/generate_data.py \
        --output-cases path/case-counts.json  \
        --output-population path/population.json
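The two modes (export only, or fetch and export in one call) differ only in the --fetch flag. The sketch below assembles the argument list in Python for use from a script. It uses only the flags shown in this README; the helper name `build_command` and its parameters are hypothetical, and the sketch builds the command without executing it.

```python
import shlex


def build_command(
    script="covid19_scenarios_data/generate_data.py",
    fetch=False,
    parsers=(),
    output_cases=None,
    output_population=None,
):
    """Assemble the generate_data.py command line from the documented flags."""
    command = ["python3", script]
    if fetch:
        command.append("--fetch")
    if parsers:
        command += ["--parsers", *parsers]
    if output_cases:
        command += ["--output-cases", output_cases]
    if output_population:
        command += ["--output-population", output_population]
    return command


# Export the json files from the tsv files already on disk.
export_only = build_command(
    output_cases="path/case-counts.json",
    output_population="path/population.json",
)

# Update the tsv files first, then export in the same call.
update_and_export = build_command(
    fetch=True,
    output_cases="path/case-counts.json",
    output_population="path/population.json",
)

print(shlex.join(export_only))
print(shlex.join(update_and_export))
```

Either list can be passed to `subprocess.run(command, check=True)` from the directory that contains the clone.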

To generate the integrated scenario json, run:

python3 covid19_scenarios_data/generate_data.py \
        --output-cases path/case-counts.json  \
        --output-scenarios path/scenarios.json

Contents

Country codes

List of countries associated with regions, subregions, and three-letter codes supplied by the U.N.

Population data

List of settings used by the default scenario of the COVID-19 epidemic simulation for different regions of interest.

Case count data

The directory ./case-counts contains a structured set of tsv files with aggregated data for selected countries and subregions/cities. We welcome contributions to keep this data up to date. The format chosen is:

time    cases   deaths   hospitalized    ICU     recovered
2020-03-14 ...
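A table in this format can be read with the Python standard library alone. The following is a minimal sketch, assuming that counts are integers, that empty cells mean "not reported", and that any line starting with # is a comment; the function name `read_case_counts` is hypothetical and the sample numbers are made up for illustration.

```python
import csv
import io

COUNT_COLUMNS = ["cases", "deaths", "hospitalized", "ICU", "recovered"]


def read_case_counts(handle):
    """Read a case-count table and return one dict per data row."""
    # Assumption: lines starting with '#' are comments and can be skipped.
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    rows = []
    for record in reader:
        row = {"time": record["time"]}
        for column in COUNT_COLUMNS:
            value = (record.get(column) or "").strip()
            # Assumption: an empty cell means the value was not reported.
            row[column] = int(value) if value else None
        rows.append(row)
    return rows


# Made-up numbers, for illustration only.
sample = (
    "time\tcases\tdeaths\thospitalized\tICU\trecovered\n"
    "2020-03-14\t100\t2\t\t\t5\n"
    "2020-03-15\t130\t3\t10\t\t7\n"
)
rows = read_case_counts(io.StringIO(sample))
print(rows[0])
```

For a file on disk, pass an open file object instead of the `io.StringIO` wrapper.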

We are actively looking for people to supply data to be used for our modeling!

Contributing and curating data:

Adding parser or case count data for a new region:

The steps to follow are:

Identify a source for case count data that is updated frequently (at least daily) as the outbreak evolves.
Update the sources.json file to contain all relevant metadata.
Test your parser and create a Pull Request:
python3 covid19_scenarios_data/generate_data.py --fetch --parsers <yourparsername>
Add population data for the additional regions/states.

Case count data is most useful when tied to data on the population it refers to. To ensure new case counts are correctly included in the population presets, add a line to the populationData.tsv for each new region (see Adding/editing population data for a country and/or region below).
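Whatever the upstream source looks like, a new parser ultimately has to produce rows in the case-count format described earlier. The following is a rough sketch of that last step, assuming the source data has already been downloaded and reduced to one dict per day. The function name `format_case_counts` is hypothetical; the existing parsers in the directory parsers may use shared helpers for this instead, so check them before writing your own.

```python
import csv
import io

COLUMNS = ["time", "cases", "deaths", "hospitalized", "ICU", "recovered"]


def format_case_counts(records):
    """Render records (dicts keyed by column name) as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    # ISO dates (YYYY-MM-DD) sort chronologically when sorted as strings.
    for record in sorted(records, key=lambda r: r["time"]):
        # Assumption: values a source does not report are left empty.
        writer.writerow(
            ["" if record.get(column) is None else record[column] for column in COLUMNS]
        )
    return buffer.getvalue()


# Made-up records, for illustration only.
table = format_case_counts(
    [
        {"time": "2020-03-15", "cases": 130},
        {"time": "2020-03-14", "cases": 100, "deaths": 2},
    ]
)
print(table)
```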

Updating/editing case count data for the existing region:

This option is less preferred than a script that updates the data automatically, as outlined above. However, if no accessible data source exists, the data can be entered manually. To do so:

Commit a manually entered file into the "manuals" directory.

Adding/editing population data for a country and/or region:

As of now, all data used to initialize the scenarios for our model is found in populationData.tsv. It has the following form:

name    populationServed    ageDistribution hospitalBeds    ICUBeds suspectedCaseMarch1st   importsPerDay   hemisphere
Switzerland ...

At least one of suspectedCasesMarch1st and importsPerDay needs to be non-zero. Otherwise there is no outbreak (good news in principle, but not useful for exploring scenarios).
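This constraint is easy to check before committing an edited table. The following is a minimal sketch, assuming a tab-separated file with the header shown above; the function name `rows_without_outbreak` is hypothetical, the column names are parameters so they can be adjusted to match the actual file, and the sample rows are made up.

```python
import csv
import io


def rows_without_outbreak(
    handle,
    seed_column="suspectedCaseMarch1st",
    import_column="importsPerDay",
):
    """Return the names of rows where both outbreak-seeding columns are zero."""
    reader = csv.DictReader(handle, delimiter="\t")
    flagged = []
    for record in reader:
        # Assumption: an empty or missing cell counts as zero.
        seeds = float(record.get(seed_column) or 0)
        imports = float(record.get(import_column) or 0)
        if seeds == 0 and imports == 0:
            flagged.append(record["name"])
    return flagged


# Made-up rows, for illustration only.
sample = (
    "name\tpopulationServed\tageDistribution\thospitalBeds\tICUBeds\t"
    "suspectedCaseMarch1st\timportsPerDay\themisphere\n"
    "RegionA\t100000\tRegionA\t300\t20\t5\t0.1\tNorthern\n"
    "RegionB\t50000\tRegionB\t150\t10\t0\t0\tNorthern\n"
)
flagged = rows_without_outbreak(io.StringIO(sample))
print(flagged)
```

Any name in the returned list needs a non-zero value in at least one of the two columns.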

License

Mixed