ictr / covid19-outbreak-simulator

Population-based Forward-time Simulator for the Outbreak of COVID-19
https://ictr.github.io/covid19-outbreak-simulator/

contrib script to re-organize data from report. #6

Open BoPeng opened 4 years ago

BoPeng commented 4 years ago

The summary report we have right now is in a two-column format: a variable name followed by its value, which can be either a single number or a list.

Data in this format is difficult to analyze, so let us convert it to a tabular format.

Let us assume that we have the report pop_quarantine_all.txt (attached), generated by the following command:

outbreak_simulator  --popsize 3499 -j1 --rep 100 --handle-symptomatic quarantine_14 1 \
    --logfile pop_quarantine_all.log  \
    --plugin init --seroprevalence 0.1214 --incidence-rate 0.03036 --leadtime any \
    --plugin stat --interval 1 > pop_quarantine_all.txt

pop_quarantine_all.txt

Let us convert it to a .csv file, with time in the first column and variables such as avg_n_active as the remaining columns. The key steps are:

  1. Find the "days". Users could optionally specify a starting date so that the table can be generated with real dates.
  2. Find the variables that can be added as columns. Note that some values are lists, with one value per replicate simulation, and some variables are averages. The replicate columns should be named n_recorded_0 for replicate 0, n_recorded_1 for replicate 1, and so on.
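The two steps above could be sketched roughly as follows. This is only a minimal sketch and not the actual report2csv.py: it assumes (hypothetically) that each report line is a variable name and a value separated by a tab, that time-indexed variable names end in `_<time>`, and that list values are comma-separated. The function names `parse_report` and `write_csv` are made up for illustration.

```python
import csv
import io
import re
from collections import defaultdict

# ASSUMPTION: time-indexed variables are named "<name>_<time>", e.g. "avg_n_active_3".
TIME_SUFFIX = re.compile(r"^(?P<name>.+)_(?P<time>\d+(?:\.\d+)?)$")


def parse_report(lines):
    """Return {time: {column: value}} from two-column report lines."""
    table = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if "\t" not in line:
            continue  # ASSUMPTION: name and value are tab-separated
        key, value = line.split("\t", 1)
        match = TIME_SUFFIX.match(key.strip())
        if match is None:
            continue  # not a time-indexed variable
        name, time = match["name"], float(match["time"])
        parts = [p.strip() for p in value.strip("[] ").split(",") if p.strip()]
        if len(parts) == 1:
            # A single number, such as an average over replicates.
            table[time][name] = parts[0]
        else:
            # One value per replicate: n_recorded_0, n_recorded_1, ...
            for rep, part in enumerate(parts):
                table[time][f"{name}_{rep}"] = part
    return table


def write_csv(table, out):
    """Write the table with time in the first column and NA for gaps."""
    columns = sorted({c for row in table.values() for c in row})
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time"] + columns)
    for time in sorted(table):
        writer.writerow([time] + [table[time].get(c, "NA") for c in columns])
```

Note that a variable whose name merely happens to end in a number would be mistaken for a time-indexed variable under this naming assumption, so the pattern would need to be adapted to the real report.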

After the table is generated, it can be useful to draw some x-y plots. In this case, the "average" values can be the main lines, and the "replicate" values can provide the confidence interval and range. Interesting variables include n_recovered, n_infected, n_active, or their percentage equivalents (incidence rate ...).
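As a rough illustration of how the replicate values could feed such a plot, the sketch below reduces the replicates of one variable at one time point to a mean and a 5th to 95th percentile band. The function name `summarize` and the choice of percentiles are assumptions made for this example, and the band is an empirical range of the replicates rather than a model-based confidence interval.

```python
import statistics


def summarize(replicate_values):
    """Return (mean, low, high) for one variable at one time point.

    Needs at least two replicate values.
    """
    values = sorted(float(v) for v in replicate_values)
    mean = statistics.fmean(values)
    # 19 cut points split the data into 20 groups; the first and last
    # cut points are the 5th and 95th percentiles.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return mean, cuts[0], cuts[-1]
```

Applied to every row of the table, the means give the main line and the low and high values give a band around it, which could be shaded with, for example, matplotlib's `fill_between`.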

Any tool can be used.

BoPeng commented 4 years ago

@wuben2602

I changed the interface a bit to make it easier to use the script as a pipe. I have found a few problems with the script

  1. The first column (time) does not have a header.
  2. Output from python report2csv.py pop_quarantine_all.txt looks wrong.
  3. I see n_second_symptom_on_day_1 and n_third_symptom_on_day_1 in the csv, which should not be there. If it is difficult to judge from header, removing columns with NA might do the job.
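The fallback suggested in item 3 (removing columns that contain only NA) might look like the sketch below. It works on rows already read with `csv.reader`; the function name `drop_empty_columns` and the set of markers treated as missing are assumptions for this example.

```python
def drop_empty_columns(rows, missing=("", "NA")):
    """Drop columns whose data cells are all missing.

    rows is a list of lists with the header as the first row.
    """
    header, *body = rows
    keep = [
        i
        for i in range(len(header))
        if any(i < len(r) and r[i] not in missing for r in body)
    ]
    # Short rows are padded with empty strings so the output is rectangular.
    return [[r[i] if i < len(r) else "" for i in keep] for r in rows]
```

A column survives if at least one row has a real value in it, so variables that are only reported at some time points are kept.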

Please also test the script with non-1 intervals, e.g. --plugin stat --interval 2 and --plugin stat --interval 0.5.
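One thing that can go wrong with non-1 intervals is treating the time values as strings or integers. A minimal sketch of handling them as floats, and optionally mapping them to real dates from a user-supplied starting date, is shown below. The function name `time_labels` is hypothetical, and it assumes time is measured in days.

```python
from datetime import datetime, timedelta


def time_labels(times, start=None):
    """Sort time points numerically and optionally convert them to dates.

    times are strings such as "0.5" or "2"; start is an optional datetime.
    """
    # Sort as numbers, so that "10" comes after "2".
    values = sorted(float(t) for t in times)
    if start is None:
        return [str(v) for v in values]
    # ASSUMPTION: one unit of time is one day; 0.5 therefore means 12 hours.
    return [
        (start + timedelta(days=v)).isoformat(timespec="minutes") for v in values
    ]
```

For example, `time_labels(["0", "0.5"], datetime(2020, 3, 1))` gives midnight and noon of the same day.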

wuben2602 commented 4 years ago

@BoPeng could you specify what about the output is wrong?

BoPeng commented 4 years ago

Download the test file from the main post, run

python report2csv.py pop_quarantine_all.txt

and at least the last lines look like

,,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,713.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,725.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3499.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3499.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,713.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,725.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.2038,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.2072,,,,,,,,,
185.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,