This repository provides a Python implementation of the dynamic survival analysis method. A brief description of the methodology can be found in this white paper. Prof. Greg Rempała gave a couple of public talks on this model. You can watch his MBI seminar talks here: link to his first talk and link to his second talk.
This implementation is primarily based on a package (available here) developed by Caleb Deen Bastian, Princeton University. I also acknowledge Saket Gurukar, who helped with the parallelization of some of the routines.
If you have questions, comments, criticisms, or corrections, please email me at khudabukhsh.2@osu.edu.
Clone the repository by running
git clone https://github.com/wasiur/dynamic_survival_analysis.git
from your terminal.
Create the conda environment by running
conda env create -f environment.yml
To check that the environment is now available, run
conda env list
Activate the environment by running
conda activate dynamic_survival_analysis
or
source activate dynamic_survival_analysis
from your terminal.
A typical input dataset for the model should have the following seven columns:
time | daily_confirm | recovery | deaths | cum_confirm | cum_heal | cum_dead |
---|---|---|---|---|---|---|
2020-03-01 | 2 | 0 | 0 | 2 | 0 | 0 |
2020-03-02 | 8 | 1 | 0 | 10 | 1 | 0 |
. | . | . | . | . | . | . |
. | . | . | . | . | . | . |
. | . | . | . | . | . | . |
2020-06-05 | 46 | 13 | 21 | 63291 | 1200 | 1037 |
At least one of `daily_confirm` and `cum_confirm` must be present. If the parameters of the recovery distribution need to be estimated, at least one of the four columns `recovery`, `deaths`, `cum_heal`, and `cum_dead` must be present in the dataset.
If no recovery information is available, the model can still be run by explicitly providing the `-r` option.
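The column requirements above can be checked before starting a run. The following is a minimal sketch and not part of the repository: the helper names `check_columns` and `read_header` and the `need_recovery` flag are hypothetical, and the sketch assumes the data file is comma-separated with a header row that uses the column names shown in the table.

```python
import csv
import io

# Column groups taken from the requirements stated above.
CONFIRM_COLUMNS = {"daily_confirm", "cum_confirm"}
RECOVERY_COLUMNS = {"recovery", "deaths", "cum_heal", "cum_dead"}


def check_columns(header, need_recovery=True):
    """Return a list of problems found in the header (empty if none).

    need_recovery is a hypothetical flag: set it to True when the
    recovery-distribution parameters are to be estimated from the data.
    """
    present = {name.strip() for name in header}
    problems = []
    if not present & CONFIRM_COLUMNS:
        problems.append("need daily_confirm or cum_confirm")
    if need_recovery and not present & RECOVERY_COLUMNS:
        problems.append("need one of recovery, deaths, cum_heal, cum_dead")
    return problems


def read_header(text_stream):
    """Read the first row of a CSV stream (assumed to be the header)."""
    return next(csv.reader(text_stream))


# Usage: a file with confirmed counts but no recovery information.
sample = io.StringIO("time,daily_confirm,cum_confirm\n2020-03-01,2,2\n")
print(check_columns(read_header(sample)))
print(check_columns(["time", "cum_confirm"], need_recovery=False))
```

To check a real file, pass an open file object to `read_header` instead of the in-memory sample.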
We used COVID-19 data published by the New York Times to inform our model. The repository can be accessed here.
The Python scripts accept a number of options. The most important option is `-d`, which passes the name of the data file to the script. If no dataset is available, the model can be run on dummy data by providing the `-v` or `--verbose` option, which makes the script enter a verbose mode. If neither `-d` nor `-v` is provided, the script will throw an error.
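The rule that at least one of `-d` and `-v` must be given can be illustrated with a small parser. This is a sketch only: it uses the standard-library `argparse` module, which may differ from how the scripts actually parse their options, and the function name `parse_args` and the error message are made up for illustration.

```python
import argparse


def parse_args(argv):
    """Parse -d/-v and reject command lines that provide neither."""
    parser = argparse.ArgumentParser(prog="DSA.py")
    parser.add_argument("-d", "--data-file", dest="datafile",
                        help="Name of the data file.")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Runs with default choices")
    args = parser.parse_args(argv)
    if args.datafile is None and not args.verbose:
        # parser.error prints a usage message and exits with status 2.
        parser.error("provide a data file with -d, or use -v to run on dummy data")
    return args


# Usage: both of these are accepted; an empty argument list is rejected.
print(parse_args(["-d", "my_data.csv"]).datafile)
print(parse_args(["-v"]).verbose)
```

Here `my_data.csv` is a placeholder file name.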
For more details on the available options, run `python DSA.py -h` or `python DSA.py --help`. For instance, running `python DSA_Bayesian.py -h` yields
Usage: python DSA_Bayesian.py -d <datafile>
Options:
-h, --help show this help message and exit
-d DATAFILE, --data-file=DATAFILE
Name of the data file.
-l LOCATION, --location=LOCATION
Name of the location.
-m, --mpi Indicates whether to use MPI for parallelization.
-o OUTPUT_FOLDER, --output-folder=OUTPUT_FOLDER
Name of the output folder
-s, --smooth Indicates whether the daily counts should be smoothed.
-f LAST_DATE, --final-date=LAST_DATE
Last day of data to be used
-r, --estimate-recovery-parameters
Indicates the parameters of the recovery distribution
will be estimated
-N N Size of the random sample
-T T, --T=T End of observation time
--day-zero=DAY0 Date of onset of the epidemic
--niter=NITER Number of iterations of the MCMC
--threads=THREADS Number of threads for MPI
-v, --verbose Runs with default choices
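When several of these options are combined, it can help to assemble the command line programmatically. The sketch below uses only flags that appear in the help text above; the helper `build_command` is hypothetical, the file name and location are placeholders, and the date format passed to `-f` is assumed to match the `time` column of the data.

```python
import sys


def build_command(script, datafile, location=None, output_folder=None,
                  final_date=None, smooth=False, niter=None):
    """Build an argument list for one of the DSA scripts."""
    cmd = [sys.executable, script, "-d", datafile]
    if location is not None:
        cmd += ["-l", location]
    if output_folder is not None:
        cmd += ["-o", output_folder]
    if final_date is not None:
        cmd += ["-f", final_date]  # date format is an assumption
    if smooth:
        cmd.append("-s")
    if niter is not None:
        cmd.append(f"--niter={niter}")
    return cmd


# Usage: print the command instead of running it.
cmd = build_command("DSA_Bayesian.py", "my_data.csv",
                    location="Ohio", smooth=True, niter=1000)
print(" ".join(cmd))
# To launch it from the repository folder:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a location or folder name contains spaces.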
The easiest way to run our model is to open one of the Jupyter notebooks and run the cells. Please modify the commands as needed.
Alternatively, perform the following:
(Recommended) The Bayesian model can be run by invoking
python DSA_Bayesian.py -d <datafile>
from the terminal.
The maximum-likelihood DSA model can be run by invoking
python DSA.py -d <datafile>
from the terminal.
The semi-Bayesian Laplace approximation to the posterior distribution of the parameters can be carried out by running the following command
python DSA_Laplace.py -d <datafile>
from the terminal.
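The three entry points above differ only in the script name, so choosing among them can be wrapped in a small dispatcher. This is a sketch under stated assumptions: the method labels (`bayesian`, `mle`, `laplace`) and the function `run_dsa` are invented here, and the scripts are assumed to live in the current working directory.

```python
import subprocess
import sys

# Script names as listed above; the dictionary keys are hypothetical labels.
SCRIPTS = {
    "bayesian": "DSA_Bayesian.py",
    "mle": "DSA.py",
    "laplace": "DSA_Laplace.py",
}


def run_dsa(method, datafile, dry_run=True):
    """Return the command for the chosen method, running it unless dry_run."""
    if method not in SCRIPTS:
        raise ValueError(f"unknown method {method!r}; choose from {sorted(SCRIPTS)}")
    cmd = [sys.executable, SCRIPTS[method], "-d", datafile]
    if not dry_run:
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(cmd, check=True)
    return cmd


# Usage: show what would be executed for each method.
for name in SCRIPTS:
    print(" ".join(run_dsa(name, "my_data.csv")))
```

Setting `dry_run=False` actually launches the script; the default only returns the command so it can be inspected first.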
We provide two examples.
The first example extracts count data from a repository maintained by the New York Times. This example fits the Bayesian DSA model.
The second example works on a dummy data set and runs the basic DSA model.
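For the first example, published counts usually need to be reshaped into the input format described earlier. The sketch below is not the notebook's code. It assumes the source rows carry cumulative totals in columns named `date`, `cases`, and `deaths` (verify this against the file you download) and that rows are already sorted by date. The helper `to_dsa_rows` is hypothetical, and clipping negative daily differences to zero is a choice made here to absorb downward corrections in the cumulative series.

```python
import csv
import io


def to_dsa_rows(rows):
    """Convert cumulative counts into rows using the DSA column names."""
    out = []
    prev_cases = 0
    prev_deaths = 0
    for row in rows:
        cases = int(row["cases"])
        deaths = int(row["deaths"])
        out.append({
            "time": row["date"],
            # Daily counts are first differences of the cumulative totals.
            "daily_confirm": max(cases - prev_cases, 0),
            "deaths": max(deaths - prev_deaths, 0),
            "cum_confirm": cases,
            "cum_dead": deaths,
        })
        prev_cases, prev_deaths = cases, deaths
    return out


# Usage with a tiny in-memory sample (values are illustrative only).
sample = io.StringIO("date,cases,deaths\n2020-03-01,2,0\n2020-03-02,10,0\n")
for record in to_dsa_rows(csv.DictReader(sample)):
    print(record)
```

The resulting dictionaries can be written out with `csv.DictWriter` and passed to the scripts through the `-d` option.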