Source code for Casebase paper
Source code for the casebase paper. Install the following packages from GitHub:

# install.packages("devtools")

devtools::install_github("rstudio/rticles") # for template
devtools::install_github("benmarwick/rrtools") # for building the research compendium/docker container

To contribute

  1. Modify the analysis/paper/paper.Rmd file.
  2. Run rrtools::add_dependencies_to_description(). This command will ensure all used packages are added to the DESCRIPTION file.
  3. Compile the paper.Rmd file.
  4. Commit and push your changes to GitHub.
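The steps above can be sketched as a small shell helper. This is only an illustration, not part of the repository: it assumes Rscript and git are on the PATH, that it is run from the repository root, and that rmarkdown::render() is an acceptable way to compile the paper. The function names (run, contribute) and the default commit message are hypothetical. It is a dry run by default and only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch of contribution steps 2-4 (step 1, editing paper.Rmd, is manual).
# Assumptions, not taken from this repo: Rscript and git are installed,
# the working directory is the repository root, and the commit message
# below is a placeholder.

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

contribute() {
  msg="${1:-Update paper}"   # hypothetical default commit message

  # Step 2: make sure every package used is listed in DESCRIPTION
  run Rscript -e 'rrtools::add_dependencies_to_description()' || return 1
  # Step 3: compile the paper (assumes rmarkdown is installed)
  run Rscript -e 'rmarkdown::render("analysis/paper/paper.Rmd")' || return 1
  # Step 4: commit and push; "git add -A" stages everything, narrow it if needed
  run git add -A || return 1
  run git commit -m "$msg" || return 1
  run git push || return 1
}

contribute "$@"
```

Saved under a name of your choice, it could be called as `DRY_RUN=0 sh contribute.sh "Revise theory section"` once the printed commands look right.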

Docker Image

# which explains the run tags
docker pull sahirbhatnagar/cbpaper:latest # pulls the image locally
docker images # see list of images
docker ps -a # see list of all containers (running and stopped)
docker run -d -p 8787:8787 -e PASSWORD=<YOUR_PASS> --name cbpaper sahirbhatnagar/cbpaper
# then go to http://localhost:8787
# username is rstudio, password is what you specified
# in R, run setwd('/cbpaper/'); the folder with all the materials should then appear in the RStudio Files pane
docker stop cbpaper # this can be what you supplied to --name in the above command or the container ID

File structure of repo

├── paper/
│   ├── paper.Rmd       # this is the main document to edit. just testing code for now
│   └── references.bib  # this contains the reference list information
├── figures/            # location of the figures produced by the Rmd
└── data/
    ├── raw_data/       # data obtained from elsewhere
    └── derived_data/   # data generated during the analysis


  1. Introduction
  2. Theory
  3. Implementation Details (Population time plots, Data analysis)
  4. Case study 1 ERPSC (Single Event)
  5. Case study 2 Bone Marrow Transplant (Competing risk)
  6. Variable Selection (see for HD survival Data)
  7. Discussion

To-do (July 15)

  1. Max: implement multinomial glmnet.
  2. Jesse: test single-event and competing-risk variable selection on TCGA. Look at the variable selection literature for competing risks. Find data.
  3. Sahir: implementation details (population time plots).
  4. Sahir: review existing literature. What exists in terms of packages? Look at Hanley and Miettinen and the CRAN task view.

To-do (July 22)

  1. Max: theory text.
  2. Jesse: test single-event variable selection on TCGA. Look at the variable selection literature for single events. Plot KM curves with casebase+glmnet and glmnet+cox.
  3. Sahir: implementation details (population time plots), check issue.
  4. Sahir: review existing literature. What exists in terms of packages? Look at Hanley and Miettinen and the CRAN task view.

| Package | On CRAN | Documentation | Published | Description | Function call |
|---|---|---|---|---|---|
| flexsurv | :heavy_check_mark: | Vignette | Jackson, C. JSS 2016 | Fully parametric. Any parametric time-to-event distribution may be fitted if the user supplies a probability density or hazard function, and ideally also their cumulative versions. Standard survival distributions are built in, including the three- and four-parameter generalized gamma and F distributions. Any parameter of any distribution can be modelled as a linear or log-linear function of covariates. The package also includes the spline model of Royston and Parmar (2002), in which both baseline survival and covariate effects can be arbitrarily flexible parametric functions of time. See Table 1 for full list of distributions. | flexsurvreg(Surv(recyrs, censrec) ~ group, data = bc, dist = "gengamma"); flexsurvspline(Surv(recyrs, censrec) ~ group, data = bc, k = 1, scale = "odds") |
