kathoffman / steroids-trial-emulation

Tutorial for a target trial emulation with a time-varying exposure, time-dependent confounding, time-to-event outcome, and Sequentially Doubly Robust estimation (Hoffman et al. 2022).

Steroids Target Trial Emulation Tutorial

This repository was created to help other analysts run analyses similar to "Comparison of a Target Trial Emulation Framework to Cox Regression to Estimate the Effect of Corticosteroids on COVID-19 Mortality" (Hoffman et al. 2022, JAMA Network Open).

This research was presented at the American Causal Inference Conference on May 24, 2022. The slide deck is available [here].

Code Contents

Demo Data

The primary analysis is run using the open-source R package lmtp. Please note that we use the sl3-compatible branch to improve computational speed. We provide demo data in the data folder, along with this visual representation of the required data format:

The required data structure for a longitudinal time-to-event analysis is wide (one row per subject), with one column per time point per variable (treatment, censoring indicator, outcome indicator, time-varying covariate). The exception to this is baseline variables, which by definition do not have multiple time points.
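As a minimal sketch of the wide layout described above, the base R snippet below reshapes a small made-up long-format table (one row per subject per time point) into one row per subject. The column names (`id`, `time`, `age`, `L`, `A`, `C`, `Y`) and all values are hypothetical placeholders, not the names or contents of the demo data in this repository, and it assumes your source data start out in long format.

```r
# Hypothetical long-format data: 2 subjects, 2 time points each.
# All names and values are illustrative assumptions.
long <- data.frame(
  id   = rep(1:2, each = 2),
  time = rep(1:2, times = 2),
  age  = rep(c(64, 71), each = 2),  # baseline variable: constant within subject
  L    = c(0, 1, 1, 1),             # time-varying covariate
  A    = c(0, 1, 1, 1),             # treatment
  C    = c(1, 1, 1, 1),             # censoring indicator
  Y    = c(0, 0, 0, 1)              # outcome indicator
)

# Reshape to wide: one row per subject, one column per time point per variable.
# Baseline variables (here, age) stay as a single column.
wide <- reshape(
  long,
  idvar     = "id",
  timevar   = "time",
  v.names   = c("L", "A", "C", "Y"),
  direction = "wide",
  sep       = "_"
)

print(names(wide))
```

The result has a single `age` column and one `L_t`, `A_t`, `C_t`, `Y_t` column per time point `t`.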

A few notes to help with pre-processing:

Analysis Specifications

Super learner libraries

The code to make the super learner libraries (via sl3) used in the paper's analysis is in analysis.R; however, all learners except LASSO and mean are commented out to reduce computation time. Learners were the same for the intervention and outcome mechanisms. We specified 10 folds for super learner cross-validation in the paper. The demo analysis code sets .SL_folds=5 to reduce computation time.

Time-dependent confounding assumption

We used a Markov assumption of 2, meaning a patient's time-dependent confounders from the previous two time periods (48-hour windows) were assumed sufficient to capture confounding for the next time point's mechanism. This decision stemmed from clinical knowledge (laboratory results are ordered at 24- or 48-hour intervals). The demo analysis code sets k=1 to reduce computation time.
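To make the Markov assumption concrete, here is a small base R sketch that lists which time-varying covariate columns fall inside the look-back window for a given time point. The helper `history_cols()` and the `L_<time>` naming scheme are hypothetical. It illustrates the wording above only and is not how lmtp handles `k` internally, which may differ (for example, in whether the current period's covariates are also included).

```r
# Hypothetical helper: names of the time-varying covariate columns from the
# k time periods preceding time t, assuming columns are named "<prefix>_<time>".
history_cols <- function(t, k, prefix = "L") {
  if (t <= 1) {
    return(character(0))  # no earlier periods exist at the first time point
  }
  lags <- seq(max(1, t - k), t - 1)
  paste(prefix, lags, sep = "_")
}

paper_window <- history_cols(t = 5, k = 2)  # k = 2, as in the paper
demo_window  <- history_cols(t = 5, k = 1)  # k = 1, as in the demo code

print(paper_window)
print(demo_window)
```

With `k = 2` the window at time 5 covers periods 3 and 4, while with `k = 1` it covers only period 4, which is why the demo setting is cheaper to fit.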

Cross-fitting

We employed 10-fold cross-fitting for our SDR estimator. The demo analysis code sets folds=5 to reduce computation time.
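For quick reference, the sketch below collects the three specifications from this section side by side. The argument names (`.SL_folds`, `k`, `folds`) and values come from the text above; the object names `paper_settings`, `demo_settings`, and `settings` are hypothetical and do not appear in analysis.R.

```r
# Values used in the paper versus the faster demo code, as stated above.
paper_settings <- list(.SL_folds = 10, k = 2, folds = 10)
demo_settings  <- list(.SL_folds = 5,  k = 1, folds = 5)

# Side-by-side table of the two configurations.
settings <- data.frame(
  argument  = names(paper_settings),
  paper     = unlist(paper_settings),
  demo      = unlist(demo_settings),
  row.names = NULL
)

print(settings)
```

To approximate the paper's analysis rather than the demo, change the three demo values back to the paper values, at the cost of longer run time.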

Figures

Figure 1: Hypothetical intervention

I've made this figure publicly available on a Google Slide deck [here]. Anyone is free to edit as they see fit for their own papers and educational materials. To edit this read-only slide, click File --> Save a copy and edit off your duplicated copy.


Figure 2: Directed acyclic graph (DAG)

I've made this figure publicly available on a Google Slide deck [here]. Anyone is free to edit as they see fit for their own papers and educational materials. To edit this read-only slide, click File --> Save a copy and edit off your duplicated copy.


Figure 3

Code to recreate this figure is in forest_plot_viz.R.

e-Figure 1: Treatment timelines

A figure in the Supplemental Materials shows a random sample of 50 patients' treatment timelines. A blog post to aid other analysts in creating their own treatment timelines can be found here.


e-Figure 2: Data analytic file

This figure, shown above under Demo Data, is publicly available on a Google Slide deck [here]. Anyone is free to edit as they see fit for their own papers and educational materials. To edit this read-only slide, click File --> Save a copy and edit off your duplicated copy.

References