osofr / simcausal

Simulating Longitudinal and Network Data with Causal Inference Applications
64 stars 11 forks source link

Mediation Analysis with simcausal #13

Open heledi opened 6 years ago

heledi commented 6 years ago

Hi, I'd like to use simcausal for mediation analysis but wonder how this can best be done. For natural direct and indirect effects I need to simulate the crossworld counterfactual Y(a, M(a)), - A is the binary exposure variable and M the mediator. M(a) is the level of the mediator that it would have naturally been under the reference intervention A=a*.

I'm also particularly interested in simulating randomized interventional analogues of the natural effects in the presence of exposure-induced confounding of the relationship of M and the outcome Y. These effects require the counterfactual Y(a, G(m|a, c)), with G(m|a, c) denoting a random draw from the distribution of M given A=a* and non-exposure-induced confounders C=c.

What I did so far was rather intuitive. For the interventional effects, I need to draw from the distribution G(m|a, c) - therefore I simulated data first and then specified a model (lm or glm,e.g.) for M|a, c for each counterfactual dataset a1 and a0. I used the estimated parameters for specifying G(m|a, c), G(m|a, c) and the respective outcome variables in new nodes and simulated the counterfactual dataset again to evaluate the interventional effects. Below is a simple example for illustration:

################################################################

Simple Mediation process with an exposure-induced confounder L

An "intuitive" approach

################################################################

set seed for reproducability

set.seed(41188)

M <- DAG.empty()

M <- M + node("a", # Binary exposure variable distr = "rbern", prob = 0.53) + node("c", # Confounder variable C distr="rnorm", mean=5, sd=2)+ node("l", #Binary confounder variable L -> exposure induced distr="rbern" , prob=ifelse(c>=5, 0.6-0.3a, 0.7-0.3a))+ node("m", #Mediator variable M distr = "rnorm", mean=10-2a+c+5l, sd=3)+ node("u.Y", #Latent variable distr = "rnorm", mean = 0, sd = 100)+ node("y", #Outcome Variable Y distr = "rnorm", mean = 2000-200a+10m-5am + 50l -20al+ 15c + u.Y, sd=200) Mset <- set.DAG(M, latent.v = c("u.Y")) str(Mset)

specify the two interventions

a1 <- node("a", distr = "rbern", prob = 1) Mset <- Mset + action("a1", nodes = a1) a0 <- node("a", distr = "rbern", prob = 0) Mset <- Mset + action("a0", nodes = a0)

counterfactual data to estimate conditional distribution M|a,c and M|a*,c

dat <- simcausal::sim(DAG = Mset, actions = c("a1", "a0"), n = 10000000, rndseed = 345)

look at counterfactual data

head(dat[["a1"]]); head(dat[["a0"]])

plotDAG(Mset, xjitter = 0.2, yjitter = 0.2, edge_attrs = list(width = 0.5, arrow.width = 0.5, arrow.size = 0.3), vertex_attrs = list(size = 20, label.cex = 0.8))

Evaluate distribution M|a,c and M|a*,c

summary(m0<-lm(m~c, data=dat[["a0"]])) sd.m0<-summary(m0)$sigma; sd.m0 coef.m0<-coefficients(m0); coef.m0

summary(m1<-lm(m~c, data=dat[["a1"]])) coef.m1<-coefficients(m1); coef.m1 sd.m1<-summary(m1)$sigma; sd.m1

M.r<- M + node("m.0", distr = "rnorm", mean = 13.7465169 + 0.9006255c, sd=3.828051)+ #insert values by hand node("m.1", distr = "rnorm", mean = 10.2433650 + 0.9009554 c, sd=3.826656) +

node("m.0", distr = "rnorm", mean = coef.m0[1]+coef.m0[2]*c, sd=sd.m0)+ this syntax doesn't work

node("m.1", distr = "rnorm", mean =coef.m1[1] + coef.m1[2]*c, sd=sd.m1)+

node("y.m0", distr = "rnorm", mean = 2000-200a+10m.0-5am.0 + 50l -20al+ 15c + u.Y, sd=200)+ node("y.m1", distr = "rnorm", mean = 2000-200a+10m.1-5am.1 + 50l -20al+ 15c + u.Y, sd=200) Mset.r <- set.DAG(M.r, latent.v = c("u.Y")) str(Mset.r)

specify the two interventions

a1 <- node("a", distr = "rbern", prob = 1) Mset.r <- Mset.r + action("a1", nodes = a1) a0 <- node("a", distr = "rbern", prob = 0) Mset.r <- Mset.r + action("a0", nodes = a0)

##################

Evaluate TRUTH

##################

Simulate counterfactual dataset

dat <- simcausal::sim(DAG = Mset.r, actions = c("a1", "a0"), n = 10000000, rndseed = 345)

Evaluate true randomized interventional analogue for the direct effect

Mset.r <- set.targetE(Mset.r, outcome = "y.m0", param = "a1") y_r.m0_a1<-eval.target(Mset.r, data = dat)$res ; y_r.m0_a1

Mset.r <- set.targetE(Mset.r, outcome = "y.m0", param = "a0") y_r.m0_a0<-eval.target(Mset.r, data = dat)$res; y_r.m0_a0 RIA.NDE<-y_r.m0_a1-y_r.m0_a0; RIA.NDE

Evaluate true randomized interventional analogue for the indirect effect

Mset.r <- set.targetE(Mset.r, outcome = "y.m1", param = "a1") y_r.m1_a1<-eval.target(Mset.r, data = dat)$res; y_r.m1_a1

RIA.NIE<-y_r.m1_a1-y_r.m0_a1; RIA.NIE

Evaluate randomized interventional analogue for total effect:

RIA.TE<-y_r.m1_a1-y_r.m0_a0 RIA.TE == RIA.NIE + RIA.NDE ###################################

I'm not quite sure, if this code is ok or if there are more efficient solutions? Thanks a lot for suggestions!