Hi, I'd like to use simcausal for mediation analysis but wonder how this can best be done. For natural direct and indirect effects I need to simulate the crossworld counterfactual Y(a, M(a)), - A is the binary exposure variable and M the mediator. M(a) is the level of the mediator that it would have naturally been under the reference intervention A=a*.

I'm also particularly interested in simulating randomized interventional analogues of the natural effects in the presence of exposure-induced confounding of the relationship of M and the outcome Y. These effects require the counterfactual Y(a, G(m|a, c)), with G(m|a, c) denoting a random draw from the distribution of M given A=a* and non-exposure-induced confounders C=c.

What I did so far was rather intuitive. For the interventional effects, I need to draw from the distribution G(m|a, c) - therefore I simulated data first and then specified a model (lm or glm,e.g.) for M|a, c for each counterfactual dataset a1 and a0. I used the estimated parameters for specifying G(m|a, c), G(m|a, c) and the respective outcome variables in new nodes and simulated the counterfactual dataset again to evaluate the interventional effects. Below is a simple example for illustration:

################################################################

Simple Mediation process with an exposure-induced confounder L

An "intuitive" approach

################################################################

set seed for reproducability

set.seed(41188)

M <- DAG.empty()

M <- M + node("a", # Binary exposure variable distr = "rbern", prob = 0.53) + node("c", # Confounder variable C distr="rnorm", mean=5, sd=2)+ node("l", #Binary confounder variable L -> exposure induced distr="rbern" , prob=ifelse(c>=5, 0.6-0.3a, 0.7-0.3a))+ node("m", #Mediator variable M distr = "rnorm", mean=10-2a+c+5l, sd=3)+ node("u.Y", #Latent variable distr = "rnorm", mean = 0, sd = 100)+ node("y", #Outcome Variable Y distr = "rnorm", mean = 2000-200a+10m-5am + 50l -20al+ 15c + u.Y, sd=200) Mset <- set.DAG(M, latent.v = c("u.Y")) str(Mset)

specify the two interventions

a1 <- node("a", distr = "rbern", prob = 1) Mset <- Mset + action("a1", nodes = a1) a0 <- node("a", distr = "rbern", prob = 0) Mset <- Mset + action("a0", nodes = a0)

counterfactual data to estimate conditional distribution M|a,c and M|a*,c

dat <- simcausal::sim(DAG = Mset, actions = c("a1", "a0"), n = 10000000, rndseed = 345)

look at counterfactual data

head(dat[["a1"]]); head(dat[["a0"]])

plotDAG(Mset, xjitter = 0.2, yjitter = 0.2, edge_attrs = list(width = 0.5, arrow.width = 0.5, arrow.size = 0.3), vertex_attrs = list(size = 20, label.cex = 0.8))

Evaluate distribution M|a,c and M|a*,c

summary(m0<-lm(m~c, data=dat[["a0"]])) sd.m0<-summary(m0)$sigma; sd.m0 coef.m0<-coefficients(m0); coef.m0

summary(m1<-lm(m~c, data=dat[["a1"]])) coef.m1<-coefficients(m1); coef.m1 sd.m1<-summary(m1)$sigma; sd.m1

M.r<- M + node("m.0", distr = "rnorm", mean = 13.7465169 + 0.9006255c, sd=3.828051)+ #insert values by hand node("m.1", distr = "rnorm", mean = 10.2433650 + 0.9009554 c, sd=3.826656) +

node("m.0", distr = "rnorm", mean = coef.m0[1]+coef.m0[2]*c, sd=sd.m0)+ this syntax doesn't work

node("m.1", distr = "rnorm", mean =coef.m1[1] + coef.m1[2]*c, sd=sd.m1)+

node("y.m0", distr = "rnorm", mean = 2000-200a+10m.0-5am.0 + 50l -20al+ 15c + u.Y, sd=200)+ node("y.m1", distr = "rnorm", mean = 2000-200a+10m.1-5am.1 + 50l -20al+ 15c + u.Y, sd=200) Mset.r <- set.DAG(M.r, latent.v = c("u.Y")) str(Mset.r)

specify the two interventions

a1 <- node("a", distr = "rbern", prob = 1) Mset.r <- Mset.r + action("a1", nodes = a1) a0 <- node("a", distr = "rbern", prob = 0) Mset.r <- Mset.r + action("a0", nodes = a0)

##################

Evaluate TRUTH

##################

Simulate counterfactual dataset

dat <- simcausal::sim(DAG = Mset.r, actions = c("a1", "a0"), n = 10000000, rndseed = 345)

Evaluate true randomized interventional analogue for the direct effect

Mset.r <- set.targetE(Mset.r, outcome = "y.m0", param = "a1") y_r.m0_a1<-eval.target(Mset.r, data = dat)$res ; y_r.m0_a1

Mset.r <- set.targetE(Mset.r, outcome = "y.m0", param = "a0") y_r.m0_a0<-eval.target(Mset.r, data = dat)$res; y_r.m0_a0 RIA.NDE<-y_r.m0_a1-y_r.m0_a0; RIA.NDE

Evaluate true randomized interventional analogue for the indirect effect

Mset.r <- set.targetE(Mset.r, outcome = "y.m1", param = "a1") y_r.m1_a1<-eval.target(Mset.r, data = dat)$res; y_r.m1_a1

RIA.NIE<-y_r.m1_a1-y_r.m0_a1; RIA.NIE

Evaluate randomized interventional analogue for total effect:

RIA.TE<-y_r.m1_a1-y_r.m0_a0 RIA.TE == RIA.NIE + RIA.NDE ###################################

I'm not quite sure, if this code is ok or if there are more efficient solutions? Thanks a lot for suggestions!

osofr / simcausal

Mediation Analysis with simcausal #13

Simple Mediation process with an exposure-induced confounder L

An "intuitive" approach

set seed for reproducability

specify the two interventions

counterfactual data to estimate conditional distribution M|a,c and M|a*,c

look at counterfactual data

Evaluate distribution M|a,c and M|a*,c

node("m.0", distr = "rnorm", mean = coef.m0[1]+coef.m0[2]*c, sd=sd.m0)+ this syntax doesn't work

node("m.1", distr = "rnorm", mean =coef.m1[1] + coef.m1[2]*c, sd=sd.m1)+

specify the two interventions

Evaluate TRUTH

Simulate counterfactual dataset

Evaluate true randomized interventional analogue for the direct effect

Evaluate true randomized interventional analogue for the indirect effect

Evaluate randomized interventional analogue for total effect: