osofr / simcausal

Simulating Longitudinal and Network Data with Causal Inference Applications
64 stars 11 forks source link

Defining an deterministic action #4

Closed mizano924 closed 8 years ago

mizano924 commented 8 years ago

I am trying to specify an action that assigns a value of "abar" to E[t] while a time-varying covariate C[t-1] is 1, and 0 otherwise. The DAG is defined as follows:

tmax<-10
S1.E.1 <-substitute(plogis(-0.30 + 1.00*W1 + 0.08*W2))
S1.E   <-substitute(plogis(-0.30 + 1*E[t-1]))
S1.D.1 <-substitute(plogis(-2.50 - 0.10*W1 + 0.20*W2 + 0.07*E[t]))
S1.D   <-substitute(plogis(-2.50 - 0.10*W1 + 0.20*W2 + 0.07*E[t]))
S1.Y.1 <-substitute(plogis(-2.50 - 0.50*W1 + 0.50*W2 + 0.10*E[t] + 1.00*S + 0.20*S*E[t]))
S1.Y   <-substitute(plogis(-2.50 - 0.05*W1 - 0.25*W2 + 0.30*E[t] + 0.10*Ebar[t-1] + 2.00*H[t-1] + 2.00*S + 0.20*S*Ebar[t-1]))
S1.H.1 <-substitute(plogis(-3.00 + 0.10*W1 + 0.50*W2 + 2.00*S + 0.25*S*E[t]))
S1.H   <-substitute(plogis(-3.00 + 0.10*W1 + 0.50*W2 + 2.00*S + 0.25*S*E[t] + 0.50*Ebar[t-1]))
S1.C.1 <-substitute(plogis( 4.00 - 0.05*W1 - 0.25*W2)) #staying at work
S1.C   <-S1.C.1
D1 <- DAG.empty() +
  node("S",      distr = "rconst", const = 0)+
  node("TTH",    distr = "runif",  min = -2, max = 10)+
  node("C.star", distr = "rconst", const = floor(TTH))+
  node("W1",     distr = "rbern",  prob  = 0.20) +
  node("W2",     distr = "rbern",  prob  = 0.20) +

  node("E",      t=1,       distr = "rbern",  prob=.(S1.E.1))+
  node("D",      t=1,       distr = "rbern",  prob=.(S1.D.1),EFU=TRUE)+
  node("Y",      t=1,       distr = "rbern",  prob=.(S1.Y.1), EFU=TRUE)+
  node("Ebar",   t=1,       distr = "rconst", const=E[1])+
  node("H",      t=1,       distr = "rbern",  prob=.(S1.H.1))+
  node("C",      t=1,       distr = "rbern",  prob=.(S1.C.1))+

  node("E",      t=2:tmax,  distr="rbern",    prob=ifelse(C[t-1]==0,0,.(S1.E)))+
  node("D",      t=2:tmax,  distr="rbern",    prob=.(S1.D), EFU=TRUE)+
  node("Y",      t=2:tmax,  distr="rbern",    prob=.(S1.Y), EFU=TRUE)+ 
  node("Ebar",   t=2:tmax,  distr="rconst",   const=sum(E[1:t])) + 
  node("H",      t=2:tmax,  distr="rbern",    prob=ifelse(H[t-1]==1,1,.(S1.H)))+
  node("C",      t=2:tmax,  distr="rbern",    prob=ifelse(C[t-1]==0,0,.(S1.C)))
RCDAG1 <- set.DAG(D1)

I tried to generate an observed dataset as follows:

K <- 10
act_dynAD0 <-c(
  node("E", t = 1:K, distr = "rconst", const = ifelse(t==1,abar,ifelse(C[t-1]==1, abar, 0))),
  node("D", t = 1:K, distr = "rconst", const = 0)) 
DynDact   <-  RCDAG1 + action("DynE0D0", nodes = act_dynAD0, abar = 0) + action("DynE1D0", nodes = act_dynAD0, abar = 1)
dt    <-  sim(DynDact,actions = c("DynE0D0","DynE1D0"), n = 1000, rndseed = 4)

And obtained the following warning:

Warning messages:
1: In rbinom(n = n, prob = prob, size = 1) : NAs produced
2: In rbinom(n = n, prob = prob, size = 1) : NAs produced
3: In rbinom(n = n, prob = prob, size = 1) : NAs produced
4: In rbinom(n = n, prob = prob, size = 1) : NAs produced
5: In rbinom(n = n, prob = prob, size = 1) : NAs produced

Many of the E[t] columns contain NAs. Would appreciate your help in resolving this.

osofr commented 8 years ago

Quick answer

This is more of an issue of R function ifelse which determines its output length by the length of its conditional statement (expression t==1 has length 1, while the expression C[t-1]==1 has length n). This is also an issue of inconsistent interface, since C is internally stored as a vector of length n, while t is stored internally as an integer of length 1. However, replacing such problematic ifelse(...) calls with statements {if(condition) {do1} else {do2}} is a possible work-around and will make the above code run as intended:

K <- 10
act_dynAD0 <-c(
  node("E", t = 1:K, distr = "rconst", const = {if(t==1) {abar} else {ifelse(C[t-1]==1, abar, 0)}} ),
  node("D", t = 1:K, distr = "rconst", const = 0)) 
DynDact   <-  RCDAG1 + action("DynE0D0", nodes = act_dynAD0, abar = 0) + action("DynE1D0", nodes = act_dynAD0, abar = 1)
dt    <-  sim(DynDact,actions = c("DynE0D0","DynE1D0"), n = 1000, rndseed = 4)

Additional details

The line

node("E", t = 1:K, distr = "rconst", const = ifelse(t==1,abar,ifelse(C[t-1]==1, abar, 0))),

involves a call to a vectorized function ifelse(condition, do1, do2), where the conditional statement is t==1, which is a scalar of length 1. This ifesle expression will thus evaluate only for the first observation and it will be a constant result for all N observation. The next condition C[t-1]==1 will be ignored for all observations, but the very first one.

Currently there is no easy way to modify the package to make this interface work. One possible fix is to explicitly catch ifelse calls which have t as part of the conditional statement. Such an expression parser could then replace each instance of scalar t with a vector rep.int(t,n). Changing the internal representation of t to a vector of length n is not a viable option, since it would break all of the interface for formulas Var[t].