nhejazi / txshift

R/txshift: Efficient Estimation of the Causal Effects of Stochastic Interventions, with Corrections for Outcome-Dependent Sampling
https://codex.nimahejazi.org/txshift

Names for variables in IPCW regression #13

Closed: benkeser closed this issue 6 years ago

benkeser commented 6 years ago

It seems the function that estimates \Pi_0 requires the glm formula to be written in terms of V1, V2, etc. It would be more helpful if the user could write it in terms of colnames(W) and Y.

nhejazi commented 6 years ago

Having written that bit of the code, I am suffering from extreme tunnel vision, so I'm likely missing something here. But how is this different from passing whatever covariates you want to use in estimating \Pi_0 through the argument V?

In keeping with the notation from Mark and Sherri's paper, V is the set of all covariates used in estimating the censoring mechanism, so the regression performed by est_ipcw should always be of the form Delta ~ ., where . is simply all of V. To remove a variable, one should just drop it from the node V. Is there some reason we would want, say, V = c(W, Y) but an IPCW regression of Delta ~ W? If not, then I think we should just remove the GLM formula from the IPCW regression step.
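The regression described here can be sketched in R as follows. This is a hypothetical helper, fit_ipcw_glm(), not txshift's est_ipcw; it assumes a binary censoring indicator Delta and a data frame V of covariates, and fits a logistic model for \Pi_0 = P(Delta = 1 | V) with glm(Delta ~ .), so every column of V enters the model and excluding a variable means dropping it from V. The simulated data are purely illustrative.

```r
# Hypothetical sketch (not txshift's est_ipcw): estimate Pi_0 = P(Delta = 1 | V)
# by logistic regression of the censoring indicator on *all* columns of V.
fit_ipcw_glm <- function(Delta, V) {
  # Combine the indicator with V; data.frame() keeps V's column names.
  dat <- data.frame(Delta = Delta, V)
  # Delta ~ . regresses Delta on every other column, i.e. on all of V.
  fit <- glm(Delta ~ ., family = binomial(), data = dat)
  # Fitted probabilities P(Delta = 1 | V), one per observation.
  as.numeric(predict(fit, type = "response"))
}

# Illustrative simulated data (assumed, not from the package).
set.seed(1)
n <- 200
W <- data.frame(W1 = rnorm(n), W2 = rbinom(n, 1, 0.5))
Y <- rnorm(n)
Delta <- rbinom(n, 1, plogis(0.5 + 0.3 * W$W1))

# V = (W, Y): both W and Y enter the censoring regression.
pi_wy <- fit_ipcw_glm(Delta, data.frame(W, Y = Y))
# To exclude Y, drop it from V rather than editing a formula.
pi_w <- fit_ipcw_glm(Delta, W)
```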

benkeser commented 6 years ago

It's about how the covariates are named. In the call to glm that estimates \Pi_0, the data passed to glm is a data frame with columns named V1, V2, etc., so the user would have to know which columns of W map to V1, V2, and so on.
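The naming problem is general R behavior and is easy to reproduce. The sketch below assumes (hypothetically; the exact code path inside txshift may differ) that V passes through an unnamed matrix before reaching glm, in which case as.data.frame() falls back to the default names V1, V2, and so on. Keeping V as a data frame preserves colnames(W) and Y, so a formula can be written in the user's terms, here built with stats::reformulate(). The covariate names age and bmi are made up for illustration.

```r
# Hypothetical covariates with meaningful names, as a user might supply them.
W <- data.frame(age = c(34, 51, 27), bmi = c(22.1, 30.4, 25.0))
Y <- c(1.2, 0.7, 2.3)

# Assumed code path: V is converted to an unnamed matrix and back.
# as.data.frame() on a matrix without column names falls back to V1, V2, ...
V_bad <- as.data.frame(unname(as.matrix(cbind(W, Y = Y))))
print(names(V_bad))   # "V1" "V2" "V3"

# Keeping V as a data frame preserves colnames(W) and Y.
V_good <- data.frame(W, Y = Y)
print(names(V_good))  # "age" "bmi" "Y"

# A formula can then be written in the user's terms (Delta ~ age + bmi + Y).
f <- reformulate(c(colnames(W), "Y"), response = "Delta")
print(f)
```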

nhejazi commented 6 years ago

Resolved by #16