pzivich closed this issue 5 years ago
Important clarifications for the sequential regression estimation procedure:
1) It will be easier to convert the dataframe from long to wide in the background for estimation.
2) Some other important variables to generate: an indicator for whether the individual followed the treatment plan, and an indicator for whether they died under the intervention plan.
3) Estimation works as follows:
a) At Q_t, fit a regression model to those who survived until T=t.
b) Predict Y_t from that model under the intervention of interest.
c) Those who followed the treatment plan AND had the outcome have a 1 carried forward.
d) Everyone else who did NOT have the outcome is considered censored (np.nan).
4) The above process is repeated. For those WITH predicted outcomes, their predicted outcome is used in the model fitting. Those who were observed at Q_{t-1} but censored at Q_t have their observed outcome used.
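To make the steps above concrete, here is a minimal sketch of one common variant of sequential regression on wide data. It is not the code in this repo. The column names (`A1`, `L1`, `Y1`, ...), the simulated data, and the helpers `fit_logit`, `predict_logit`, and `sequential_regression` are all hypothetical. Treatment enters each model as a covariate and is then set to the plan for prediction, which may differ from restricting the fit to those who followed the plan. Censoring other than by the outcome is ignored.

```python
import numpy as np
import pandas as pd


def expit(x):
    return 1 / (1 + np.exp(-x))


def fit_logit(X, y, n_iter=50):
    """Logistic regression by Newton-Raphson; y may be 0/1 or predicted risks in [0, 1]."""
    X = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess + 1e-8 * np.eye(len(beta)), grad)
    return beta


def predict_logit(beta, X):
    return expit(np.column_stack([np.ones(len(X)), X]) @ beta)


def sequential_regression(wide, T, plan):
    """Risk by time T if treatment were set to `plan` (0 or 1) at every time."""
    q = wide[f"Y{T}"].to_numpy(dtype=float)  # start from the observed final outcome
    for t in range(T, 0, -1):
        hist = [f"{v}{k}" for k in range(1, t + 1) for v in ("A", "L")]
        # At risk at time t: no outcome before t (everyone at t = 1)
        at_risk = np.ones(len(wide), bool) if t == 1 else (wide[f"Y{t - 1}"] == 0).to_numpy()
        X_obs = wide.loc[at_risk, hist]
        beta = fit_logit(X_obs.to_numpy(dtype=float), q[at_risk])
        X_plan = X_obs.copy()
        X_plan[[f"A{k}" for k in range(1, t + 1)]] = plan  # impose the intervention
        q_new = np.ones(len(wide))  # earlier outcomes are carried forward as 1
        q_new[at_risk] = predict_logit(beta, X_plan.to_numpy(dtype=float))
        q = q_new  # predictions become the outcome for the next (earlier) model
    return q.mean()


# Hypothetical simulated wide data: covariates are np.nan after the outcome occurs
rng = np.random.default_rng(0)
n, T = 5000, 3
cols, alive = {}, np.ones(n, bool)
L_prev, A_prev = rng.normal(size=n), np.zeros(n)
for t in range(1, T + 1):
    L = 0.5 * L_prev - 0.2 * A_prev + rng.normal(size=n)
    A = rng.binomial(1, expit(0.5 * L + A_prev))
    event = rng.binomial(1, expit(-2 + 0.5 * L - 0.7 * A))
    cols[f"L{t}"] = np.where(alive, L, np.nan)
    cols[f"A{t}"] = np.where(alive, A, np.nan)
    cols[f"Y{t}"] = np.where(alive, event, 1)  # cumulative outcome indicator
    alive = alive & (event == 0)
    L_prev, A_prev = L, A
wide = pd.DataFrame(cols)

risk_treat = sequential_regression(wide, T, plan=1)
risk_none = sequential_regression(wide, T, plan=0)
```

Only one model per time point is needed here, which is the appeal over the Monte Carlo approach. The logistic fit is hand-rolled only to keep the sketch free of extra dependencies.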
Regarding LTMLE in #19:
5) Influence curve confidence interval formulas can be found in the "Targeted Learning" textbook. Options will be risk, risk difference, and risk ratio.
6) Estimate IPW prior to converting from long to wide for easier estimation. Maybe not, though, since we might want to make fewer parametric assumptions about time. Need to think about this during implementation.
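For item 5, a rough sketch of Wald-type intervals built from influence curve values might look like the following. It assumes the per-individual influence curve values (`ic1`, `ic0`) and point estimates (`psi1`, `psi0`) are already available from the estimator. The function name and inputs are hypothetical, and the formulas should be checked against the textbook before use. The risk ratio interval is built on the log scale via the delta method.

```python
import numpy as np


def ic_confidence_intervals(psi1, psi0, ic1, ic0, z=1.96):
    """Return (estimate, lower, upper) for risk, risk difference, and risk ratio.

    psi1, psi0 : estimated risks under the two plans (assumed given)
    ic1, ic0   : estimated influence curve values, one per individual (assumed given)
    """
    ic1 = np.asarray(ic1, dtype=float)
    ic0 = np.asarray(ic0, dtype=float)
    n = len(ic1)

    def se(ic):
        # Standard error from the sample variance of the influence curve
        return np.sqrt(np.var(ic, ddof=1) / n)

    out = {}
    out["risk"] = (psi1, psi1 - z * se(ic1), psi1 + z * se(ic1))

    rd = psi1 - psi0
    se_rd = se(ic1 - ic0)
    out["risk_difference"] = (rd, rd - z * se_rd, rd + z * se_rd)

    # Delta method: influence curve of log(psi1 / psi0)
    log_rr = np.log(psi1 / psi0)
    se_log_rr = se(ic1 / psi1 - ic0 / psi0)
    out["risk_ratio"] = (
        np.exp(log_rr),
        np.exp(log_rr - z * se_log_rr),
        np.exp(log_rr + z * se_log_rr),
    )
    return out


# Toy inputs, for illustration only (not real estimates)
rng = np.random.default_rng(1)
ic1 = rng.normal(0, 0.4, size=500)
ic0 = rng.normal(0, 0.5, size=500)
res = ic_confidence_intervals(0.2, 0.4, ic1 - ic1.mean(), ic0 - ic0.mean())
```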
The plan is to get the sequential regression estimator working for the g-formula first. Once it works, the transition to LTMLE should be easy. For LTMLE, the exposure weights can be estimated before the long-to-wide conversion for ease.
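As a rough illustration of estimating weights in long format and only then converting to wide, consider the sketch below. The column names and the `pr_A1` values are hypothetical placeholders. In practice `pr_A1` would be the predicted probability of treatment from a fitted exposure model. The plan assumed here is "always treat".

```python
import numpy as np
import pandas as pd

# Hypothetical long data: one row per individual per time
long = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 3, 3, 3],
    "t":     [1, 2, 3, 1, 2, 1, 2, 3],
    "A":     [1, 1, 1, 1, 0, 0, 0, 1],
    "Y":     [0, 0, 1, 0, 0, 0, 0, 0],
    "pr_A1": [0.6, 0.7, 0.8, 0.5, 0.4, 0.3, 0.6, 0.5],  # placeholder Pr(A_t = 1 | history)
})
long = long.sort_values(["id", "t"])

# Indicator for having followed the plan through time t
# (stays 0 once an individual deviates)
long["followed"] = (long["A"] == 1).astype(int).groupby(long["id"]).cummin()

# Cumulative probability of following the plan, and the resulting weight
long["cum_pr"] = long.groupby("id")["pr_A1"].cumprod()
long["ipw"] = long["followed"] / long["cum_pr"]

# Long-to-wide: one row per individual, one column per variable and time
wide = long.pivot(index="id", columns="t", values=["A", "Y", "followed", "ipw"])
wide.columns = [f"{var}{t}" for var, t in wide.columns]
```

Individuals without a row at a given time (id 2 at t=3 here) end up as np.nan in the wide data, which lines up with treating them as censored.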
Machine learning support may be more complicated to implement than for the standard TMLE because of the long-to-wide conversion. Have to think more deeply about this.
Have a theoretically working estimator in Gformula_Sequential. The code was generalized from some other code I wrote for a sequential regression estimator. Still need to add custom treatment support. Also need to find some data to test/validate/compare against.
Currently, the g-formula uses the Monte Carlo estimation procedure. An alternative is to use sequential regression, which has the advantage of needing fewer models to be specified. Kreif et al. 2017 has a nice description of sequential regression estimation. This is the same paper that describes LTMLE.