Closed pzivich closed 5 years ago
Just a side thought related to this: consider calculating ALL available measures when TMLE is fit. This would avoid having to re-specify the model each time (which could be time-intensive for complex ML). It is probably better to dump all effect measures to the user, since the computation time for all of the measures is small.
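A minimal sketch of what "dump all effect measures" could look like, assuming the targeted predictions under A=1 and A=0 are already available as plain lists. The function name `all_effect_measures` and its arguments are hypothetical, not part of any existing API, and confidence intervals are left out:

```python
from statistics import fmean


def all_effect_measures(q1_star, q0_star):
    """Return several effect measures from a single set of targeted predictions.

    q1_star, q0_star: predicted outcome probabilities for each observation
    under A=1 and A=0 (hypothetical inputs for this illustration).
    """
    r1 = fmean(q1_star)  # marginal risk had everyone been treated
    r0 = fmean(q0_star)  # marginal risk had no one been treated
    return {
        "risk_difference": r1 - r0,
        "risk_ratio": r1 / r0,
        "odds_ratio": (r1 / (1 - r1)) / (r0 / (1 - r0)),
    }


measures = all_effect_measures([0.5, 0.5], [0.25, 0.25])
```

Because every measure is a function of the same two marginal risks, the expensive model fitting happens once and the measures themselves cost almost nothing to compute.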
Missing data process based on R tmle:
1) Estimate missing data model (something like an optional missing_model() argument)
2) Multiply g1W and g0W by the corresponding p(missing=0)
3) Missing needs to factor into the influence curve calculation. ONLY the indicator though, not the weight itself
Help for missing data: https://www.jstatsoft.org/article/view/v051i13
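Step 2 of the process above can be sketched as follows. This is an illustration only: the function name `combine_with_missingness` and the argument names are assumptions, the missingness probabilities are taken as given (step 1), and the influence curve handling in step 3 is not covered:

```python
def combine_with_missingness(g1w, g0w, p_obs_a1, p_obs_a0):
    """Multiply treatment probabilities by the probability of being observed.

    g1w, g0w: predicted P(A=1 | W) and P(A=0 | W) per observation.
    p_obs_a1, p_obs_a0: predicted p(missing=0) under A=1 and A=0,
    taken from a separately fitted missing data model (hypothetical inputs).
    """
    g1w_total = [g * p for g, p in zip(g1w, p_obs_a1)]
    g0w_total = [g * p for g, p in zip(g0w, p_obs_a0)]
    return g1w_total, g0w_total


g1w_total, g0w_total = combine_with_missingness(
    g1w=[0.5, 0.25], g0w=[0.5, 0.75],
    p_obs_a1=[0.5, 1.0], p_obs_a0=[1.0, 0.5],
)
```

The combined values would then take the place of g1W and g0W wherever those appear as denominators.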
Don't plan on adding C-TMLE anytime soon. To my current understanding, the problem it is used to solve can be fixed via cross-fitting (if that changes, I will consider adding it). C-TMLE purports to give the correct answer when both the g- and Q-models are incorrect. I think this is an artifact of the Kang-Schafer data.
For now, these remaining additions are on the back burner. If anything changes or users request them, I will re-open.
This is a longer-term project. As I read through Targeted Learning, I will add to this list the features I would like to add, along with important notes that I have gleaned from the book.
[x] Add support for continuous outcomes (Targeted Learning Ch 7, pg 125, 126)
For a continuous Y, it must be bounded between 0 and 1 before starting the process. Transform using Y* = (Y-a) / (b-a), where a = min(Y), b = max(Y)
Psi = (b-a) Psi* converts the bounded causal effect back to the original scale
Use some alpha to keep logit(Y*) from being undefined. alpha = 0.0005 maybe?
[x] Add support for F(A=1) and F(A=0). Need to look up formulation in Targeted Learning book and IC. NOT IMPLEMENTING. James Robins showed in 1988 (Confidence intervals for causal parameters) that the corresponding confidence intervals may only be valid assuming deterministic potential outcomes. As a result, I have decided not to implement this feature (since I don't want to make that assumption).
[x] Natively handle missing data
R tmle will be a good reference for this
Add an option to specify a missing data model. This should be optional to include
[ ] Mediation analysis (direct, indirect, total) (Targeted Learning Ch 8)
Can add this, but I am largely unconvinced by current mediation analysis. I know that people do like to use it though...
pg 139 has some good notes
[ ] Collaborative TMLE (CTMLE)
g-model should be based on TMLE of Q, not the fit to g. CTMLE is an approach to formalize this. Might try to add as an additional class object (depending on utility and substantive differences)
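For the continuous-outcome item in the list above, the bounding and back-transformation can be sketched as below. The helper names `bound_outcome` and `unbound_effect` are hypothetical, and the default alpha is only the tentative value floated in the notes:

```python
import math


def bound_outcome(y, alpha=0.0005):
    """Rescale a continuous outcome to [alpha, 1 - alpha].

    Applies Y* = (Y - a) / (b - a) with a = min(Y), b = max(Y), then pulls
    values away from exactly 0 and 1 so that logit(Y*) stays defined.
    """
    a, b = min(y), max(y)
    y_star = [(v - a) / (b - a) for v in y]
    y_star = [min(max(v, alpha), 1 - alpha) for v in y_star]
    return y_star, a, b


def unbound_effect(psi_star, a, b):
    """Convert a difference estimated on the bounded scale back: Psi = (b - a) Psi*."""
    return (b - a) * psi_star


y_star, a, b = bound_outcome([10.0, 20.0, 30.0])
logits = [math.log(v / (1 - v)) for v in y_star]  # finite thanks to alpha
```

Note that the simple (b-a) rescaling applies to a difference in means. A single mean on the bounded scale would also need a added back.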