TMLE additions - Githubissues

pzivich commented 5 years ago

This is a longer-term project. As I read through Targeted Learning, I will add features I would like to implement to this list, along with important notes that I have gleaned from the book.

[x] Add support for continuous outcomes (Targeted Learning Ch 7, pg 125-126)
A continuous Y must be bounded between 0 and 1 before starting the process. Transform using Y* = (Y - a) / (b - a), where a = min(Y) and b = max(Y)
Psi = (b - a) Psi* converts the bounded causal effect back to the original scale
Use some alpha to keep logit(Y*) from being undefined. alpha = 0.0005 maybe?
[x] ~~Add support for F(A=1) and F(A=0)~~
~~Need to look up formulation in targeted learning book and IC~~
NOT IMPLEMENTING. James Robins showed in 1988 (Confidence intervals for causal parameters) that the corresponding confidence intervals may only be valid assuming deterministic potential outcomes. As a result, I have decided not to implement this feature (since I don't want to make that assumption)
[x] Natively handle missing data
R tmle will be a good reference for this
Add an option to specify a missing data model. This should be optional to include
[ ] Mediation analysis (direct, indirect, total) (Targeted Learning Ch 8)
Can add this, but I am not largely convinced of current mediation analysis. I know that people do like to use it though...
pg 139 has some good notes
[ ] Collaborative TMLE (CTMLE)
The g-model should be chosen based on the TMLE of Q, not on the fit to g. CTMLE is an approach that formalizes this. Might try to add it as an additional class object (depending on utility and substantive differences)
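
For the continuous-outcome item in the checklist above, the bounding step can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `bound_outcome` and `unbound_effect` are hypothetical (not part of zEpid), and clipping Y* into [alpha, 1 - alpha] is one possible way to apply the alpha mentioned in the note.

```python
import numpy as np


def bound_outcome(y, alpha=0.0005):
    """Rescale a continuous outcome to the unit interval.

    Hypothetical helper: Y* = (Y - a) / (b - a), with a = min(Y), b = max(Y).
    Y* is then clipped to [alpha, 1 - alpha] so that logit(Y*) is defined
    (clipping is an assumption about how alpha is applied).
    """
    y = np.asarray(y, dtype=float)
    a, b = y.min(), y.max()
    y_star = (y - a) / (b - a)
    y_star = np.clip(y_star, alpha, 1 - alpha)
    return y_star, a, b


def unbound_effect(psi_star, a, b):
    """Convert a bounded-scale effect back to the original scale: Psi = (b - a) Psi*."""
    return (b - a) * psi_star


y_star, a, b = bound_outcome([2.0, 4.0, 6.0, 10.0])
psi = unbound_effect(0.1, a, b)
```

The back-transformation in `unbound_effect` applies to a difference on the bounded scale; a mean on the bounded scale would additionally need `a` added back.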

pzivich commented 5 years ago

Just a side thought related to this. Might consider calculating ALL available measures when TMLE is fit. This would avoid the issue of having to re-specify the model each time (which could be time-intensive for complex ML). Might be better to dump all effect measures to the user (since computation time is small for computing all measures)
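
A rough sketch of what "dump all effect measures" could look like for a binary outcome, assuming the targeted predictions under A=1 and A=0 are already available as arrays. The names `all_effect_measures`, `q1w`, and `q0w` are hypothetical and only for illustration; this is not the zEpid interface.

```python
import numpy as np


def all_effect_measures(q1w, q0w):
    """Compute several effect measures from one set of predictions.

    q1w, q0w: predicted outcome probabilities under A=1 and A=0
    (assumed to come from a single, already-fitted estimator).
    """
    r1 = float(np.mean(q1w))  # marginal risk under A=1
    r0 = float(np.mean(q0w))  # marginal risk under A=0
    return {
        "risk_difference": r1 - r0,
        "risk_ratio": r1 / r0,
        "odds_ratio": (r1 / (1 - r1)) / (r0 / (1 - r0)),
    }


measures = all_effect_measures([0.5, 0.5], [0.25, 0.25])
```

Since every measure is a cheap function of the same two means, the expensive model fitting only has to happen once.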

pzivich commented 5 years ago

Missing data process based on R tmle:

1) Estimate missing data model (something like missing_model() optional argument)

2) Multiply g1W and g0W by corresponding p(missing=0)

3) Missing needs to factor into influence curve calculation. ONLY the indicator though, not the weight itself

Help for missing data: https://www.jstatsoft.org/article/view/v051i13
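
Steps 1 and 2 above might be sketched like this, assuming the missing data model has already produced the probability of being observed under each treatment level. Everything here is an assumption for illustration: the function name `clever_covariates`, the argument names, and the choice to zero out the covariates for unobserved outcomes. The influence curve in step 3 is not sketched.

```python
import numpy as np


def clever_covariates(a, observed, g1w, p_obs_a1, p_obs_a0):
    """Weight the treatment probabilities by the probability of being observed.

    a:         treatment indicator (1/0)
    observed:  1 if the outcome is observed, 0 if missing
    g1w:       Pr(A=1 | W) from the treatment model
    p_obs_a1:  Pr(missing=0 | A=1, W) from the (hypothetical) missing data model
    p_obs_a0:  Pr(missing=0 | A=0, W) from the same model
    """
    a = np.asarray(a, dtype=float)
    observed = np.asarray(observed, dtype=float)
    g1w = np.asarray(g1w, dtype=float)
    g0w = 1 - g1w

    # Step 2: multiply g1W and g0W by the corresponding p(missing=0)
    denom1 = g1w * np.asarray(p_obs_a1, dtype=float)
    denom0 = g0w * np.asarray(p_obs_a0, dtype=float)

    # Rows with a missing outcome contribute nothing (indicator = 0)
    h1w = observed * a / denom1
    h0w = -observed * (1 - a) / denom0
    return h1w, h0w


h1w, h0w = clever_covariates(
    a=[1, 0, 1, 0],
    observed=[1, 1, 0, 1],
    g1w=[0.5, 0.25, 0.5, 0.5],
    p_obs_a1=[0.8, 0.8, 0.8, 0.8],
    p_obs_a0=[0.5, 0.5, 0.5, 0.5],
)
```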

pzivich commented 5 years ago

Don't plan on adding C-TMLE anytime soon. The problem it is used to solve can be fixed via cross-fitting (to my current understanding; if that changes, I will consider adding it). C-TMLE purports to give the correct answer when both the g- and Q-models are incorrect. I think this is an artifact of the Kang-Schafer data.

As for now, these remaining additions are on the back-burner. If anything changes or users request, I will re-open

pzivich / zEpid

TMLE additions #39