pzivich / zEpid

Epidemiology analysis package
http://zepid.readthedocs.org
MIT License

G-estimation of Structural Nested Models #27

Closed (pzivich closed 5 years ago)

pzivich commented 6 years ago

Add SNM to the zepid.causal branch. After this addition, all of Robins' g-methods will be implemented in zEpid.

SNMs are discussed in the Causal Inference book (https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/) and The Chapter. SAS code for search-based and closed-form solvers is available at the site. Ideally both will be implemented. I will start with the time-fixed estimator.
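For the time-fixed, one-parameter additive model, the closed-form route can be sketched roughly as follows. This is only an illustration on simulated data, not the SAS code from the book site and not zEpid's implementation. The helper names (`fit_logit`, `closed_form_psi`), the simulated data, and the hand-rolled Newton-Raphson logistic fit are all assumptions made to keep the example self-contained.

```python
import numpy as np


def fit_logit(X, a, n_iter=25):
    """Plain Newton-Raphson logistic regression (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (a - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def closed_form_psi(y, a, l):
    """Solve sum((A - Pr(A=1|L)) * (Y - psi*A)) = 0 for psi directly."""
    X = np.column_stack([np.ones_like(l), l])
    p = 1.0 / (1.0 + np.exp(-X @ fit_logit(X, a)))  # Pr(A=1 | L)
    resid = a - p
    return np.sum(resid * y) / np.sum(resid * a)


# Simulated time-fixed data (assumption): true additive effect of A is 1.0
rng = np.random.default_rng(2)
n = 2000
l = rng.normal(size=n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * l))).astype(float)
y = 1.0 * a + l + rng.normal(size=n)

psi_hat = closed_form_psi(y, a, l)
print(psi_hat)  # should land close to the simulated effect
```

With more than one parameter (for example, effect modification by L), the same estimating equation becomes a small linear system rather than a single ratio.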

pzivich commented 5 years ago

After my biostat course (or during it), I plan to try implementing g-estimation of SNM. The estimator is computationally intensive because you need to search through candidate values, exploring the parameter space on a fine enough grid. With interactions, this becomes a grid search problem. As of now, optimization is beyond me. The plan is to get g-estimation of SNM working for simple scenarios. As reference points, the following article and R package might be useful to compare against (and pull from):

https://oce.ovid.com/article/00001648-201703000-00029/HTML

https://cran.r-project.org/web/packages/DTRreg/DTRreg.pdf

Reference to argue / compare to marginal structural models

https://cdn1.sph.harvard.edu/wp-content/uploads/sites/343/2013/03/msm-cie-fnl.pdf
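The search over candidate values described above can be sketched as a one-dimensional grid search. This is a minimal sketch under stated assumptions: the data are simulated, the function names (`fit_logit`, `alpha_hat`, `grid_search_psi`) are hypothetical, and the logistic model is fit with a small hand-written Newton-Raphson loop rather than any particular library.

```python
import numpy as np


def fit_logit(X, a, n_iter=25):
    """Plain Newton-Raphson logistic regression (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (a - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def alpha_hat(psi, y, a, l):
    """Coefficient on H(psi) = Y - psi*A in a logit model for A given L."""
    h = y - psi * a
    X = np.column_stack([np.ones_like(h), l, h])
    return fit_logit(X, a)[-1]


def grid_search_psi(y, a, l, grid):
    """Pick the grid value whose alpha is closest to zero."""
    alphas = np.array([alpha_hat(psi, y, a, l) for psi in grid])
    return grid[np.argmin(np.abs(alphas))]


# Simulated time-fixed data (assumption): true additive effect of A is 2.0
rng = np.random.default_rng(0)
n = 2000
l = rng.normal(size=n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * l))).astype(float)
y = 2.0 * a + l + rng.normal(size=n)

psi_grid = np.linspace(0.0, 4.0, 81)  # step of 0.05
psi_hat = grid_search_psi(y, a, l, psi_grid)
print(psi_hat)
```

Each grid point costs one model fit, so a two-parameter model on the same resolution would need 81 x 81 fits, which is where the computational burden comes from.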

pzivich commented 5 years ago

Looks like statsmodels GEE will need to be used. I will also need to solve for \psi by minimization through something like scipy. Unfortunately, this is going to be somewhat slow, since GEE takes longer to fit than GLM. However, for what I want the weights option to do, it only works with GEE.

Not ideal overall since g-estimation needs some heavy lifting. It is likely to be horrendously slow... Might be worth toying around with sklearn to compare time. Once I implement DCDR, I will need sklearn as a dependency
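As a rough idea of what the scipy side could look like, here is a sketch that hands the search for \psi to `scipy.optimize.minimize_scalar`. To keep it self-contained, it swaps statsmodels GEE for a small hand-rolled weighted logistic fit, so it says nothing about GEE's actual speed. The function names, the simulated data, the search bounds, and the all-ones weights (a stand-in for something like censoring weights) are assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def fit_weighted_logit(X, a, w, n_iter=25):
    """Weighted Newton-Raphson logistic regression (stand-in for GEE)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (a - p))
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def objective(psi, y, a, l, w):
    """Squared coefficient on H(psi) = Y - psi*A; zero at the g-estimate."""
    h = y - psi * a
    X = np.column_stack([np.ones_like(h), l, h])
    return fit_weighted_logit(X, a, w)[-1] ** 2


# Simulated time-fixed data (assumption): true additive effect of A is 1.5
rng = np.random.default_rng(1)
n = 2000
l = rng.normal(size=n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * l))).astype(float)
y = 1.5 * a + l + rng.normal(size=n)
w = np.ones(n)  # placeholder weights (assumption)

res = minimize_scalar(objective, bounds=(0.0, 3.0), method="bounded",
                      args=(y, a, l, w))
psi_hat = res.x
print(psi_hat, res.nfev)  # estimate and number of model fits used
```

Compared with a fixed grid, the optimizer usually needs far fewer model fits, which matters if each fit is as slow as expected.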

pzivich commented 5 years ago

According to Technical Point 14.1 in Hernán and Robins (HR), a multiplicative model can be used when Y is positive. For a binary Y, it can be used when Pr(Y=1) is small in all strata of L. These don't generalize to time-varying treatments.

As a result, I plan on only implementing g-estimation for additive SNM. I think this is justified on several fronts. First are the issues listed above. Second, additive-scale interaction/modification is more meaningful for public health (and other fields).

pzivich commented 5 years ago

There are some interesting potential sensitivity analyses for g-estimation (Fine Point 14.2). Basically, instead of solving for alpha = 0, you can assess unmeasured confounding by allowing alpha =/= 0. For example, it could be alpha = 0.1.

Source for further details on sensitivity analyses https://www.jstor.org/stable/2669923?seq=1#metadata_info_tab_contents

EDIT: I think I will only be able to do this for the optimization algorithm. Basically, you would subtract the chosen alpha from the quantity being minimized. Should be easy to do. I don't know how I would do this for the closed-form solution.
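One way the "subtract alpha" idea could look for the optimization route is sketched below: the objective becomes the squared distance between the fitted coefficient on H(\psi) and a user-chosen alpha. This is a sketch under assumptions, not a settled design. The names (`g_estimate`, `alpha_target`), the simulated data, the search bounds, and the hand-rolled logistic fit are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def fit_logit(X, a, n_iter=25):
    """Plain Newton-Raphson logistic regression (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (a - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def g_estimate(y, a, l, alpha_target=0.0, bounds=(0.0, 3.0)):
    """Find psi where the coefficient on H(psi) equals alpha_target."""
    def objective(psi):
        h = y - psi * a
        X = np.column_stack([np.ones_like(h), l, h])
        return (fit_logit(X, a)[-1] - alpha_target) ** 2
    return minimize_scalar(objective, bounds=bounds, method="bounded").x


# Simulated time-fixed data (assumption): true additive effect of A is 1.5
rng = np.random.default_rng(3)
n = 2000
l = rng.normal(size=n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * l))).astype(float)
y = 1.5 * a + l + rng.normal(size=n)

psi_zero = g_estimate(y, a, l, alpha_target=0.0)  # usual g-estimate
psi_sens = g_estimate(y, a, l, alpha_target=0.1)  # sensitivity analysis
print(psi_zero, psi_sens)
```

Running this over a range of `alpha_target` values would give a curve of \psi estimates against the assumed degree of unmeasured confounding.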

pzivich commented 5 years ago

Scratch pretty much everything I said regarding binary Y's. I had misunderstood. In actuality, estimating risk ratios is the more difficult case, not risk differences. G-estimation will support both ATE and RD. For now, I don't plan on adding support for risk ratios (for several reasons, including implementation difficulty and some assumptions required for SNM).

pzivich commented 5 years ago

Closed via #96