pzivich / Delicatessen

Delicatessen: the Python one-stop sandwich (variance) shop 🥪
https://deli.readthedocs.io/en/latest/index.html
MIT License

Pooled logistic for survival analyses #42

Open pzivich opened 7 months ago

pzivich commented 7 months ago

Is your feature request related to a problem? Please describe.

Add an estimating equation for pooled logistic regression to support survival analyses. This is a finite-dimensional M-estimator, so standard theory applies. It also opens up various survival analysis options, like computing IPCW, g-computation, and others.

Describe the solution you'd like

Build an estimating equation for pooled logistic regression. Note that it would not require a long data set. Specifically, we should evaluate something like the following

$$\sum_{i=1}^n \left( \sum_{k \in R} (\Delta_i t_k - m(W_i, S_i; \beta)) \left[ W_i, S_i \right]^T \right) = 0$$

This gives a compact estimating equation that avoids expanding the data into a long data set, which in turn avoids mistakes that users could introduce in data processing steps. This is the advantage of working with the score! However, it requires some finesse to specify the estimating equation programmatically, particularly the design matrix for time (i.e., $S$), which depends on $k$.
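To make the idea concrete, here is a rough NumPy sketch of how the score could be accumulated over time points without building a long data set. This is only an illustration under several assumptions, not the proposed implementation: the function name `psi_pooled_logit`, its arguments, and the `time_design` callback are all hypothetical, time is assumed to be discrete, the inner sum is assumed to run only over the time points at which a unit is still at risk, and the outcome at time $k$ is assumed to be the event indicator for an event occurring exactly at $k$.

```python
import numpy as np


def expit(x):
    # Inverse logit
    return 1.0 / (1.0 + np.exp(-x))


def psi_pooled_logit(beta, t, delta, W, times, time_design):
    """Hypothetical helper: pooled logistic score contributions, one column per unit.

    beta        : parameters for [W, S]
    t           : observed (discrete) follow-up time for each unit
    delta       : event indicator for each unit (1 = event, 0 = censored)
    W           : n-by-p baseline design matrix (include the intercept here)
    times       : the time points k to sum over
    time_design : function mapping k to a 1-D array, the time design row S_k
    """
    t = np.asarray(t)
    delta = np.asarray(delta)
    W = np.asarray(W, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n = W.shape[0]

    score = np.zeros((beta.shape[0], n))
    for k in times:
        # The time design row is the same for every unit at a given k
        S_k = np.tile(time_design(k), (n, 1))
        X_k = np.hstack([W, S_k])

        at_risk = (t >= k).astype(float)   # still under follow-up at k
        y_k = delta * (t == k)             # event occurs exactly at k
        resid = at_risk * (y_k - expit(X_k @ beta))

        # Accumulate each unit's contribution instead of stacking rows
        score += (resid[:, None] * X_k).T
    return score


# Toy data, for illustration only
t = np.array([1, 2, 3, 3, 2])
delta = np.array([1, 0, 1, 0, 1])
W = np.column_stack([np.ones(5), [0, 1, 0, 1, 1]])
times = [1, 2, 3]


def linear_time(k):
    # One possible choice for S: a single linear term for time
    return np.array([float(k)])


beta = np.array([-1.0, 0.5, 0.1])
ee = psi_pooled_logit(beta, t, delta, W, times, linear_time)
print(ee.shape)  # (3, 5): one row per parameter, one column per unit
```

Passing the time design as a function of $k$ is one way to handle the dependence of $S$ on $k$: indicator terms or splines for time could be swapped in without touching the loop. A closure over the data that takes only the parameter vector could then be handed to an M-estimation routine.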

Challenges here:

Describe alternatives you've considered

Code it from scratch each time (I would rather not, and a built-in estimating equation would be good support for users).

Additional context

Abbott, R. D. (1985). Logistic regression in survival analysis. American Journal of Epidemiology, 121(3), 465-471.

D'Agostino, R. B., Lee, M. L., Belanger, A. J., Cupples, L. A., Anderson, K., & Kannel, W. B. (1990). Relation of pooled logistic regression to time dependent Cox regression analysis: the Framingham Heart Study. Statistics in Medicine, 9(12), 1501-1515.

Hernán, M. A. (2010). The hazards of hazard ratios. Epidemiology, 21(1), 13-15.

Ngwa, J. S., Cabral, H. J., Cheng, D. M., Pencina, M. J., Gagnon, D. R., LaValley, M. P., & Cupples, L. A. (2016). A comparison of time dependent Cox regression, pooled logistic regression and cross sectional pooling with simulations and an application to the Framingham Heart Study. BMC Medical Research Methodology, 16, 1-12.