Enhancements to Monte-Carlo g-formula

pzivich commented 5 years ago

As noted in #73 and #77, there are some further optional enhancements I can add to MonteCarloGFormula.

Items to add:

[x] Censoring model
[ ] Competing risk model

Testing:

[x] Test censoring model works as intended (compare to Keil 2014)
[ ] Test competing risks. It may be easiest to simulate a quick data set for comparison, since I don't have anything on hand

The updates to the Monte-Carlo g-formula will be added in a future release (I haven't decided which version they will make it into)

Optional:

[x] Reduce memory burden of unneeded replicants

I sometimes run into a MemoryError when replicating Keil et al. 2014 with many resamples. A potential way out of this is to "throw away" the observations that are not the final observation for each individual. I can add an option low_memory=True to throw out those unnecessary observations. Users could still get the full simulated dataframe back with low_memory=False.
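A minimal sketch of the "keep only the final observation" idea, not the actual MonteCarloGFormula code. The helper name `keep_last_observation` and the column names `id`, `time`, and `Y` are hypothetical and only there for illustration.

```python
import pandas as pd


def keep_last_observation(df, id_col="id", time_col="time"):
    """Return only the final simulated row for each individual."""
    # Sort so that the last row within each id is the latest time point
    ordered = df.sort_values([id_col, time_col])
    # Keep that last row per individual and discard the earlier ones
    return ordered.drop_duplicates(subset=id_col, keep="last").reset_index(drop=True)


# Hypothetical simulated long-format data: two individuals, several time points
sim = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2],
    "time": [1, 2, 3, 1, 2],
    "Y":    [0, 0, 1, 0, 0],
})
last = keep_last_observation(sim)
```

With many resamples this shrinks the stored dataframe from one row per person-time to one row per simulated individual, which is all that is needed if only the final outcome status is summarized.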

pzivich commented 5 years ago

When I release this version, it will also remove TimeVaryGFormula, as mentioned in #70 ~~whether this is part of v0.6.0 or earlier (likely going to be a 0.5.x release)~~

pzivich commented 5 years ago

Before adding the censoring model, I am going to see if I can speed the g-formula up a bit. I am going to manually run patsy myself to pull the matrix then multiply with numpy directly.

As the benchmark, the g-formula in d4a3f2c takes about 130-140 seconds in the test set-up
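For reference, a rough sketch of what predicting "by hand" could look like for a logistic model, assuming the design matrix has already been pulled (for example with patsy's `dmatrix`) and the betas come from the fitted model. The function name `predict_logistic` and the numbers are made up for illustration.

```python
import numpy as np


def predict_logistic(design_matrix, betas):
    """Predicted probabilities from a logistic model: expit(X @ beta)."""
    # The dot product is the step that dominates the run time
    linear_predictor = np.dot(design_matrix, betas)
    # Inverse logit to go from log-odds to probabilities
    return 1.0 / (1.0 + np.exp(-linear_predictor))


# Hypothetical design matrix: intercept column plus two covariates
X = np.array([[1.0, 0.0, 2.0],
              [1.0, 1.0, 0.5]])
# Hypothetical estimated coefficients, in the same column order as X
betas = np.array([-1.0, 0.5, 0.25])
probs = predict_logistic(X, betas)
```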

pzivich commented 5 years ago

Interacting with the matrix via patsy and the estimated betas doesn't really save me any time. Still about 130-140 seconds. The item to target for speed improvement in this approach would be np.dot

Storing a function for later. If I find a faster dot product, I can attempt to switch the statsmodels predict call to the new implementation

pzivich commented 5 years ago

There are some difficulties getting the competing risk g-formula to work (np.random.multinomial doesn't like an array of probabilities). I also don't have a good comparison data set currently. I built some of the code structure for competing risks. I will add this in a later release (once I resolve some of these issues)
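One possible workaround for the multinomial issue, sketched under the assumption that each simulated individual has a row of event probabilities that sums to 1. It swaps np.random.multinomial for an inverse-CDF draw that handles all rows at once. The function name `draw_categories` and the category coding (0 = no event, 1 = outcome, 2 = competing event) are hypothetical.

```python
import numpy as np


def draw_categories(probs, rng):
    """Draw one category index per row of an (n, k) probability array."""
    probs = np.asarray(probs, dtype=float)
    # Row-wise cumulative probabilities define the bin edges on [0, 1)
    cumulative = np.cumsum(probs, axis=1)
    # Guard against rounding so the last edge is exactly 1
    cumulative[:, -1] = 1.0
    # One uniform draw per row, shaped to broadcast against the edges
    u = rng.random(size=(probs.shape[0], 1))
    # The category is the number of edges at or below the draw
    return (u >= cumulative).sum(axis=1)


rng = np.random.default_rng(0)
# Hypothetical per-individual probabilities: no event, outcome, competing event
probs = np.array([[0.7, 0.2, 0.1],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
events = draw_categories(probs, rng)
```

This avoids looping over individuals, though it still needs a comparison data set to confirm the competing risk estimates are correct.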

pzivich / zEpid

Enhancements to Monte-Carlo g-formula #78