pzivich / zEpid

Epidemiology analysis package
http://zepid.readthedocs.org
MIT License

Enhancements to Monte-Carlo g-formula #78

Closed pzivich closed 5 years ago

pzivich commented 5 years ago

As noted in #73 and #77, there are some further optional enhancements I can add to MonteCarloGFormula.

Items to add:

Testing:

The updates to the Monte-Carlo g-formula will go into a future release (I haven't decided which version they will make it into).

Optional:

I sometimes run into a MemoryError when replicating Keil et al. 2014 with many resamples. A potential way out is to "throw away" every simulated observation that is not the final observation for that individual. I can add an option, low_memory=True, to discard those unnecessary observations. Setting it to False would return the full simulated dataframe.
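To illustrate the idea, here is a minimal pandas sketch of keeping only the final row per individual. This is not the zEpid implementation. The function name `keep_final_observation` and the column names `id`, `t`, and `event` are hypothetical and exist only for this example.

```python
import pandas as pd

def keep_final_observation(sim, id_col="id", time_col="t"):
    """Return only the last simulated row for each individual.

    Hypothetical helper: `id_col` and `time_col` are assumed column names.
    """
    return (sim.sort_values([id_col, time_col])            # order rows within each individual
               .drop_duplicates(subset=id_col, keep="last")  # keep the final time point only
               .reset_index(drop=True))

# Toy simulated data in long format: one row per individual per time point
sim = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 3],
    "t":     [1, 2, 3, 1, 2, 1],
    "event": [0, 0, 1, 0, 0, 0],
})
final = keep_final_observation(sim)
```

In this toy case the long data shrinks from six rows to three, one per individual. With many resamples and time points, that reduction is where the memory saving would come from.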

pzivich commented 5 years ago

The release that includes these changes will also remove TimeVaryGFormula, as mentioned in #70, whether that is v0.6.0 or earlier (it will likely be a 0.5.x release).

pzivich commented 5 years ago

Before adding the censoring model, I am going to see if I can speed the g-formula up a bit. I am going to run patsy myself to build the design matrix, then multiply it by the estimated coefficients with numpy directly.

As the benchmark, the g-formula in d4a3f2c takes about 130-140 seconds in the test setup.
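For reference, a rough sketch of what predicting by hand looks like for a logistic model, assuming the design matrix has already been built (for example by patsy). The matrix `X`, the coefficients `beta`, and the helper `predict_logistic` are made up for illustration and are not taken from zEpid.

```python
import numpy as np

def predict_logistic(X, beta):
    """Predicted probabilities from a design matrix and fitted logit coefficients."""
    linear_predictor = np.dot(X, beta)              # the step a faster dot product would replace
    return 1.0 / (1.0 + np.exp(-linear_predictor))  # inverse logit

# Hypothetical design matrix: intercept, binary treatment, continuous covariate
X = np.array([[1.0, 0.0, 0.5],
              [1.0, 1.0, 0.5],
              [1.0, 1.0, -1.2]])
beta = np.array([-2.0, 0.8, 0.3])  # made-up coefficients, not from a real fit

p = predict_logistic(X, beta)

# Simulate a binary outcome for each row from its predicted probability
rng = np.random.default_rng(0)
y_sim = rng.binomial(n=1, p=p)
```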

pzivich commented 5 years ago

Working directly with the patsy design matrix and the estimated betas doesn't really save me any time. It still takes about 130-140 seconds. The item to target for speed improvement in this approach would be np.dot.

Storing a function for later. If I find a faster dot product, I can attempt to switch the statsmodels predict call to the new implementation.

pzivich commented 5 years ago

There are some difficulties getting the competing risk g-formula to work (np.random.multinomial doesn't like an array of probabilities). I also don't have a good comparison data set currently. I built some of the code structure for competing risks, and I will add it in a later release once I resolve some of these issues.
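One possible workaround for the per-row probability problem is to sample by inverse CDF: take the cumulative sum of each row's probabilities and compare it to a single uniform draw. This is only a sketch of that general technique, not what zEpid does. The function name `draw_categorical` and the example probabilities are assumptions made for illustration.

```python
import numpy as np

def draw_categorical(probs, rng):
    """Draw one category index per row of `probs` (each row should sum to 1)."""
    probs = np.asarray(probs, dtype=float)
    cum = np.cumsum(probs, axis=1)
    cum[:, -1] = 1.0  # guard against floating-point rounding in the last column
    u = rng.random(size=(probs.shape[0], 1))  # one uniform draw in [0, 1) per row
    # The drawn category is the number of cumulative thresholds at or below u
    return (u >= cum).sum(axis=1)

rng = np.random.default_rng(0)

# Hypothetical per-individual probabilities: no event, event of interest, competing event
probs = np.array([[0.90, 0.07, 0.03],
                  [0.00, 1.00, 0.00],
                  [0.00, 0.00, 1.00]])
events = draw_categorical(probs, rng)
```

Each row gets its own probability vector, and the whole draw is vectorized, so there is no Python loop over individuals.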