Closed by pzivich 5 years ago
When I release this version, it will also remove TimeVaryGFormula, as mentioned in #70, whether this is part of v0.6.0 or earlier (likely going to be a 0.5.x release)
Before adding the censoring model, I am going to see if I can speed the g-formula up a bit. I am going to run patsy myself to pull the design matrix, then multiply with numpy directly.
As the benchmark, the g-formula in d4a3f2c takes about 130-140 seconds in the test set-up
Interacting with the matrix directly via patsy and the estimated betas doesn't really save me any time. Still about 130-140 seconds. The item to target for speed improvement in this approach would be np.dot
Storing a function for later. If I find a faster dot product, I can attempt to switch the statsmodels predict to the new implementation
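For reference, the manual prediction described above (design matrix times estimated betas, then the inverse logit) can be sketched roughly as below. This is not the stored function from the repository. The name predict_logistic is hypothetical, a logistic outcome model is assumed, and the design matrix is built by hand as a stand-in for whatever patsy would return.

```python
import numpy as np


def predict_logistic(design_matrix, betas):
    """Predicted probabilities from a logistic model (hypothetical helper).

    design_matrix: (n_rows, n_params) array, intercept column included.
    betas: (n_params,) array of estimated coefficients.
    """
    # This dot product is the step identified above as the one to speed up.
    linear_predictor = np.dot(design_matrix, betas)
    # Inverse logit converts the linear predictor into a probability.
    return 1.0 / (1.0 + np.exp(-linear_predictor))


# Toy data standing in for one time step of the simulated population.
rng = np.random.default_rng(0)
n = 5
x1 = rng.normal(size=n)
x2 = rng.binomial(1, 0.5, size=n)

# Hand-built design matrix for "1 + x1 + x2" (assumed model specification).
X = np.column_stack([np.ones(n), x1, x2])
betas = np.array([-1.0, 0.5, 0.25])  # made-up coefficients

probs = predict_logistic(X, betas)
```

Swapping in a faster product would then only require changing the np.dot line.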
There are some difficulties getting the competing risk g-formula to work (np.random.multinomial doesn't like an array of probabilities). I also don't have a good comparison data set currently. I built some of the code structure for competing risks. I will add this in a later release (once I resolve some of these issues)
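One possible way around the np.random.multinomial limitation is to draw each individual's outcome category by inverting the row-wise cumulative probabilities. The sketch below is only an illustration of that idea, not code from the repository. The function name sample_categorical and the coding of categories (0 = no event, 1 = event of interest, 2 = competing event) are assumptions.

```python
import numpy as np


def sample_categorical(probs, rng):
    """Draw one category index per row of `probs` (hypothetical helper).

    probs: (n_rows, n_categories) array; each row is assumed to sum to 1.
    rng: a numpy.random.Generator.
    """
    probs = np.asarray(probs, dtype=float)
    cum = np.cumsum(probs, axis=1)
    cum[:, -1] = 1.0  # guard against floating-point rounding in the last column
    u = rng.random(probs.shape[0])  # one uniform draw in [0, 1) per row
    # The drawn category is the number of cumulative thresholds at or below u.
    return (u[:, None] >= cum).sum(axis=1)


rng = np.random.default_rng(42)

# Each row holds one individual's probabilities at a single time step.
probs = np.array([
    [0.90, 0.07, 0.03],
    [0.50, 0.30, 0.20],
    [0.10, 0.10, 0.80],
])
events = sample_categorical(probs, rng)
```

Because every row is handled in one vectorized pass, no Python loop over individuals is needed.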
As noted in #73 and #77, there are some further optional enhancements I can add to MonteCarloGFormula
Items to add:
[x] Censoring model
[ ] Competing risk model
Testing:
[x] Test censoring model works as intended (compare to Keil 2014)
[ ] Test competing risks. May be easiest to simulate up a quick data set to compare. Don't have anything on hand
The updates to Monte-Carlo g-formula will be added to a future update (haven't decided which version they will make it into)
Optional:
I sometimes run into a MemoryError when replicating Keil et al 2014 with many resamples. A potential way out of this is to "throw away" the observations that are not the final observation for each individual. I can add an option low_memory=True to throw out those unnecessary observations. The user could still return the full simulated dataframe with low_memory=False.
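A rough sketch of what the low_memory behaviour could look like with pandas follows. The function name keep_final_observation and the column names uid and t are hypothetical, and the real option may be implemented differently inside MonteCarloGFormula.

```python
import pandas as pd


def keep_final_observation(sim_df, id_col="uid", time_col="t", low_memory=True):
    """Optionally reduce a simulated long-format dataframe to one row per individual.

    All names here are illustrative assumptions, not the library's API.
    """
    if not low_memory:
        # Return the full simulated dataframe unchanged.
        return sim_df
    # Sort so each individual's last time point comes last, then keep only that row.
    ordered = sim_df.sort_values([id_col, time_col])
    return ordered.drop_duplicates(subset=id_col, keep="last").reset_index(drop=True)


# Toy simulated data: individual 1 has three time points, individual 2 has two.
sim = pd.DataFrame({
    "uid": [1, 1, 1, 2, 2],
    "t": [1, 2, 3, 1, 2],
    "Y": [0, 0, 1, 0, 0],
})
final = keep_final_observation(sim)
full = keep_final_observation(sim, low_memory=False)
```

In a real run the reduction would need to happen as each resample finishes, before results are concatenated, for it to lower peak memory.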