remove intercept column if super learning

benkeser commented 6 years ago

If SuperLearner is used to estimate nuisance parameters, ltmle first creates a design matrix based on Qform and gform that is subsequently passed to SuperLearner via the X option.

However, the default formulas for Qform and gform generate an intercept column in the X matrix. This causes some algorithms in SuperLearner to throw unnecessary warnings (e.g., the call to glm in SL.glm complains about not having a full rank matrix). Generally, these algorithms will still work, but the ltmle output will include unnecessary warnings.
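A minimal sketch of why the extra column causes trouble, using only base R. The simulated data and the column name `Intercept` are hypothetical. The `Y ~ .` call is meant to mimic the kind of fit a glm-based wrapper performs; it is not copied from SL.glm.

```r
# Hypothetical simulated data, for illustration only.
set.seed(1)
n <- 50
W <- rnorm(n)
Y <- 1 + 2 * W + rnorm(n)

# A design matrix that already carries a constant column,
# as happens when the formula's intercept is kept in X.
X <- data.frame(Intercept = 1, W = W)

# glm() adds its own intercept, so the constant column is redundant.
fit <- glm(Y ~ ., data = X)

# The redundant column is aliased: its coefficient is NA and the
# fitted rank is 2 even though three coefficients are listed.
coef(fit)
fit$rank
```

The fit still succeeds, which matches the observation above that the algorithms generally work; the rank deficiency is what triggers the warnings.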

This PR fixes this by checking for an intercept term when model.matrix is called. If there is one, it removes the first column of X, which is assumed to be the intercept.

I haven't done extensive testing to know if there is anywhere else in the code where this change is necessary, but a few examples I've run seem to work without warnings.
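The idea can be sketched in base R as follows. This is an illustration under assumptions, not the code in this PR: the data frame `dat` and the formula `form` are hypothetical stand-ins for the data and for Qform or gform.

```r
# Hypothetical data and formula, for illustration only.
dat <- data.frame(
  Y = c(0, 1, 1, 0),
  W = c(0.2, 1.5, -0.3, 0.8),
  A = c(0, 1, 0, 1)
)
form <- Y ~ W + A

X <- model.matrix(form, data = dat)

# attr(terms(form), "intercept") is 1 when the formula has an intercept.
has_intercept <- attr(terms(form), "intercept") == 1

if (has_intercept) {
  # model.matrix puts "(Intercept)" first; drop = FALSE keeps X a matrix
  # even if only one column remains.
  X <- X[, -1, drop = FALSE]
}

colnames(X)
```

Checking the `intercept` attribute rather than the column name means a formula written with `- 1` or `+ 0` is left untouched.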

benkeser commented 6 years ago

Commit 41a22a1546427fd6934062502484baa24306f3e6 fixes a bug in how p-values are generated for tests of counterfactual means.

Based on the previous code, I'd guess that the goal was to test the null hypothesis that E[Y(a)] = 0. However, when outcomes were transformed, what was being tested instead was (E[Y(a)] - min(Y)) / (max(Y) - min(Y)) = 0, which is equivalent to testing E[Y(a)] = min(Y).
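A small worked example of the difference, with made-up numbers. The variable names and values are hypothetical, and this is not the code from the commit. It assumes the estimate and standard error are first computed on the scaled outcome (Y - min(Y)) / (max(Y) - min(Y)) and then mapped back.

```r
# Hypothetical outcome bounds and scaled-scale results.
y_min <- 10
y_max <- 50
est_scaled <- 0.25   # estimate of the scaled counterfactual mean
se_scaled  <- 0.05   # its standard error

range_y <- y_max - y_min

# Back-transform to the original outcome scale.
est <- est_scaled * range_y + y_min
se  <- se_scaled * range_y

# Test on the scaled estimate: null is E[Y(a)] = min(Y).
z_scaled <- est_scaled / se_scaled
p_scaled <- 2 * pnorm(-abs(z_scaled))

# Test on the original scale: null is E[Y(a)] = 0.
z_orig <- est / se
p_orig <- 2 * pnorm(-abs(z_orig))

c(z_scaled = z_scaled, z_orig = z_orig)
```

The two z-statistics differ whenever min(Y) is nonzero, because the shift by min(Y) changes the estimate but not the standard error.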

I've corrected the summary.ltmleEffectMeasures function to reflect this.

benkeser commented 6 years ago

Commit 83b100bfafb831a31d7e504574bcde5f72170c31 fixes a bug in summary.ltmle introduced by 41a22a1546427fd6934062502484baa24306f3e6.

It also adds stability checks to SuperLearner. Specifically:

- For binary outcomes, if there are fewer than 10 outcomes, change SuperLearner to V=2 fold CV and stratify on outcomes. I was running into cases where there were, e.g., 9 outcomes and all the SuperLearner wrappers were complaining about lack of convergence (presumably because all outcomes were 0 in some folds).
- If there is only 1 outcome, change SL.library to SL.mean, since no regression technique can really do anything anyway.
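The two checks could look roughly like the sketch below. It is not the code from the commit: the helper name `choose_sl_settings` is hypothetical, and "outcomes" is read here as the number of observations with Y == 1, which is an assumption. The returned `cvControl` list uses the `V` and `stratifyCV` settings that SuperLearner accepts for cross-validation control.

```r
# Hypothetical helper: pick SuperLearner settings from a binary outcome.
choose_sl_settings <- function(Y, SL.library) {
  # Default of 10-fold, unstratified CV is assumed here for illustration.
  cv_control <- list(V = 10L, stratifyCV = FALSE)
  n_events <- sum(Y == 1)   # assumption: "outcomes" means events

  if (n_events < 10) {
    # Few events: use 2 folds and stratify so each fold sees events.
    cv_control <- list(V = 2L, stratifyCV = TRUE)
  }
  if (n_events <= 1) {
    # The thread mentions exactly 1 outcome; zero events are treated
    # the same way here by assumption.
    SL.library <- "SL.mean"
  }
  list(SL.library = SL.library, cvControl = cv_control)
}

# Example: 9 events among 100 observations.
Y <- c(rep(1, 9), rep(0, 91))
settings <- choose_sl_settings(Y, SL.library = c("SL.glm", "SL.mean"))
settings$cvControl
```

The returned elements are meant to be passed on as the `SL.library` and `cvControl` arguments of a SuperLearner call.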

joshuaschwab / ltmle

remove intercept column if super learning #18