statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

SUMM: prediction results (outside tsa) #2150

Open josef-pkt opened 9 years ago

josef-pkt commented 9 years ago

Bringing together notes for different cases of prediction results. This does not include conditional forecasting as in TSA, but includes non-linear and non-normal models.

Example, the simplest case, OLS: predict the mean, a confidence interval for the predicted mean, and a confidence interval for predicting a new observation, see #719. This extends to WLS with weights or an estimated variance function, HetGLS or a (non-existing) HetMLE.

issue for GLM #938

models, cases

problem: convolution

We have two sources of uncertainty, one coming from parameter uncertainty, one coming from the assumed distribution of endog or residual. It will be relatively straightforward to treat them separately, but the convolution is difficult to compute except in the simple normal model with additive error.

This will cause a difference in the implementation/API pattern between the linear model and all other models: the RegressionResults prediction interval for new observations is calculated as a convolution, but we won't have it in other models. We can get it by Monte Carlo or bootstrap for other models, but those should be separate methods since they are not cheap. (I ran some experimental functions for this for the Poisson case.)

Implementation

for some interface comment see #719

In terms of internal implementation, confidence intervals for new observations are different; inference and confidence intervals for the mean prediction are the same as testing, e.g. t_test can calculate everything for predict_mean.

Stata has model-specific predict, plus lincom, nlcom and test, testnl.

In terms of algorithm, we have three cases for inference on a function of the parameters

(2) is the main tool for prediction in GLM and discrete models with a one-parameter LEF (if the variance is estimated, it is still asymptotically uncorrelated with the mean, block-diagonal information matrix; I guess NB1 and BetaRegression won't fit into this.)

other

in-sample versus out-of-sample: the main difference here is that we have a corrected residual estimate; in-sample is currently mostly covered by influence_outliers

josef-pkt commented 9 years ago

Kerby's comment on extensions and requirements for prediction https://github.com/statsmodels/statsmodels/pull/2151#issuecomment-73418785

josef-pkt commented 9 years ago

One of Kerby's comments

" Multiple testing: this is basically a convenience method for doing lots of tests/intervals, so simultaneity issues are relevant. Someone could just take the p-values and feed them through a multitest procedure. But since the tests/intervals all come from the same model they are likely to be quite correlated, so approaches like Scheffe (when applicable) should be less conservative. "

While looking at the difference between confidence interval for mean and confidence interval for observations, I think I figured out what was bothering me in #2172 about multiple testing and Scheffe:

In a parametric setting with a fixed finite number of parameters, the distributions of the mean predictions are perfectly correlated: they are all just functions of the same few parameters. So, I think we don't need a multiple testing correction for each prediction. If we want to predict observations, then each observation includes separate noise, which is often independent across observations, and in that case we would have a largely or partially uncorrelated prediction/hypothesis for each additional observation.
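The trade-off behind the Scheffe remark can be made concrete by comparing critical values. Scheffe's simultaneous band depends only on the number of parameters k, not on how many points are tested, while Bonferroni grows with the number m of tests, so for many correlated mean predictions Scheffe can be less conservative (the numbers k, df, m below are arbitrary illustration choices):

```python
import numpy as np
from scipy import stats

k, df, alpha, m = 2, 50, 0.05, 20
t_point = stats.t.ppf(1 - alpha / 2, df)              # pointwise interval
t_bonf = stats.t.ppf(1 - alpha / (2 * m), df)         # Bonferroni for m intervals
scheffe = np.sqrt(k * stats.f.ppf(1 - alpha, k, df))  # simultaneous, any number of points

# simultaneity always costs something over a pointwise interval,
# but with many tests Scheffe undercuts Bonferroni
assert t_point < scheffe < t_bonf
```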

example: I was thinking about adding a test for the prediction of a new observation in #2151, which would be similar to the outlier test for in-sample observations, where we do need the multiple testing correction.

(I still haven't read the articles mentioned in #2172, but I was partially catching up on Scheffe in general for multiple testing of many parameters.)

josef-pkt commented 9 years ago

and maybe I'm still wrong and need to "debug" my intuition: all pairwise comparisons and similar procedures use a multiple testing correction based on the number of comparisons, not on the number of underlying parameters.

josef-pkt commented 9 years ago

Ok, I was wrong. The analogy is to supremum tests, where we check the worst case: if we don't reject the worst case, then we also don't reject any of the other cases, which limits the familywise type 1 error rate. I'm still a bit vague on this, and I skipped the literature on gate-keeping multiple testing procedures.

josef-pkt commented 5 years ago

random find

https://www.reddit.com/r/datascience/comments/bu3kr3/from_academia_to_real_world_how_do_you_present/

"If you have a full model, I will often make hypothetical people and show how the predicted risk changes. Eg if we have two people one who is 50 and one who is 65 (and the same on all other variables), the first person might have a predicted risk of 50% and the second person have a predicted risk of 90%. This puts everything into very intuitive numbers" comment by jlienert