statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

Weights #505

Open jseabold opened 12 years ago

jseabold commented 12 years ago

Make sure weights are correctly handled throughout the models. This includes GLM, RLM, ANOVA, and the discrete choice models. I think it also might make sense to have weights objects. It might also be interesting to see how far we can get with those provided by PySAL, but I haven't spoken with their developers since the summer. Many of their estimators are just duplicating ours. We should make it easy for them to use our code.

jseabold commented 10 years ago

A request for RLM weights. It looks like you can compare to MATLAB (?).

https://stackoverflow.com/questions/21755153/using-robust-linear-methods-from-python-module-statsmodels-with-weights

josef-pkt commented 10 years ago

I just read this a few days ago:

Carroll, Raymond J., and David Ruppert. "Robust estimation in heteroscedastic linear models." The annals of statistics (1982): 429-441.

There are also articles for RLM, M-estimation, with AR(1) and with spatial errors.

So far I don't know what (prior) heteroscedasticity weights would mean in discrete models and the same models in GLM.

josef-pkt commented 10 years ago

To check what MATLAB has: a robust option in curvefit (http://www.mathworks.com/help/curvefit/least-squares-fitting.html#bq_5kr9-4) and robust regression without a weights option, where wfun is our norms M (http://www.mathworks.com/help/stats/robustfit.html).

josef-pkt commented 10 years ago

GLM: https://groups.google.com/d/msg/pystatsmodels/QtSH8T47pZg/KYwJCrxD3eYJ

Stata and SAS use weights for loglikeobs: w_i * loglike_i. Stata poisson only mentions fweights and pweights (and iweights), but doesn't have aweights. Stata glm also has aweights, but it's not clear how they are used.
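To make the w_i * loglike_i form concrete, here is a minimal sketch in plain Python. It is not statsmodels code: the function names and the data are hypothetical, and the Poisson means are taken as given rather than estimated. It only checks that integer frequency weights give the same weighted log-likelihood as repeating each row w_i times.

```python
import math


def poisson_loglike_obs(y, mu):
    """Log-likelihood contribution of one Poisson observation with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def weighted_loglike(ys, mus, weights):
    """Weighted log-likelihood: sum over i of w_i * loglike_i."""
    return sum(w * poisson_loglike_obs(y, mu)
               for y, mu, w in zip(ys, mus, weights))


# Hypothetical data: three distinct rows with integer frequency weights.
ys = [0, 2, 5]
mus = [0.5, 1.5, 4.0]
fweights = [3, 1, 2]

ll_weighted = weighted_loglike(ys, mus, fweights)

# Repeat each row w_i times and use unit weights instead.
ys_expanded = [y for y, w in zip(ys, fweights) for _ in range(w)]
mus_expanded = [mu for mu, w in zip(mus, fweights) for _ in range(w)]
ll_expanded = weighted_loglike(ys_expanded, mus_expanded,
                               [1] * len(ys_expanded))

# The two values agree up to floating point rounding.
print(math.isclose(ll_weighted, ll_expanded))
```

This equivalence only covers the fweights interpretation. pweights and aweights lead to the same point estimates from this objective, but the inference (standard errors, degrees of freedom) differs, and the sketch does not address that.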

More on robust: some papers use weighted likelihood to discount influential observations (x outliers). Trimmed MLE uses 0-1 weights for loglike to cut outliers (same as subset selection in this case).
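A small sketch of the 0-1 weights point, again in plain Python with hypothetical names and data. It assumes a normal location model with known scale, where the weighted MLE of the mean is the weighted average. The weights here are fixed by hand; an actual trimmed MLE would choose which observations to drop as part of the estimation.

```python
import math


def normal_loglike_obs(y, mean, sd=1.0):
    """Log-likelihood contribution of one normal observation."""
    return (-0.5 * math.log(2 * math.pi * sd ** 2)
            - (y - mean) ** 2 / (2 * sd ** 2))


def weighted_mean(ys, weights):
    """Maximizer of sum(w_i * loglike_i) in the normal location model."""
    return sum(w * y for y, w in zip(ys, weights)) / sum(weights)


# Hypothetical sample; the last value is treated as an outlier.
ys = [1.0, 1.2, 0.9, 1.1, 25.0]
weights = [1, 1, 1, 1, 0]  # 0-1 weights chosen by hand for illustration

mean_trimmed = weighted_mean(ys, weights)

# Subset selection: drop the zero-weight rows and estimate without weights.
kept = [y for y, w in zip(ys, weights) if w == 1]
mean_subset = sum(kept) / len(kept)

# The weighted objective and the subset objective also coincide.
ll_weighted = sum(w * normal_loglike_obs(y, mean_trimmed)
                  for y, w in zip(ys, weights))
ll_subset = sum(normal_loglike_obs(y, mean_subset) for y in kept)

print(mean_trimmed, mean_subset)
```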

josef-pkt commented 9 years ago

Related to the last point: importance weights for Poisson and GLM, a question on Stack Overflow: http://stackoverflow.com/questions/28951982/using-weightings-in-a-poisson-model-using-statsmodels-module

GEE has weights, #2090

josef-pkt commented 9 years ago

A Stack Overflow question asks for weights in GLM or Logit to compensate for an imbalanced sample: http://stackoverflow.com/questions/31661552/statsmodels-python-weighted-glm In the interpretation, this might be similar to inverse probability weights (#2443, #2442).
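For illustration, one common way to build such weights is to weight each observation by the inverse of its class frequency, so that every class has the same total weight. The sketch below is plain Python with a hypothetical helper name and made-up labels. It only constructs the weights; it does not fit a model and is not what statsmodels does internally.

```python
from collections import Counter


def balancing_weights(labels):
    """Inverse class frequency weights, scaled so the weights sum to n."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[label]) for label in labels]


# Hypothetical imbalanced sample: 8 zeros and 2 ones.
labels = [0] * 8 + [1] * 2
weights = balancing_weights(labels)

# Total weight per class is now equal.
total_by_class = {
    c: sum(w for w, label in zip(weights, labels) if label == c)
    for c in set(labels)
}
print(total_by_class)
```

These weights would then multiply each observation's loglike contribution. Whether the resulting standard errors are meaningful depends on how the weights are interpreted, which is the open question in this thread.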

josef-pkt commented 9 years ago

also related: using the variance function in GLM to introduce weights and heteroscedasticity #1777

josef-pkt commented 8 years ago

Another similar question on Stack Overflow (imbalanced sample in Logit): http://stackoverflow.com/questions/33605979/statsmodels-logistic-regression-class-imbalance (By now I have figured out case weights in GLM Binomial a bit better.)

I'm opening an issue specific to rare events and unbalanced samples.