pydata / patsy

Describing statistical models in Python using symbolic formulas

Better explain what "statistical models" patsy can handle in the README / overview? #3

Closed cdeil closed 11 years ago

cdeil commented 12 years ago

Physicists / astronomers (and maybe other potential users like engineers, ...?) often use a different vocabulary than statisticians / economists. E.g. I'm an astronomer, and I would say that y(x) = a * x + b is a linear model with parameters a and b, and that y(x) = exp(- x / s) is a non-linear model with parameter s. When I read "This is patsy, a Python library for describing statistical models and building design matrices." I was excited that I might be able to specify arbitrary models and fit them to data. (I was ignorant of what exactly "statistical models" are, and I thought "design matrices" referred to the subset of linear models patsy can handle.) But if I understand correctly, patsy only handles linear models, right?
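To make the vocabulary concrete, here is a minimal sketch of why y(x) = a * x + b fits the "design matrix" picture: the model is linear in its parameters, so the predictions are a matrix of known columns times the parameter vector (b, a). This does not use patsy itself; the helper names `design_matrix` and `least_squares_2col` and the synthetic data are assumptions made up for illustration.

```python
# y(x) = a * x + b is linear in (b, a): each prediction is the dot product
# of a design-matrix row [1, x] with the parameter vector [b, a].

def design_matrix(xs):
    """Hypothetical helper: one row [1, x] per data point (intercept, slope)."""
    return [[1.0, float(x)] for x in xs]

def least_squares_2col(X, ys):
    """Hypothetical helper: solve the 2x2 normal equations (X^T X) beta = X^T y."""
    s00 = sum(r[0] * r[0] for r in X)
    s01 = sum(r[0] * r[1] for r in X)
    s11 = sum(r[1] * r[1] for r in X)
    t0 = sum(r[0] * y for r, y in zip(X, ys))
    t1 = sum(r[1] * y for r, y in zip(X, ys))
    det = s00 * s11 - s01 * s01
    b = (s11 * t0 - s01 * t1) / det
    a = (s00 * t1 - s01 * t0) / det
    return b, a

xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]  # synthetic, noise-free data with a = 2, b = 1
X = design_matrix(xs)
b, a = least_squares_2col(X, ys)
print(a, b)
```

By contrast, y(x) = exp(- x / s) cannot be written as fixed columns times a parameter vector, because s sits inside the exponential, so no design matrix of this kind exists for it.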

Maybe you could add a sentence or two to the README or to http://patsy.readthedocs.org/en/latest/overview.html to make it even clearer up-front what kinds of "statistical models" patsy can and can't handle? (I had the same problem when I first saw statsmodels; it took me a while to figure out that it can only fit very specific models, not arbitrary nonlinear models.)

Two more things you could add to the docs:

Thanks for making patsy and writing great documentation! I'll try to learn how statisticians do regression from the patsy and statsmodels docs.

jseabold commented 12 years ago

FWIW, I also found it a bit odd there was no mention of statsmodels in the docs, considering we are the only project using patsy that I know of, but I held my tongue. The impression you get from this package is go build your own stats code, which I wouldn't discourage in the name of competition, but...

FYI, there are plans (and code in the works) to do arbitrary non-linear regression in statsmodels, with the models described by a high-level formula language.

[Edit: The code in the works is just for the non-linear models, not the formula description, i.e., you have to pass a function to minimize. A high-level description is not necessarily a simple problem: you're spitting out functions, not a design matrix.]
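As a rough sketch of what "pass a function to minimize" means for the non-linear example from the original post, y(x) = exp(- x / s): the user supplies an objective (here a sum of squared residuals) and a generic optimizer searches for s. This is not the statsmodels code referred to above; the golden-section search, the function names, the search interval, and the synthetic data are all assumptions chosen to keep the example dependency-free.

```python
import math

def model(x, s):
    """The non-linear model from the original post: y(x) = exp(-x / s)."""
    return math.exp(-x / s)

def sum_sq(s, xs, ys):
    """Objective handed to the optimizer: sum of squared residuals."""
    return sum((y - model(x, s)) ** 2 for x, y in zip(xs, ys))

def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2.0

xs = [0.0, 0.5, 1.0, 2.0, 4.0]
true_s = 1.5
ys = [model(x, true_s) for x in xs]  # synthetic, noise-free data

# Assumed search interval for s; the objective is unimodal in s for this data.
s_hat = golden_section_minimize(lambda s: sum_sq(s, xs, ys), 0.1, 10.0)
print(s_hat)
```

Here the thing being specified is a function rather than a matrix of columns, which is why a formula language for such models would have to generate functions instead of a design matrix.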

njsmith commented 11 years ago

I think 639b2fd addresses all the issues raised here? If not feel free to re-open. But specifically, it:

@jseabold: Definitely not trying to tell people to go build their own stats code, just, patsy isn't specific to any particular package (I hope!) so it's not clear how patsy's documentation should go about documenting packages that build on top of it... if you have any suggestions on how to handle this better, please let me know :-).