pydata / patsy

Describing statistical models in Python using symbolic formulas

Spline basis scales input #170

Closed memeplex closed 3 years ago

memeplex commented 3 years ago

Is this right?

image

I can't find any warning about input normalization, or anything like it, in the docs. I'm not an expert on spline bases, but I don't think normalization is a prerequisite.

So why is this happening?

memeplex commented 3 years ago

Concretely, the problem is this: if I fit a model with a design matrix produced as above and then generate a range of values to evaluate that model, the evaluation is on a different scale than the original data. Maybe I should reuse the already-fitted transformer, but if so it's not clear how. All the docs and examples I've found simply use dmatrix, which I have to call twice, once for fitting and once for evaluation, so I get the same (scaled) prediction for, say, the range [0,1] and the range [0,10].
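A minimal sketch of the behavior described in this comment. The screenshot is not preserved, so the formula `bs(x, df=4) - 1` and the variable names are assumptions chosen for illustration. With no explicit knots or bounds, each `dmatrix` call derives them from the data it is given, so two independent calls on [0,1] and [0,10] should produce the same basis values up to floating-point error:

```python
import numpy as np
from patsy import dmatrix

# Illustrative formula (an assumption; the original one is not shown in the thread).
formula = "bs(x, df=4) - 1"

x_small = np.linspace(0.0, 1.0, 11)   # range [0, 1]
x_large = np.linspace(0.0, 10.0, 11)  # range [0, 10]

# Each call learns its own knots and boundaries from the data passed in.
basis_small = np.asarray(dmatrix(formula, {"x": x_small}))
basis_large = np.asarray(dmatrix(formula, {"x": x_large}))

# The two bases coincide because the knots rescale together with the input.
same_basis = np.allclose(basis_small, basis_large)
print(same_basis)
```

This is why evaluating a fitted model on a matrix built by a second, independent `dmatrix` call looks as if the input had been rescaled.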

memeplex commented 3 years ago

OK, I've found the answer myself: https://patsy.readthedocs.io/en/latest/stateful-transforms.html#stateful-transforms. It's not specific to splines; it's a general concern with stateful transforms.
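For readers landing here, a hedged sketch of what the linked page implies for this case: build the design matrix once on the training data, then reuse its `design_info` with `build_design_matrices` for new data, so the knots memorized from the training data are kept. The formula and data below are assumptions for illustration, not taken from the original post:

```python
import numpy as np
from patsy import dmatrix, build_design_matrices

formula = "bs(x, df=4) - 1"  # illustrative formula (assumption)

# Fit time: the spline transform memorizes knots and bounds from x_train.
x_train = np.linspace(0.0, 10.0, 50)
train = dmatrix(formula, {"x": x_train})

# Evaluation time: reuse the memorized state instead of calling dmatrix again.
x_new = np.linspace(0.0, 1.0, 5)
(reused,) = build_design_matrices([train.design_info], {"x": x_new})

# For comparison only: a fresh dmatrix call re-learns knots from x_new.
fresh = dmatrix(formula, {"x": x_new})

reused = np.asarray(reused)
fresh = np.asarray(fresh)
print(reused.shape, fresh.shape)
```

Here `reused` places the new points on the scale of the training data, while `fresh` treats [0,1] as the full range. Passing explicit `knots`, `lower_bound`, and `upper_bound` to `bs` is another way to pin the basis, if fixed values are known in advance.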