pydata / patsy

Describing statistical models in Python using symbolic formulas

ENH: option for nonreplicated exterior knots in b-splines #132

Open josef-pkt opened 6 years ago

josef-pkt commented 6 years ago

AFAICS, patsy places all exterior knots in BS at the same two points, i.e. they are replicated at lower_bound and upper_bound.

https://github.com/pydata/patsy/blob/master/patsy/splines.py#L229

I'm trying to replicate some mgcv functions, and mgcv by default chooses spread-out exterior knots. I don't see a way to replicate this with patsy's BS.
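To make the difference concrete, here is a minimal pure-Python sketch of the two knot layouts. `replicated_knots` mirrors what patsy's BS appears to do (boundary knots repeated `degree + 1` times). `spread_knots` is only an assumed stand-in for the mgcv-style layout: it continues the knots past the bounds at an even spacing, which is not necessarily mgcv's exact rule. All function names are hypothetical.

```python
def replicated_knots(inner_knots, lower, upper, degree):
    """Exterior knots stacked on the bounds (order = degree + 1 copies each)."""
    order = degree + 1
    return sorted([lower] * order + list(inner_knots) + [upper] * order)


def spread_knots(inner_knots, lower, upper, degree):
    """Exterior knots continued past the bounds at an even spacing (assumed rule)."""
    step = (upper - lower) / (len(inner_knots) + 1)
    below = [lower - k * step for k in range(degree, 0, -1)]
    above = [upper + k * step for k in range(1, degree + 1)]
    return below + [lower] + list(inner_knots) + [upper] + above


def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function at x via the Cox-de Boor recursion."""
    n = len(knots) - 1
    # Degree 0: indicator of the half-open knot interval containing x.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(n)]
    for d in range(1, degree + 1):
        raised = []
        for i in range(n - d):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            # Terms with a zero-width support are dropped (0/0 treated as 0).
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (knots[i + d + 1] - x) / right_den * basis[i + 1] if right_den > 0 else 0.0
            raised.append(left + right)
        basis = raised
    return basis


inner = [1.0, 2.0, 3.0]
rep = replicated_knots(inner, 0.0, 4.0, degree=3)
spr = spread_knots(inner, 0.0, 4.0, degree=3)
print(rep)
print(spr)
# Number of basis functions that are active exactly at the lower bound.
print(sum(1 for b in bspline_basis(0.0, rep, 3) if b > 1e-12))
print(sum(1 for b in bspline_basis(0.0, spr, 3) if b > 1e-12))
```

Both layouts give the same number of basis functions. With replicated knots only one of them is nonzero at the lower bound; with spread-out knots several overlap there, so the two bases are different and cannot be swapped by relabeling columns.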

I started to work again on GAM for statsmodels https://github.com/statsmodels/statsmodels/pull/5296

Side question: Is there a way to get access to the stateful transform and the underlying spline instance (e.g. BS) from the design_info? In general we would need to know which transform patsy applied to the basis functions in order to incorporate the sum-to-zero or remove-the-constant constraint, because any transformation of the spline basis and parameters also has to be applied to the penalization matrix.

thequackdaddy commented 6 years ago

I played with this a while back.

I ended up doing something like this gist...

https://gist.github.com/thequackdaddy/5ca8c72f8a4ac4a13507ec000ce02b61

The trick is that you have to fish the BS object out of a big mess. Then you can change its _all_knots. I never put this into a model I actually used, but I think it does (sort of) what you want.

(I ended up going the opposite route: applying np.clip inside the spline, so that unseen data is guaranteed not to have a larger domain.)

In line 11 of the gist, I ran it with the original knots. I changed the knots in line 12 and made a new matrix with the new knots in line 13.
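The "fishing out" step might look roughly like the sketch below. It is not the gist itself. It assumes a patsy version that exposes `design_info.factor_infos`, and it relies on internals that are not public API: the `state` dict of each factor, its `"transforms"` key, and the private `_all_knots` attribute. Any of these may change between releases.

```python
import numpy as np
from patsy import dmatrix

x = np.linspace(0.0, 10.0, 50)
design = dmatrix("bs(x, df=6, degree=3)", {"x": x})

# Collect every stateful transform object memorized while building the design.
found = {}
for factor, factor_info in design.design_info.factor_infos.items():
    # `state` is treated as opaque by patsy; "transforms" is an internal detail.
    transforms = factor_info.state.get("transforms", {})
    for name, transform in transforms.items():
        found[name] = transform

for name, transform in found.items():
    print(name, type(transform).__name__)
    # Private attribute: the full knot vector, boundary knots included.
    print(np.asarray(transform._all_knots))
```

The printed knot vector shows the boundary values repeated at both ends. Overwriting `_all_knots` on that object, as described above, is a workaround rather than a supported option.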

josef-pkt commented 6 years ago

@thequackdaddy Thanks, that shows the answer to the side question.

vars(factor.state['transforms']['_patsy_stobj0__cr__'])
{'_all_knots': array([  2.4 ,  13.72,  20.56,  28.08,  39.28,  57.6 ]),
 '_constraints': array([[ 0.05757967,  0.28051497,  0.201252  ,  0.18132192,  0.22525439,
          0.05407704]]),
 '_cyclic': False,
 '_name': 'cr'}

I guess one problem with changing the knots attribute is that the contrast for the "center" constraint is not properly updated.
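A small pure-Python sketch of why the stored constraint goes stale. It assumes the "center" constraint row is the column mean of the unconstrained basis over the training data, which is consistent with the `_constraints` values above summing to about 1. A piecewise-linear hat basis stands in for the cubic regression spline, and all names are hypothetical.

```python
def hat_basis(x, knots):
    """Piecewise-linear basis: one function per knot, values sum to 1 inside the range."""
    row = [0.0] * len(knots)
    for i in range(len(knots) - 1):
        lo, hi = knots[i], knots[i + 1]
        if lo <= x <= hi:
            w = (x - lo) / (hi - lo)
            row[i] = 1.0 - w
            row[i + 1] = w
            break
    return row


def center_constraint(xs, knots):
    """Column means of the basis over the data (assumed form of the centering row)."""
    rows = [hat_basis(x, knots) for x in xs]
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(knots))]


xs = [0.5 * i for i in range(21)]  # training data on [0, 10]
old_knots = [0.0, 2.0, 5.0, 10.0]
new_knots = [0.0, 4.0, 7.0, 10.0]

old_c = center_constraint(xs, old_knots)
new_c = center_constraint(xs, new_knots)
print(old_c)
print(new_c)
```

The two rows differ, so a constraint memorized with the old knots no longer centers a basis built on the new ones. It would have to be recomputed from the training data after the knots are changed.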