scikit-learn-contrib / py-earth

A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines
http://contrib.scikit-learn.org/py-earth/
BSD 3-Clause "New" or "Revised" License

Fit Limits #142

Open Fish-Soup opened 7 years ago

Fish-Soup commented 7 years ago

Hi, I wondered if it would be possible to add limits to the fitting algorithm, e.g. y > 0 at x = -inf. I would hope that this functionality already exists in the fitting modules used and that it is just a case of exposing those options to the user.

Cheers

jcrudy commented 7 years ago

@Fish-Soup Can you explain in more detail how this would work? Is it something that could be accomplished by post-processing the predictions?

Fish-Soup commented 7 years ago

@jcrudy I have some ideas on how to do this as post-processing, but I don't feel they are ideal. My issue is that I am having trouble predicting outside the range of my input variables.

Basically, the predictions go wrong out there, often because some polynomial or cross term is fitted in that region. I also get predictions that return negative y values, which are not possible in the case I am working on (y should be 0.05). There are other cases in my problem where, due to regulator effects, I "know" what will happen for some values of X where I don't have the data MARS needs for a fit. However, my knowledge in this range is crude and I am unsure how to transition.

When I was trying to solve my problem using a non-linear equation I defined, I set some boundaries on the allowed fit parameters, e.g. popt, pcov = scipy.optimize.curve_fit(fn, xdata=xd, ydata=inliers[yplot], p0=popt, bounds=boundaries)

I realize, writing this out, that it's not as simple as I first thought: previously I was limiting known parameters in a function, whereas here you would have to put limits on y given x values.

What I would want is to define relations such as: gradient(y) > 0 where -infinity < x < x_min, gradient(y) < gradient(y(x_min)), y > 0 where x_max < x < infinity.

Maybe the user could define the functional form of the outer region in the fit? Then he/she could define the boundaries on the choice of the parameters in this region.

I'm not sure now if my response has made the issue more complex than it already was!

Thanks for the speedy response.

jcrudy commented 7 years ago

It would be tough to incorporate those kinds of constraints into the fitting process, and would probably slow down fitting dramatically. I do think much of what you want can be accomplished with post-processing, though. For example, you could define a region in which you know what you want the model to do and use a convex combination of the MARS model and your pre-defined model at every point, where as you get closer to your known region the weight on your pre-defined model goes to 1 and the weight on the MARS model goes to 0. It's also, obviously, easy to bound output to be nonnegative by post-processing.
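A minimal sketch of that blending idea, for one input variable and one point at a time. The linear ramp, the `x_max` and `width` parameters, and the two stand-in models are all assumptions made for illustration; a fitted model's prediction function would take the place of `mars_predict`.

```python
def mars_weight(x, x_max, width):
    """Weight on the MARS prediction: 1 up to x_max, falling linearly to 0 at x_max + width."""
    if x <= x_max:
        return 1.0
    if x >= x_max + width:
        return 0.0
    return 1.0 - (x - x_max) / width


def blended_predict(x, mars_predict, known_predict, x_max, width):
    """Convex combination of the two models, then clipped to be nonnegative."""
    w = mars_weight(x, x_max, width)
    y = w * mars_predict(x) + (1.0 - w) * known_predict(x)
    return max(y, 0.0)


# Hypothetical stand-ins, for illustration only.
def mars_predict(x):
    return 2.0 * x - 1.0  # pretend this is the fitted MARS model


def known_predict(x):
    return 5.0  # pretend this is the known behaviour for large x


for x in (0.0, 2.0, 4.0, 10.0):
    print(x, blended_predict(x, mars_predict, known_predict, x_max=3.0, width=2.0))
```

Any weight function that moves smoothly from 1 to 0 would work in place of the linear ramp; the ramp is just the simplest choice.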

Another thing that I often do is to use MARS as a data transformation, followed by some method that conforms to my constraints better, such as isotonic regression for example. The scikit-learn Pipeline can be helpful for that. I also have a bunch of code for these kinds of things in another repo, but it's not designed to be used by other people at this point. Still, you're welcome to have a look. It's located here: https://gitlab.com/jcrudy/sklearntools. You might get some ideas from calibration.py, but I'm not sure if anything is directly applicable to your use case.
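One way to read the two-stage idea is sketched below: a first-stage regressor produces a score, and isotonic regression maps that score to a monotone, nonnegative output. This is only an illustration under assumptions. `LinearRegression` is a stand-in so the example runs without py-earth (an `Earth()` model would take its place), the data are synthetic, and the stages are chained by hand rather than through a Pipeline because `IsotonicRegression` expects a single input column.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LinearRegression

# Synthetic data: an increasing trend plus noise.
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0.0, 10.0, size=200)).reshape(-1, 1)
y = np.log1p(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Stage 1: any regressor; this stand-in is where the MARS model would go.
stage1 = LinearRegression().fit(X, y)
score = stage1.predict(X)

# Stage 2: isotonic regression on the stage-1 score.
# y_min=0.0 keeps outputs nonnegative; out_of_bounds="clip" holds the
# output constant for scores outside the range seen during fitting.
stage2 = IsotonicRegression(y_min=0.0, increasing=True, out_of_bounds="clip")
stage2.fit(score, y)


def predict(X_new):
    """Chain the two stages: score first, then the monotone mapping."""
    return stage2.predict(stage1.predict(X_new))


print(predict(np.array([[-5.0], [0.0], [5.0], [20.0]])))
```

Note that the result is monotone in the stage-1 score, not necessarily in each input variable.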

Fish-Soup commented 7 years ago

Cheers, thanks for that! I think I can fit a fairly simple model outside the range, using the derivative and the prediction of the MARS model. Is it possible to get the second derivative of the MARS model?

Sorry for the slow response, I had a lot on :)
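A rough sketch of what such a simple outer model could look like, assuming one input variable and a straight-line continuation beyond `x_max`. The function name, the one-sided finite difference, and the quadratic stand-in model are all assumptions for illustration; the model's own derivative output would replace the finite difference, which is used here only to keep the sketch self-contained.

```python
def extrapolate_right(predict, x, x_max, h=1e-3):
    """Use the model inside the range; continue along its boundary tangent beyond x_max."""
    if x <= x_max:
        return predict(x)
    y_edge = predict(x_max)
    # Backward difference, so the slope is estimated only from inside the data range.
    slope = (y_edge - predict(x_max - h)) / h
    return y_edge + slope * (x - x_max)


def model(x):
    # Hypothetical stand-in for the fitted model's prediction.
    return x * x


print(extrapolate_right(model, 2.0, x_max=3.0))  # inside the range: model value
print(extrapolate_right(model, 5.0, x_max=3.0))  # outside: tangent line from x_max
```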

jcrudy commented 7 years ago

It is not currently possible to get a second derivative. Even if it were, it wouldn't be continuous everywhere. If you still want it, it's probably possible to do yourself by inspecting the BasisFunctions. It would be some work, though.
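As a rough alternative to inspecting the basis functions, a second derivative can be estimated numerically with central finite differences. This is a different technique from the one suggested above, shown here for one input variable only; `hinge_model` is a hypothetical stand-in shaped like a degree-1 MARS fit, not output from py-earth.

```python
def second_derivative(predict, x, h=1e-3):
    """Central finite-difference estimate of d2y/dx2 for a one-variable model."""
    return (predict(x + h) - 2.0 * predict(x) + predict(x - h)) / (h * h)


def hinge_model(x):
    # Hypothetical stand-in: a constant plus two hinge functions with knots at 2 and 4.
    return 1.0 + 0.5 * max(0.0, x - 2.0) - 1.5 * max(0.0, 4.0 - x)


print(second_derivative(hinge_model, 3.0))  # between knots: essentially zero
print(second_derivative(hinge_model, 2.0))  # at a knot: large, and it grows as h shrinks
```

The stand-in shows the discontinuity problem directly: a sum of hinges is piecewise linear, so the estimate is near zero between knots and blows up at a knot, where the slope jumps.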