scikit-learn-contrib / py-earth

A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines
http://contrib.scikit-learn.org/py-earth/
BSD 3-Clause "New" or "Revised" License
455 stars 122 forks source link

Numerical problem in forward pass #97

Closed jcrudy closed 7 years ago

jcrudy commented 8 years ago

There is a section of code in the rewrite_forward_pass branch starting here:

https://github.com/scikit-learn-contrib/py-earth/blob/rewrite_forward_pass/pyearth/_knot_search.pyx#L585

This section deals with a potential numerical problem that I have not entirely figured out how I want to deal with. What happens here is that the computed improvement in the squared error loss function is greater than the total squared error, so we end up with an impossible negative loss situation. This is obviously an issue of numerical stability, and is not unexpected because the math here is base on the cholesky decomposition. I had a look through Steve Milborrow's earth code recently and found some safeguards in place for similar problems, although I don't understand his code well enough to say exactly how his solution relates to mine.

So, here's the issue: Given that this numerical problem sometimes happens, how should the program respond? Currently, it responds by correcting the calculation and printing a bunch of debugging related information that a user would find unintelligible and disturbing. Probably the best thing to do is just take out the print statements and leave it at that. However, since I don't fully understand the precise conditions that lead to the instability, I am unsure. I have yet to find a reliable way to reproduce the problem with a data set I can share or that is small enough to deal with easily.

The first step in figuring out this issue would be to figure out exactly what conditions trigger it. However, this may be a post-release issue. The fix I have in place now seems to produce good results in every situation I've encountered.

@mehdidc, do you have an opinion on any of this? I know you probably haven't looked at the branch much yet at all, but any thoughts are appreciated. If you want to investigate it at all, I would be glad to send you some of my notes.

mehdidc commented 8 years ago

@jcrudy Ok, I will look closely into in the rewrite_forward_pass branch and tell you and yes I would appreciate to read your notes, thanks.

jcrudy commented 8 years ago

Great, I'll send you those notes:). I don't think I have your email address. Mine is on my github profile. Will you send me an email and I'll respond with notes?

jcrudy commented 8 years ago

Tagging @pnucci, in case he has some thoughts on this. @pnucci, email me if you also want the notes.

mehdidc commented 8 years ago

@jcrudy Ok my mail is public on github now, I also sent you a mail.

jcrudy commented 8 years ago

@mehdidc I replied with notes attached. Did you get my reply?

mehdidc commented 8 years ago

@jcrudy Ah ok, weird I didn't receive it, I sent you again the same message with another mail, could you please resend your message to that mail ?

jcrudy commented 8 years ago

@mehdidc Maybe it got blocked because of the attached pdf? I'll resend with and without pdf.

jcrudy commented 8 years ago

@mehdidc Okay, you should have some email by now.

jcrudy commented 7 years ago

I'm closing this for now. The issue seems inherent to the MARS algorithm and the current workaround (detect the error and correct via a slower calculation) seem satisfactory. I think it may be possible to improve on this handling in both the detection and correction of the problem, but for now there is no need.