Open CherylCB opened 5 years ago
@CherylCB Thanks for reporting this. I will look into it as soon as I can, which might be a while. It's annoying, but if this bug is causing problems for you the best workaround might be to add your own code to prune any duplicate basis functions. If you want to do that and need guidance on where to start, please comment here and I can elaborate.
It seems you're using py-earth essentially as a variable selection mechanism for linear regression. Is that right? If so, you could also just extract the set of selected variables and use them with sklearn.linear_model.LinearRegression.
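For anyone landing here with the same problem, the pruning workaround mentioned above could look roughly like the sketch below. It is a minimal illustration and not code from py-earth or from this thread: the helper name prune_duplicate_columns, the tolerance, and the example values are all assumptions. It takes the basis-function values as a list of columns and keeps only the first copy of any column that repeats; how the columns are obtained from a fitted model is deliberately left out.

```python
import math


def prune_duplicate_columns(columns, names, tol=1e-12):
    """Drop every column that equals an earlier one (hypothetical helper).

    columns: list of equal-length lists, one per basis function.
    names:   list of labels, one per column.
    Returns (kept_columns, kept_names) with first occurrences only.
    """
    kept_columns, kept_names = [], []
    for column, name in zip(columns, names):
        # A column is a duplicate if it matches a kept column element-wise.
        duplicate = any(
            all(
                math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
                for a, b in zip(column, kept)
            )
            for kept in kept_columns
        )
        if not duplicate:
            kept_columns.append(list(column))
            kept_names.append(name)
    return kept_columns, kept_names


# Made-up example values: the "x2" column appears twice.
columns = [
    [1.0, 1.0, 1.0, 1.0],  # intercept
    [0.2, 1.4, 3.1, 0.7],  # x2
    [0.2, 1.4, 3.1, 0.7],  # x2 again (the duplicate)
    [5.0, 2.0, 9.0, 4.0],  # x5
]
names = ["(Intercept)", "x2", "x2", "x5"]

kept_columns, kept_names = prune_duplicate_columns(columns, names)
print(kept_names)  # ['(Intercept)', 'x2', 'x5']
```

The surviving columns can then be passed to an ordinary least-squares fit, which avoids the rank-deficient design matrix that a repeated column would otherwise produce.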
Thanks for your answer @jcrudy. For now I have indeed added my own code to work around the problem of duplicate basis functions being added.
@CherylCB Glad you were able to work around this bug. Please feel free to post code for your workaround in this thread if it's shareable. It might help someone out later. I'll be leaving this issue open until it's fixed.
@jcrudy I have some time in the coming days to work on this bug. Do you have any suggestions on where I should look first?
@CherylCB That's great. I'll take all the help I can get. Going off of the current master, I'd start by looking here. As you'll see, I wrote some special code to try to prevent exactly what you are seeing. One of two possible things is probably happening:
1. The has_linear method in that line is not correct. The has_linear method is defined as part of the BasisFunction abstract class, which you'll find in _basis.pyx. It calls linear_in, which has different implementations for the different subclasses. Possibly something in this logic is wrong.
2. The forward pass is not using variable_can_be_linear correctly in some specific situations. If this is what's happening, you'll have to go over the logic of the forward pass and see where the linear basis function is being added in your example, then figure out how to prevent it from happening. If you find any clues here they could be very helpful for me, even if you're not sure how to fully solve the problem.
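To make the two hypotheses concrete, here is a toy model of this kind of guard in plain Python. It is only an illustration and is not py-earth's Cython code: the classes Term and LinearTerm, the function try_add_linear, and the use_guard flag are hypothetical, and the real has_linear, linear_in and forward-pass logic may differ in detail. The names has_linear, linear_in and variable_can_be_linear are borrowed from the description above purely for readability.

```python
class Term:
    """Toy basis-function node (hypothetical; not py-earth's BasisFunction)."""

    def __init__(self, parent=None):
        self.parent = parent

    def linear_in(self, variable):
        # Subclasses override this; the base term is linear in nothing.
        return False

    def has_linear(self, variable):
        # Walk from this term up to the root, asking each node in turn.
        node = self
        while node is not None:
            if node.linear_in(variable):
                return True
            node = node.parent
        return False


class LinearTerm(Term):
    """Toy term that is linear in exactly one variable."""

    def __init__(self, variable, parent=None):
        super().__init__(parent)
        self.variable = variable

    def linear_in(self, variable):
        return self.variable == variable


def try_add_linear(terms, parent, variable, use_guard=True):
    """Append a LinearTerm under `parent` unless the guard vetoes it."""
    # The guard: is there already a child of `parent` linear in `variable`?
    variable_can_be_linear = not any(
        t.parent is parent and t.has_linear(variable) for t in terms
    )
    if use_guard and not variable_can_be_linear:
        return False
    terms.append(LinearTerm(variable, parent))
    return True


root = Term()
terms = [root]
first = try_add_linear(terms, root, variable=3)   # added
second = try_add_linear(terms, root, variable=3)  # vetoed by the guard
# Skipping the guard mimics hypothesis 2: the duplicate slips through.
unguarded = try_add_linear(terms, root, variable=3, use_guard=False)
```

In this toy, a linear_in that wrongly returns False corresponds to the first hypothesis, and a code path that never consults variable_can_be_linear corresponds to the second. Either one lets the same linear term be added twice.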
You're actually in a good position to figure this out, since you have an example data set that shows the problem. I'd suggest you try to debug what's happening in the forward pass using a script that fits a model to your data set. Unfortunately, it's hard to set up a debugger to work with cython. Perhaps you're more skilled than me in this area, but if not I suggest you just use print statements in the cython files.
The workflow is something like this:
If you have any problems, don't hesitate to get in touch. You can reply here or email me (my address is on my github profile). You're potentially saving me a lot of time by working on this, so of course I'm very happy to spend some time helping you succeed at it. Good luck!
I'm trying to run a pyearth model with enable_pruning = False and only linear features, with a max_degree of 1. I noticed that the same feature is added twice. See results below:
I noticed issue #135, which seems to describe the same bug, addressed by @jcrudy. I'm installing from the latest commit:
git+https://github.com/scikit-learn-contrib/py-earth.git@b209d1916f051dbea5b142af25425df2de469c5a#egg=sklearn-contrib-py-earth