thomasp85 / lime

Local Interpretable Model-Agnostic Explanations (R port of original Python package)
https://lime.data-imaginist.com/

Ridge in "forward selection" and "highest weight" #134

Closed · beopis closed this issue 5 years ago

beopis commented 6 years ago

Thank you for your work! My question is: why do select_f_fs() and select_f_hw(), used in the model selection step, fix lambda = 0 in glmnet()? From the help of glmnet(), we see that the loss function of a glmnet gaussian model is

    1/(2N) * sum_i (y_i - beta_0 - x_i' * beta)^2 + lambda * P(beta)

(where P is the penalty term). With lambda = 0 the penalty has no effect, so doesn't that mean these model selections are using an Ordinary Least Squares regression instead of a ridge regression, as reported in lime's documentation?
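
To make this concrete, here is a toy sketch (my own example, not lime's code) showing that a glmnet fit with lambda = 0 reproduces the OLS coefficients:

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 5), ncol = 5)
y <- x %*% c(2, -1, 0.5, 0, 0) + rnorm(100)

# "Ridge" fit with the penalty switched off (lambda = 0); note that the
# glmnet docs advise against supplying a single lambda value
fit_glmnet <- glmnet(x, y, alpha = 0, lambda = 0)

# Plain OLS fit for comparison
fit_ols <- lm(y ~ x)

# The two sets of coefficients are numerically (almost) identical,
# i.e. no shrinkage is being applied
cbind(glmnet = as.vector(coef(fit_glmnet)), ols = coef(fit_ols))
```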

beopis commented 6 years ago

Another question: do classifiers fit the explainer on the posterior probabilities or on their logit transformation? It looks to me like they are fitting the probabilities as they are, right?
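
In other words, a toy sketch with made-up numbers (not lime's code) of the two options I am asking about:

```r
# 'p' stands in for the posterior probabilities returned by the classifier,
# 'z' for a single perturbed feature
p <- c(0.10, 0.35, 0.62, 0.91)
z <- rnorm(4)

# Option 1: fit the local surrogate directly on the probabilities
fit_prob <- lm(p ~ z)

# Option 2: fit it on the logit of the probabilities instead
logit_p <- log(p / (1 - p))
fit_logit <- lm(logit_p ~ z)
```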

beopis commented 5 years ago

@thomasp85, did I misunderstand the code ... ? By RSS I mean "residual sum of squares".

iliasp23 commented 5 years ago

When you call the glmnet function, alpha = 0 stands for ridge regression and alpha = 1 for lasso regression. Both of them build on ordinary least squares; the difference is that they add a tuning parameter (lambda) which penalizes the least-squares fit in order to achieve a more parsimonious model and deal with overfitting.

If you choose lasso regression, the penalty is "stricter" compared with ridge's penalty, so lasso can be used for variable selection because it completely removes the influence of a variable that is not needed. If you choose ridge regression, the final model keeps all the variables, but their coefficients are shrunk towards 0.
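
A quick illustration of that difference (my own toy example, same lambda for both fits):

```r
library(glmnet)

set.seed(42)
x <- matrix(rnorm(200 * 6), ncol = 6)
y <- x %*% c(3, 1.5, 0, 0, 2, 0) + rnorm(200)

# Lasso (alpha = 1): weak predictors get coefficients of exactly 0
coef(glmnet(x, y, alpha = 1, lambda = 0.5))

# Ridge (alpha = 0): all coefficients stay non-zero, just shrunk towards 0
coef(glmnet(x, y, alpha = 0, lambda = 0.5))
```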

beopis commented 5 years ago

This doesn't answer my question ... my question is why lambda is fixed to 0 in the code. Doing this, the penalty term has no effect, and the model selection is not ridge, but OLS.

iliasp23 commented 5 years ago

Yes, you are right, I thought that you were talking about the alpha parameter. Just like you said, the code works with lambda = 0; probably glmnet is used just to fit the model without calling the glm function, I cannot find another reason.

thomasp85 commented 5 years ago

This is probably due to my lack of understanding of lambda when I wrote it... It will get fixed in the next release.

thomasp85 commented 5 years ago

Ok, so I guess this will not change. The choice of lambda = 0 is a result of the port from the original Python code, where alpha = 0 is used for the Ridge fitting (note the difference in interpretation of alpha between glmnet and sklearn's Ridge module)...
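
For anyone reading along, my rough mapping between the two APIs (as I understand them, so take it with a grain of salt):

```r
# In glmnet, alpha is the elastic-net mixing parameter (0 = ridge, 1 = lasso)
# and lambda is the penalty strength. In sklearn's Ridge, the argument named
# alpha *is* the penalty strength. So the Python code's Ridge(alpha = 0),
# i.e. "no penalty", corresponds to lambda = 0 on the glmnet side:
library(glmnet)

x <- matrix(rnorm(50 * 3), ncol = 3)
y <- rnorm(50)

fit <- glmnet(x, y, alpha = 0, lambda = 0)  # ridge family, zero penalty strength
```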

I'll be happy to accept a range of additional feature selectors as a PR if you wish and feel the need.