Open kaas66 opened 6 years ago
1) I'm generally not happy with the interpretability of the non-binned explainer - your suggestions might very well be a better approach...
2) True
3) This is correct - the explanation looks at whether the bin significantly affects the model. I guess the reason it is done this way is to ease interpretation at the cost of a potentially worse local fit...
4) I'll have to be honest and say I don't remember my reasons for doing so - I think it was a matter of matching the Python implementation as closely as possible, as scikit-learn doesn't have the same ridge regression implementation...
5) Perhaps, but generating the permutations is really fast, and I'd rather have proven independence than assume it doesn't matter
6) The time series support is a bit experimental as I don't really do time series analysis. The idea is that for an observation to be explained you keep the timestamp fixed and look at how all the other variables affect the prediction. The intuition is that we already assume the time dimension is relevant; otherwise we wouldn't try to make a forecast. Another reason for doing this is that I have no real idea of how to sensibly permute the time dimension and calculate the distance...
For question 4, could we use cv.glmnet instead of glmnet to choose the best lambda value?
I'm looking at line 55 in lime.R:
fit <- glmnet(x[shuffle_order, features], y[[label]][shuffle_order], weights = weights[shuffle_order], alpha = 0, lambda = 0.001)
Maybe replace with this?
fit <- cv.glmnet(x[shuffle_order, features], y[[label]][shuffle_order], weights = weights[shuffle_order], alpha = 0)
Also, perhaps use caret to choose both alpha and lambda using cv?
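To make the suggestion concrete, here is a minimal sketch of what swapping in cross-validated lambda selection could look like. The data below is a hypothetical stand-in for lime's permuted samples (the real call sits inside lime.R with its own variable names); `cv.glmnet()` and the `lambda.min`/`lambda.1se` accessors are standard glmnet API.

```r
library(glmnet)

# Toy data standing in for the permuted samples (hypothetical, for illustration)
set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- rnorm(100)
weights <- runif(100)

# Cross-validated ridge (alpha = 0): cv.glmnet picks lambda instead of the
# hard-coded lambda = 0.001 in the current implementation
cv_fit <- cv.glmnet(x, y, weights = weights, alpha = 0)

# Coefficients at the CV-optimal lambda; s = "lambda.1se" gives a more
# regularized (and often more stable) alternative
coef(cv_fit, s = "lambda.min")
```

One caveat: a `cv.glmnet` fit is not a drop-in replacement for a `glmnet` fit - downstream `coef()`/`predict()` calls would need `s = "lambda.min"` (or `"lambda.1se"`) passed explicitly. caret's `train(..., method = "glmnet")` could indeed tune alpha and lambda jointly, at a noticeable cost in runtime per explained observation.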
I have used the R package lime with great enthusiasm. This has generated some questions:
1) For continuous variables with bin_continuous=FALSE, it seems like what is shown in the plot generated using plot_features() are the coefficients in the linear regression model and not the coefficients multiplied by the corresponding feature values. Is this correct? If yes, wouldn't it be more intuitive to plot the coefficients multiplied by the corresponding feature values, since this is the real contribution of each feature? (Or to plot the standardized coefficients from the regression, if these are available.)
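The distinction in question 1 can be made concrete with a tiny example (the numbers below are hypothetical, not taken from any lime output): a small coefficient paired with a large feature value can contribute more to the local prediction than a large coefficient paired with a near-zero value.

```r
# Hypothetical coefficients from a local ridge fit and the feature values
# of the observation being explained
beta <- c(x1 = 0.8, x2 = -1.5)
obs  <- c(x1 = 2.0, x2 = 0.1)

# What plot_features() appears to show: the raw coefficients
beta                  # x1 = 0.8, x2 = -1.5

# The proposed alternative: coefficient times feature value,
# i.e. each feature's actual contribution to the local prediction
beta * obs            # x1 = 1.6, x2 = -0.15
```

Here x2 has the larger coefficient but x1 the larger contribution, which is exactly the interpretability gap the question points at.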
2) For continuous variables with bin_continuous=FALSE, it seems like the distance (used when computing the weights) is computed using scaled variables, while the ridge regression is performed on unscaled values. Is this correct? If yes, I assume this is because glmnet() standardizes by default?
3) For continuous variables with bin_continuous=TRUE, it seems like the data set used for the regression (and for the computation of weights) consists of zeroes and ones only, where the value in row i for variable j is 1 if the bin for this variable in this row is equal to the bin for the same variable in the observation vector and 0 otherwise. Is this correct? If yes, doesn't one then discard a lot of information, since there obviously is a larger distance between e.g. bins 1 and 5 than there is between bins 1 and 2?
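The encoding described in question 3 can be sketched as follows (bin boundaries and values are hypothetical, purely to illustrate the same-bin indicator):

```r
# Assumed bin boundaries for one continuous feature
breaks  <- c(-Inf, 0, 1, 2, Inf)

# Bin of the explained observation (value 1.5 falls in bin 3)
obs_bin <- findInterval(1.5, breaks)

# Permuted values for the same feature, mapped to 1 iff they land in the
# observation's bin - the regression only sees these indicators
perm <- c(-0.5, 0.3, 1.7, 5.0)
as.integer(findInterval(perm, breaks) == obs_bin)   # -> 0 0 1 0
```

As the question notes, -0.5 and 5.0 both collapse to 0 even though one is much further from the observation's bin than the other, so the ordinal distance between bins is indeed discarded.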
4) In the ridge regression you seem to have hard-coded the value of lambda to be 0.001. Is there a particular reason behind choosing this value?
5) From the R code it seems like you generate a new data set for each observation you want to explain? As you write yourself in https://cran.r-project.org/web/packages/lime/vignettes/Understanding_lime.html, the permuted data set is independent of the observation to be explained. Hence, wouldn't it be logical to use the same permuted data set for all cases to be explained?
6) From the R code it seems like it is possible to use time series data, and that what LIME does with such data is generate the permuted data set by sampling from the training data (i.e. no noise). Is that correct? Then, when fitting the linear model, the different observations in the time series seem to be regarded as independent data. Is that correct?