Open techwithshadab opened 4 years ago
same issue here
Edit: works again for me
How did you get it working?
My problem was related to the colour specifications; a simple reimport did the trick for me, to be honest. Alternatively, try waterfall_legacy(expected_value, shap_values, features, feature_names, max_display).
This code is also not working for me
@slundberg - Can you please check this issue?
@shadab-entrepreneur I also got the same error. In my case, LightGBM returns probabilities for both classes, so I had to slice the SHAP values down to a single class:

```python
shap_values.values = shap_values.values[:, :, 1]
shap_values.base_values = shap_values.base_values[:, 1]
```

Then I was able to plot with shap.plots.waterfall.
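As a shape-only sketch of what that slicing does (plain NumPy, hypothetical sizes, no shap required), assuming the binary classifier's explainer returns SHAP values with a trailing class axis:

```python
import numpy as np

# Hypothetical sizes: 100 samples, 8 features, 2 classes.
# For a binary LightGBM model the explainer can return SHAP values with a
# trailing class axis, while the waterfall plot wants a single-output explanation.
values = np.zeros((100, 8, 2))
base_values = np.zeros((100, 2))

# Keep only class 1, as in the snippet above.
values_c1 = values[:, :, 1]         # shape (100, 8)
base_values_c1 = base_values[:, 1]  # shape (100,)
```

After the slice, indexing a single sample (`values_c1[0]`) yields the 1-D vector the waterfall plot expects.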
Same issue. This works: shap.plots._waterfall.waterfall_legacy(explainer.expected_value, shap_values[0])
Same problem for me. shap.plots._waterfall.waterfall_legacy works, though.
shap version 0.36.0
Hi everyone, I tried using waterfall_legacy. Now I get the following error: IndexError: index 1 is out of bounds for axis 0 with size 1
Any advice would be greatly appreciated. I don't have any issues with force_plot. See below for the code and error message:

```python
shap.plots._waterfall.waterfall_legacy(explainer.expected_value, shap_values[0],
                                       features=pt_data, feature_names=features_list,
                                       max_display=4, show=True)
```

```
IndexError                                Traceback (most recent call last)
```
Using waterfall_legacy worked for me, but I also got the original plot working by creating a class to pass as the argument. You need to assign a value to row, indicating which row of data you'd like to show, and a dataframe that you want to create SHAP values for:

```python
class ShapObject:
    def __init__(self, base_values, data, values, feature_names):
        self.base_values = base_values      # Single value
        self.data = data                    # Raw feature values for 1 row of data
        self.values = values                # SHAP values for the same row of data
        self.feature_names = feature_names  # Column names

shap_object = ShapObject(base_values=explainer.expected_value[1],
                         values=explainer.shap_values(dataframe)[1][row, :],
                         feature_names=dataframe.columns,
                         data=dataframe.iloc[row, :])

shap.waterfall_plot(shap_object)
```
Revisiting this old topic. With shap 0.37.0, I managed to use both waterfall_legacy and the original waterfall_plot using the trick above from @jackcook1102, but if you are using np.array you also need to pay careful attention to the shapes of the input arrays. Here is some example code for a multiclass classification that works for me:
```python
X_test_array = X_test_tfidf.toarray()  # Convert sparse to dense array
shap_values = explainer.shap_values(X_test_array)
c = clf.predict(X_test_tfidf)  # class to explain
shap.plots._waterfall.waterfall_legacy(explainer.expected_value[c],
                                       shap_values[c].reshape(-1),
                                       X_test_array.reshape(-1),
                                       feature_names=vectorizer.get_feature_names(),
                                       max_display=10)
```
However, going forward, I think it would be helpful to plan some updates (and related examples) for waterfall_plot to make it easier to use.
Same issue. This works: shap.plots._waterfall.waterfall_legacy(explainer.expected_value, shap_values[0]). I agree with @thlevy that we should have updated examples to make this easier for those who are new to it. I almost gave up before I stumbled upon this solution.
shap version 0.39.0
Thanks to the ones who suggested shap.plots._waterfall.waterfall_legacy(). I couldn't find any help in the documentation; thankfully I could find help here. Thanks guys!
The legacy version didn't work for me. I fed the explainer a Pandas DataFrame (with various samples), then rewrote the shap_values:

```python
shap_values.values = np.mean(shap_values.values, axis=0)
shap_values.data = pd.DataFrame(columns=data.keys(), data=shap_values.data)
```

And then in _waterfall.py, I adjusted lines 44-48 to:

```python
base_values = shap_values.base_values[0, 0]
features = shap_values.data.mean().values
feature_names = list(shap_values.data.keys())
lower_bounds = getattr(shap_values, "lower_bounds", None)
upper_bounds = getattr(shap_values, "upper_bounds", None)
values = shap_values.values
```

And that did the trick for me!
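For illustration only (plain NumPy, invented sizes): the np.mean step in that workaround collapses the per-sample SHAP matrix into one averaged contribution per feature, which is why a single waterfall can then be drawn:

```python
import numpy as np

# Hypothetical SHAP values: 50 samples x 6 features.
rng = np.random.default_rng(0)
values = rng.normal(size=(50, 6))

# Average over the sample axis -> one mean contribution per feature.
mean_values = np.mean(values, axis=0)  # shape (6,)
```

Note this plots an average explanation across all samples, not the explanation of any single row.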
Another month, another fix.
The solutions described above didn't work for me when using a Keras predictor (as opposed to, for example, an XGB predictor). Therefore, I now rebuild the expected output for the Keras model, based on the working XGB output, using the native SHAP Explanation constructor.
```python
keras_model = tf.keras.Sequential([layers.Dense(units=1)])
keras_model.compile(optimizer=tf.optimizers.Adam(),
                    loss='MeanSquaredError')
keras_model.fit(X, y, epochs=100, verbose=0)

keras_explainer = shap.KernelExplainer(keras_model.predict, X)
keras_shap_values = keras_explainer.shap_values(X)

values = keras_shap_values[0]
base_values = [keras_explainer.expected_value[0]] * len(keras_shap_values[0])

tmp = shap.Explanation(values=np.array(values, dtype=np.float32),
                       base_values=np.array(base_values, dtype=np.float32),
                       data=np.array(X),
                       feature_names=X.columns)

shap.plots.waterfall(tmp[0])
shap.plots.beeswarm(tmp)
```
This wouldn't have been necessary just to make waterfall work, but now beeswarm works as well.
Tried this in 0.39.0, still working, but the parameters for the ShapObject I used are:

```python
shap_object = ShapObject(base_values=shap_values[0][0].base_values,
                         values=shap_values[0].values,
                         feature_names=df.columns,
                         data=shap_values[0].data)
```
I have version 0.39.0, and shap_values is just an array, it does not contain the fields you mentioned.
This worked for me:

```python
shap_object = shap.Explanation(base_values=shap_values[0][0].base_values,
                               values=shap_values[0].values,
                               feature_names=x_test.columns,
                               data=shap_values[0].data)
shap.plots.waterfall(shap_object)
```
It works, but it looks really confusing.

```python
class ShapAnalysis:
    def __init__(self, df):
        shap_values = explainer(df)
        self._shap_values = shap_values
        self._instance_names = df.index.to_list()
        self._feature_names = df.columns.to_list()
        self.df_shap = pd.DataFrame(
            shap_values.values,
            columns=df.columns,
            index=df.index,
        )

    def waterfall(self, i, **kwargs):
        shap_values = self._shap_values
        shap_object = shap.Explanation(
            base_values=shap_values[i][0].base_values,
            values=shap_values[i].values,
            feature_names=self._feature_names,
            instance_names=self._instance_names,
            data=shap_values[i].data,
        )
        shap.plots.waterfall(shap_object, **kwargs)
```
For people working with pandas.
A new class definition and the corresponding call solved my issue. Just in case it helps someone else:

```python
class ShapInput(object):
    def __init__(self, expectation, shap_values, features, feat_names):
        self.base_values = expectation
        self.values = shap_values
        self.data = features
        self.feature_names = feat_names

shap_input = ShapInput(explainer.expected_value, shap_values[0],
                       test_point[0], feat_names=feat_names)

shap.waterfall_plot(shap_input)
```
Here is an example image for the California housing dataset.
The difference from the legacy output is mainly aesthetic (feature values, expectation symbol at the bottom, etc.).
I hope @slundberg or other contributors can fix this in the code, since otherwise the examples don't work.
Best, Sam
@samiit I am using an explainer of type shap.explainers.GPUTree. It apparently stores the expectation not in explainer.expected_value, but in explainer.expected_value[0] (?). But even then I cannot get it to work:

```python
class ShapInput(object):
    def __init__(self, expectation, shap_values, features, feat_names):
        self.base_values = expectation
        self.values = shap_values
        self.data = features
        self.feature_names = feat_names

shap_input = ShapInput(another_explainer.expected_value[0], shap_values[entry], 5, feat_names=featurenames)

shap.waterfall_plot(shap_input)
```
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_2720089/3328189826.py in <module>
      8 shap_input = ShapInput(another_explainer.expected_value[0], shap_values[entry], 5, feat_names=featurenames)
      9
---> 10 shap.waterfall_plot(shap_input)

~/.local/lib/python3.9/site-packages/shap-0.40.0-py3.9-linux-x86_64.egg/shap/plots/_waterfall.py in waterfall(shap_values, max_display, show)
    124             yticklabels[rng[i]] = feature_names[order[i]]
    125         else:
--> 126             yticklabels[rng[i]] = format_value(features[order[i]], "%0.03f") + " = " + feature_names[order[i]]
    127
    128     # add a last grouped feature to represent the impact of all the features we didn't show

TypeError: 'int' object is not subscriptable
```
Other visualizations such as the force plot or even waterfall legacy do work. I see several open issues with variations on this issue ([1], [2] and more), and am very confused about what is going on - does GPUTree somehow return a different data structure than other explainers, or why does it not work? And how can I get the waterfall (not legacy) to work? Any help / insight would be appreciated, thank you!
[1] https://github.com/slundberg/shap/issues/2362 [2] https://github.com/slundberg/shap/issues/2255
I have a multiclass XGBoost model. None of the solutions above worked for me, but this one did:

```python
classes = 15  # class index to explain
row = 2000    # row index to explain
shap.waterfall_plot(shap.Explanation(values=shap_values[classes][row],
                                     base_values=explainer.expected_value[classes],
                                     data=X[features1].iloc[row],
                                     feature_names=X[features1].columns.tolist()),
                    max_display=50)
```
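A shape-only sketch of that indexing (plain NumPy, hypothetical sizes), assuming the older list-style multiclass API where shap_values holds one (n_samples, n_features) matrix per class:

```python
import numpy as np

n_samples, n_features, n_classes = 3000, 12, 16

# Older multiclass API: a Python list with one SHAP matrix per class.
shap_values = [np.zeros((n_samples, n_features)) for _ in range(n_classes)]
expected_value = np.zeros(n_classes)

classes = 15  # class index to explain
row = 2000    # sample index to explain

row_values = shap_values[classes][row]  # 1-D vector of per-feature values
base_value = expected_value[classes]    # scalar base value for that class
```

The key point is that shap_values[classes][row] is already 1-D and expected_value[classes] is a scalar, which is exactly what the waterfall plot requires.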
SHAP version 0.41.0. I was experimenting with a GAMI-Net model and trained the explainer like this:

```python
masker = shap.maskers.Independent(data=test_x)
explainer = shap.Explainer(model, masker=masker)
shap_values = explainer(test_x)
```

This resolved to a shap.explainers._permutation.Permutation explainer. The shap_values have the following form:

```
shap_values.values = array([[ 1.10730644e-02, ..., 0.00000000e+00], ....., [ 2.18415356e-02, ..., -7.94728616e-11]])
shap_values.base_values = array([[0.41484901], ....., [0.41484901]])
shap_values.data = array([[0.7244829, ..., 0.5036719], ......, [0.978, ..., 0.54155535]])
```
And when I tried:

```python
shap.waterfall_plot(shap_values[0])
```
```
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Input In [113], in <cell line: 1>()
----> 1 shap.waterfall_plot(shap_values[0])

File E:\Anaconda\envs\GIS\lib\site-packages\shap\plots\_waterfall.py:54, in waterfall(shap_values, max_display, show)
     52 # make sure we only have a single output to explain
     53 if (type(base_values) == np.ndarray and len(base_values) > 0) or type(base_values) == list:
---> 54     raise Exception("waterfall_plot requires a scalar base_values of the model output as the first "
     55                     "parameter, but you have passed an array as the first parameter! "
     56                     "Try shap.waterfall_plot(explainer.base_values[0], values[0], X[0]) or "
     57                     "for multi-output models try "
     58                     "shap.waterfall_plot(explainer.base_values[0], values[0][0], X[0]).")
     60 # make sure we only have a single explanation to plot
     61 if len(values.shape) == 2:

Exception: waterfall_plot requires a scalar base_values of the model output as the first parameter, but you have passed an array as the first parameter! Try shap.waterfall_plot(explainer.base_values[0], values[0], X[0]) or for multi-output models try shap.waterfall_plot(explainer.base_values[0], values[0][0], X[0]).
```
When I used the same code on an EBM model, everything worked fine. The base_values array for the EBM explainer has a different form:

```
.base_values = array([-0.29164322, -0.29164322, -0.29164322, ..., -0.29164322, -0.29164322, -0.29164322])
```

Can anyone help me?
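A minimal sketch of the shape difference (plain NumPy; the values are copied from the arrays above, the sizes are made up). The Permutation-style explainer stores a one-element array per sample, so indexing a single explanation still yields an array, which is what trips waterfall's scalar-base_values check; flattening removes the extra axis:

```python
import numpy as np

# Permutation-style base values: shape (n, 1) -> base_values[0] is an array.
base_perm = np.full((5, 1), 0.41484901)

# EBM-style base values: shape (n,) -> base_values[0] is a scalar.
base_ebm = np.full(5, -0.29164322)

print(base_perm[0].shape)  # (1,)  -- still an array: raises the Exception
print(np.ndim(base_ebm[0]))  # 0   -- scalar: accepted by waterfall

# Flattening makes the Permutation output look like the EBM output.
base_fixed = base_perm.ravel()  # shape (5,)
```

So one workaround worth trying here is `shap_values.base_values = shap_values.base_values.ravel()` before plotting (an assumption based on the shapes above, not a confirmed fix).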
Same issue.
I am able to plot the waterfall explanation using waterfall_legacy in my notebook, but the code doesn't work in combination with Streamlit.
shap 0.41.0
This works for me:

```python
shap.plots._waterfall.waterfall_legacy(explainer.expected_value[0], shap_values[0].values, df.values[0], feature)
```
This worked for me:

```python
shap_values = explainer(x_test)
exp = shap.Explanation(shap_values[:, :, 0],
                       shap_values.base_values[:, 0],
                       x_test, feature_names=x_test.columns)
shap.waterfall_plot(exp[1])
```
For sklearn.LinearRegression().fit(), this returned the error below:

```
----> exp = shap.Explanation(shap_values[:, :, 0], shap_values.base_values[:, 0], x_test, feature_names=x_test.columns)
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed
```

shap 0.43.0
shap.plots._waterfall.waterfall_legacy(explainer.expected_value, shap_values[0]) worked.

Do not use shap_values = explainer.shap_values(X_test); use shap_values = explainer(X_test) instead. You'll get an Explanation object.
When using a Keras model, the predict method adds an extra dimension to the output. This affects the dimensions of the shap_values.values array, generating the error ValueError: The waterfall plot can currently only plot a single explanation, but a matrix of explanations (shape (28, 1)) was passed when using the following code:

```python
explainer = shap.KernelExplainer(model, X)
shap_values = explainer(X)
shap.plots.waterfall(shap_values[0])
```

To fix this, a single line needs to be added before plotting:

```python
explainer = shap.KernelExplainer(model, X)
shap_values = explainer(X)
shap_values.values = shap_values.values.squeeze()
shap.plots.waterfall(shap_values[0])
```

I'm using shap 0.45.0 and X is a Pandas DataFrame.
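A shape-only illustration of that squeeze (plain NumPy, made-up sizes): the single-unit Keras output introduces a trailing length-1 axis into the SHAP values, and squeeze() removes it:

```python
import numpy as np

# Hypothetical: 10 samples, 28 features, plus the extra length-1 axis that a
# single-output Keras predict() introduces into the SHAP values.
values = np.zeros((10, 28, 1))

# squeeze() drops every length-1 axis, turning each per-sample (28, 1)
# "matrix of explanations" into the (28,) vector the waterfall plot wants.
values = values.squeeze()  # shape (10, 28)
```

One caveat: squeeze() removes all length-1 axes, so with a single sample (shape (1, 28, 1)) it would also drop the sample axis; `values[:, :, 0]` is a safer equivalent in that case.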
Is there any change in the waterfall plot? Previously this was the syntax:
Now it's throwing an error. Even tried the newer version:
Below is the error: