shap / shap

A game theoretic approach to explain the output of any machine learning model.
https://shap.readthedocs.io
MIT License

Error in WaterFall Plot #1420

Open techwithshadab opened 4 years ago

techwithshadab commented 4 years ago

Is there any change in the WaterFall plot? Previously this was the syntax:

shap.waterfall_plot(expected_values, shap_values[row_index], data.iloc[row_index], max_display=max_features)

Now it's throwing an error. I even tried the newer syntax:

shap.waterfall_plot(shap_values)

Below is the error:

AttributeError                            Traceback (most recent call last)
<ipython-input-127-72505d1fc01f> in <module>
----> 1 shap.waterfall_plot(shap_values)

/opt/conda/anaconda/lib/python3.6/site-packages/shap/plots/_waterfall.py in waterfall(shap_values, max_display, show)
     40 
     41 
---> 42     base_values = shap_values.base_values
     43 
     44     features = shap_values.data

AttributeError: 'list' object has no attribute 'base_values'
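For context, the failure mode can be reproduced without shap at all: the old-style API (`explainer.shap_values(X)`) returns a plain Python list of arrays, and a list has no `base_values` attribute. Everything below is a stand-in, not shap's actual output:

```python
# Stand-in for the plain list the old shap_values(X) API returns;
# the new-style explainer(X) call returns an Explanation object instead.
shap_values = [[0.1, -0.2], [0.3, 0.05]]

try:
    shap_values.base_values  # what waterfall tries to access internally
except AttributeError as err:
    message = str(err)

print(message)  # 'list' object has no attribute 'base_values'
```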
roye10 commented 4 years ago

same issue here

Edit: works again for me

techwithshadab commented 4 years ago

> same issue here
>
> Edit: works again for me

How did you get it working?

roye10 commented 4 years ago

My problem was related to the colour specifications. A simple reimport did the trick for me, to be honest. You could also try waterfall_legacy(expected_value, shap_values, features, feature_names, max_display).

techwithshadab commented 4 years ago

This code is also not working for me

techwithshadab commented 4 years ago

@slundberg - Can you please check this issue?

thoo commented 3 years ago

@shadab-entrepreneur I also got the same error. In my case, LightGBM returns probabilities for both classes, so I had to modify the SHAP values as

shap_values.values=shap_values.values[:,:,1]
shap_values.base_values=shap_values.base_values[:,1]

Then, I was able to plot with shap.plots.waterfall.
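To make the shape manipulation above concrete, here is a numpy-only sketch; the shapes are illustrative stand-ins for a binary classifier's SHAP output (samples × features × classes):

```python
import numpy as np

# Illustrative shapes for a binary classifier's SHAP output:
values = np.zeros((5, 3, 2))       # (n_samples, n_features, n_classes)
base_values = np.zeros((5, 2))     # (n_samples, n_classes)

# Keep only the positive class (index 1), as in the workaround above:
values = values[:, :, 1]           # -> (5, 3)
base_values = base_values[:, 1]    # -> (5,)

print(values.shape, base_values.shape)
```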

d6tdev commented 3 years ago

Same issue. This works shap.plots._waterfall.waterfall_legacy(explainer.expected_value, shap_values[0])

Garve commented 3 years ago

Same problem for me. shap.plots._waterfall.waterfall_legacy works, though.

shap version 0.36.0

JCCKwong commented 3 years ago

Hi everyone, I tried using waterfall_legacy. Now I get the following error: IndexError: index 1 is out of bounds for axis 0 with size 1

Any advice would be greatly appreciated. I don't have any issues with force_plot. See below for code and error message:

shap.plots._waterfall.waterfall_legacy(explainer.expected_value, shap_values[0], features = pt_data, feature_names = features_list, max_display = 4, show = True)


IndexError                                Traceback (most recent call last)

<ipython-input> in <module>
----> shap.plots._waterfall.waterfall_legacy(explainer.expected_value, shap_values[0],
                                             features = pt_data, feature_names = features_list, max_display = 4)

\anaconda3\lib\site-packages\shap\plots\_waterfall.py in waterfall_legacy(expected_value, shap_values, features, feature_names, max_display, show)
                yticklabels[rng[i]] = feature_names[order[i]]
            else:
                yticklabels[rng[i]] = format_value(features[order[i]], "%0.03f") + " = " + feature_names[order[i]]

        # add a last grouped feature to represent the impact of all the features we didn't show

IndexError: index 1 is out of bounds for axis 0 with size 1
jackcook1102 commented 3 years ago

Using waterfall_legacy worked for me, but I also got the original plot working by creating a class to pass as the argument. You need to assign a value to row indicating which row of data you'd like to show, along with a dataframe that you want to create SHAP values for:

class ShapObject:

    def __init__(self, base_values, data, values, feature_names):
        self.base_values = base_values # Single value
        self.data = data # Raw feature values for 1 row of data
        self.values = values # SHAP values for the same row of data
        self.feature_names = feature_names # Column names

shap_object = ShapObject(base_values = explainer.expected_value[1],
                         values = explainer.shap_values(dataframe)[1][row,:],
                         feature_names = dataframe.columns,
                         data = dataframe.iloc[row,:])

shap.waterfall_plot(shap_object)
thlevy commented 3 years ago

Revisiting this old topic. With shap 0.37.0, I managed to use both waterfall_legacy and the original waterfall_plot with the trick above from @jackcook1102, but if you are using np.array, you also need to take care with the shapes of the input arrays. Here is some example code for a multiclass classification that works for me:

X_test_array = X_test_tfidf.toarray() #Convert sparse to dense array
shap_values = explainer.shap_values(X_test_array)
c = clf.predict(X_test_tfidf) #class to explain
shap.plots._waterfall.waterfall_legacy(explainer.expected_value[c], shap_values[c].reshape(-1), X_test_array.reshape(-1),
                                       feature_names=vectorizer.get_feature_names(), max_display=10)

However, in the future, I think it could be helpful to plan some updates (and related examples) in waterfall_plot to make it easier to use?
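The reshape(-1) calls above matter because waterfall_legacy expects 1-D arrays for a single explanation. A numpy-only sketch of the shape change (7 features is just an example number):

```python
import numpy as np

row = np.zeros((1, 7))   # SHAP values for one explained sample: (1, n_features)
flat = row.reshape(-1)   # -> (7,), the 1-D form waterfall_legacy expects

print(flat.shape)
```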

asimraja77 commented 3 years ago

Same issue. This works: shap.plots._waterfall.waterfall_legacy(explainer.expected_value, shap_values[0]). I agree with @thlevy that we should have updated examples to make this easier for those who are new to it. I almost gave up before I stumbled upon this solution.

shap version 0.39.0

diegonov1 commented 3 years ago

Thanks to those who suggested shap.plots._waterfall.waterfall_legacy(). I couldn't find any help in the documentation; thankfully I found help here. Thanks guys!

nielsuit227 commented 3 years ago

The legacy version didn't work for me. I fed the explainer a Pandas DataFrame (with various samples), then rewrote the shap_values:

shap_values.values = np.mean(shap_values.values, axis=0)
shap_values.data = pd.DataFrame(columns=data.keys(), data=shap_values.data)

And then in _waterfall.py, I adjusted lines 44-48 to:

    base_values = shap_values.base_values[0, 0]

    features = shap_values.data.mean().values
    feature_names = list(shap_values.data.keys())
    lower_bounds = getattr(shap_values, "lower_bounds", None)
    upper_bounds = getattr(shap_values, "upper_bounds", None)
    values = shap_values.values

And that did the trick for me!

timtreis commented 3 years ago

Another month, another fix.

The solutions described above didn't work for me when using a Keras predictor, as opposed to, for example, an XGB predictor. Therefore, I now rebuild the expected output for the Keras model, based on the working XGB output, using the native SHAP Explanation constructor.

keras_model = tf.keras.Sequential([layers.Dense(units = 1)])
keras_model.compile(optimizer = tf.optimizers.Adam(),
                    loss = 'MeanSquaredError')
keras_model.fit(X, y, 
                epochs = 100,
                verbose = 0)
keras_explainer = shap.KernelExplainer(keras_model.predict, X)
keras_shap_values = keras_explainer.shap_values(X)

values = keras_shap_values[0]
base_values = [keras_explainer.expected_value[0]]*len(keras_shap_values[0])

tmp = shap.Explanation(values = np.array(values, dtype=np.float32),
                       base_values = np.array(base_values, dtype=np.float32),
                       data=np.array(X),
                       feature_names=X.columns)

shap.plots.waterfall(tmp[0])
shap.plots.beeswarm(tmp)

This wouldn't have been necessary just to make waterfall work, but now beeswarm works as well.
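The base_values line in the snippet above replicates one scalar expectation per explained sample; a quick numpy sketch of that pattern (the numbers here are made up, not from a real explainer):

```python
import numpy as np

expected_value = 0.25   # stand-in for keras_explainer.expected_value[0]
n_samples = 4           # stand-in for len(keras_shap_values[0])

# One copy of the base value per sample, as the Explanation constructor expects:
base_values = np.array([expected_value] * n_samples, dtype=np.float32)

print(base_values.shape)
```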

mrg20 commented 3 years ago

Using waterfall_legacy worked for me, but I also got the original plot working by creating a class to pass as the argument. You need to assign a value to row indicating which row of data you'd like to show, along with a dataframe that you want to create SHAP values for:

class ShapObject:

    def __init__(self, base_values, data, values, feature_names):
        self.base_values = base_values # Single value
        self.data = data # Raw feature values for 1 row of data
        self.values = values # SHAP values for the same row of data
        self.feature_names = feature_names # Column names

shap_object = ShapObject(base_values = explainer.expected_value[1],
                         values = explainer.shap_values(dataframe)[1][row,:],
                         feature_names = dataframe.columns,
                         data = dataframe.iloc[row,:])

shap.waterfall_plot(shap_object)

Tried this in 0.39.0, still working, but these are the parameters I used for the ShapObject:

shap_object = ShapObject(base_values = shap_values[0][0].base_values,
                         values = shap_values[0].values,
                         feature_names = df.columns,
                         data = shap_values[0].data)
Mark531 commented 3 years ago

shap_object = ShapObject(base_values = shap_values[0][0].base_values, values = shap_values[0].values, feature_names = df.columns, data = shap_values[0].data)

I have version 0.39.0, and shap_values is just an array, it does not contain the fields you mentioned.

kjgururaj commented 2 years ago

This worked for me

shap_object = shap.Explanation(base_values = shap_values[0][0].base_values, values = shap_values[0].values, feature_names = x_test.columns, data = shap_values[0].data)

shap.plots.waterfall(shap_object)

sorenwacker commented 2 years ago

It works, but it looks really confusing.

sorenwacker commented 2 years ago
class ShapAnalysis():

    def __init__(self, df):
        shap_values = explainer(df)
        self._shap_values = shap_values
        self._instance_names = df.index.to_list()
        self._feature_names = df.columns.to_list()
        self.df_shap = pd.DataFrame(
            shap_values.values, 
            columns=df.columns, 
            index=df.index
        )

    def waterfall(self, i, **kwargs):
        shap_values = self._shap_values
        shap_object = shap.Explanation(
                base_values = shap_values[i][0].base_values, 
                values = shap_values[i].values,
                feature_names = self._feature_names,
                instance_names=self._instance_names,
                data = shap_values[i].data,
        )
        shap.plots.waterfall(shap_object, **kwargs)

For people working with pandas.
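To show how the df_shap frame lines up with the input, here is a runnable sketch in which explainer is a stub returning an object with a .values array; everything named below is a stand-in, not shap's actual API:

```python
import numpy as np
import pandas as pd
from types import SimpleNamespace

# Deterministic toy frame standing in for real feature data
df = pd.DataFrame(np.arange(12.0).reshape(4, 3), columns=["a", "b", "c"])

# Stub for a fitted SHAP explainer: returns an object exposing .values
# shaped like the input frame, as shap's Explanation does
def explainer(frame):
    return SimpleNamespace(values=np.zeros(frame.shape))

shap_values = explainer(df)
df_shap = pd.DataFrame(shap_values.values, columns=df.columns, index=df.index)

print(df_shap.shape)  # one SHAP value per cell of the input frame
```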

samiit commented 2 years ago

A new class definition and its corresponding call solved my issue. Just in case it helps someone else:

class ShapInput(object):
    def __init__(self, expectation, shap_values, features, feat_names):
        self.base_values = expectation
        self.values = shap_values
        self.data = features
        self.feature_names = feat_names

shap_input = ShapInput(explainer.expected_value, shap_values[0], 
                       test_point[0], feat_names=feat_names)

shap.waterfall_plot(shap_input)

Here is an example image for the California housing dataset


The difference from the legacy output is mainly the aesthetics (feature values, expectation symbol at the bottom, etc.).

I hope @slundberg or other contributors can fix it in the code, since otherwise, the examples are not working.

Best, Sam

antonkratz commented 2 years ago

@samiit I am using an explainer of type shap.explainers.GPUTree. It apparently stores the expectation not in explainer.expected_value, but in explainer.expected_value[0] (?). But even then I cannot get it to work:

class ShapInput(object):
    def __init__(self, expectation, shap_values, features, feat_names):
        self.base_values = expectation
        self.values = shap_values
        self.data = features
        self.feature_names = feat_names

shap_input = ShapInput(another_explainer.expected_value[0], shap_values[entry], 5, feat_names=featurenames)

shap.waterfall_plot(shap_input)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_2720089/3328189826.py in <module>
      8 shap_input = ShapInput(another_explainer.expected_value[0], shap_values[entry], 5, feat_names=featurenames)
      9 
---> 10 shap.waterfall_plot(shap_input)

~/.local/lib/python3.9/site-packages/shap-0.40.0-py3.9-linux-x86_64.egg/shap/plots/_waterfall.py in waterfall(shap_values, max_display, show)
    124             yticklabels[rng[i]] = feature_names[order[i]]
    125         else:
--> 126             yticklabels[rng[i]] = format_value(features[order[i]], "%0.03f") + " = " + feature_names[order[i]]
    127 
    128     # add a last grouped feature to represent the impact of all the features we didn't show

TypeError: 'int' object is not subscriptable

Other visualizations such as the force plot, and even waterfall legacy, do work. I see several open issues that are variations on this one ([1], [2] and more), and I am very confused about what is going on - does GPUTree somehow return a different data structure than other explainers, or why does it not work? And how can I get the waterfall (not legacy) plot to work? Any help or insight would be appreciated, thank you!

[1] https://github.com/slundberg/shap/issues/2362 [2] https://github.com/slundberg/shap/issues/2255
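For what it's worth, the TypeError in the traceback above is consistent with `features` being the plain integer 5 rather than the row's feature values: waterfall indexes into it as features[order[i]]. A minimal reproduction without shap (the replacement array at the end is a made-up example row):

```python
import numpy as np

features = 5          # an int, as passed in the snippet above
try:
    features[0]       # what waterfall effectively does: features[order[i]]
except TypeError as err:
    message = str(err)

print(message)        # 'int' object is not subscriptable

# Passing the row's actual feature values instead makes the lookup valid:
features = np.array([0.1, 2.3, -1.0, 4.2, 0.7])
first = features[0]
```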

Chichostyle commented 2 years ago

I have a multiclass XGBoost model. None of the solutions above worked for me, but this one did:

classes=15
row = 2000
shap.waterfall_plot(shap.Explanation(values=
      shap_values[classes][row], base_values=explainer.expected_value[classes],
      data=X[features1].iloc[row], feature_names=X[features1].columns.tolist()), max_display = 50)
ArbyC commented 2 years ago

SHAP version 0.41.0. I was experimenting with a GAMI-net model.

I trained the explainer as follows:

masker = shap.maskers.Independent(data = test_x)
explainer = shap.Explainer(model, masker = masker)
shap_values = explainer(test_x)

The resulting explainer is a shap.explainers._permutation.Permutation. The shap_values have the following form:

shap_values.values = array([[ 1.10730644e-02,...,0.00000000e+00] ,....., [2.18415356e-02,...,-7.94728616e-11]])

shap_values.base_values = array([[0.41484901] ,....., [0.41484901]])

shap_values.data = array([[0.7244829 ,...., 0.5036719 ] ,......, [0.978,...,0.54155535]])

And when I tried

shap.waterfall_plot(shap_values[0])
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Input In [113], in <cell line: 1>()
----> 1 shap.waterfall_plot(shap_values[0])

File E:\Anaconda\envs\GIS\lib\site-packages\shap\plots\_waterfall.py:54, in waterfall(shap_values, max_display, show)
     52 # make sure we only have a single output to explain
     53 if (type(base_values) == np.ndarray and len(base_values) > 0) or type(base_values) == list:
---> 54     raise Exception("waterfall_plot requires a scalar base_values of the model output as the first "
     55                     "parameter, but you have passed an array as the first parameter! "
     56                     "Try shap.waterfall_plot(explainer.base_values[0], values[0], X[0]) or "
     57                     "for multi-output models try "
     58                     "shap.waterfall_plot(explainer.base_values[0], values[0][0], X[0]).")
     60 # make sure we only have a single explanation to plot
     61 if len(values.shape) == 2:

Exception: waterfall_plot requires a scalar base_values of the model output as the first parameter, but you have passed an array as the first parameter! Try shap.waterfall_plot(explainer.base_values[0], values[0], X[0]) or for multi-output models try shap.waterfall_plot(explainer.base_values[0], values[0][0], X[0]).

When I used the same code on an EBM model, everything worked fine. The base_values array for the EBM explainer has a different form:

.base_values = array([-0.29164322, -0.29164322, -0.29164322, ..., -0.29164322, -0.29164322, -0.29164322])

Can anyone help me?
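One hedged observation: the Permutation explainer's base_values printed above are 2-D with shape (n, 1), so shap_values[0].base_values is still a length-1 array rather than a scalar, which is exactly what the exception checks for. A numpy-only sketch of the shape problem and a possible squeeze fix (not tested against shap itself):

```python
import numpy as np

base_values = np.array([[0.41484901]] * 3)  # shape (3, 1), as printed above
one_row = base_values[0]                    # shape (1,): an array, not a scalar

print(isinstance(one_row, np.ndarray))      # this is what trips the scalar check

flat = base_values.squeeze(axis=1)          # shape (3,)
scalar = flat[0]                            # a plain scalar base value per sample
```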

nprime496 commented 2 years ago

Same issue.

I am able to plot the waterfall explanation using waterfall_legacy in my notebook, but the code doesn't work when used with Streamlit.

shap 0.41.0

zhuzn commented 1 year ago

This works for me shap.plots._waterfall.waterfall_legacy(explainer.expected_value[0], shap_values[0].values, df.values[0], feature)

Mohammadsaknini commented 1 year ago

This worked for me

        shap_values = explainer(x_test)
        exp = shap.Explanation(shap_values[:, :, 0],
                               shap_values.base_values[:, 0],
                               x_test, feature_names=x_test.columns)
        shap.waterfall_plot(exp[1])
bbortey9 commented 10 months ago

This worked for me
    shap_values = explainer(x_test)
    exp = shap.Explanation(shap_values[:, :, 0], shap_values.base_values[:, 0], x_test, feature_names=x_test.columns)
    shap.waterfall_plot(exp[1])

YuanfengZhang commented 10 months ago

This worked for me

    shap_values = explainer(x_test)
    exp = shap.Explanation(shap_values[:, :, 0], shap_values.base_values[:, 0], x_test, feature_names=x_test.columns)
    shap.waterfall_plot(exp[1])

For sklearn.LinearRegression().fit(), this returned the error below:

    ----> exp = shap.Explanation(shap_values[:, :, 0], shap_values.base_values[:, 0], x_test, feature_names=x_test.columns)

    IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed
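The IndexError is expected for a single-output regressor: its SHAP values are 2-D (samples × features), so the three-index slice written for multiclass output has one axis too many. A numpy-only illustration (the shapes are made up):

```python
import numpy as np

values = np.zeros((10, 4))  # single-output regression: (n_samples, n_features)
try:
    values[:, :, 0]         # multiclass-style slice: one index too many
except IndexError as err:
    message = str(err)

print(message)

# For 2-D values, no class index is needed; use the array as-is.
```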

YuanfengZhang commented 10 months ago

shap=0.43.0

shap.plots._waterfall.waterfall_legacy(explainer.expected_value, shap_values[0])

worked.

ChenXuanting commented 7 months ago

Do not use shap_values = explainer.shap_values(X_test); use shap_values = explainer(X_test) instead. You'll get an Explanation object.

diegoquezadac commented 5 months ago

When using a Keras model, the predict method adds an extra dimension to the output. This behavior affects the shape of the shap_values.values array, producing the error ValueError: The waterfall plot can currently only plot a single explanation, but a matrix of explanations (shape (28, 1)) was passed when using the following code:

explainer = shap.KernelExplainer(model, X)
shap_values = explainer(X)
shap.plots.waterfall(shap_values[0])

In order to fix this, a single line needs to be added before doing the plot:

explainer = shap.KernelExplainer(model, X)
shap_values = explainer(X)
shap_values.values = shap_values.values.squeeze()
shap.plots.waterfall(shap_values[0])

I'm using Shap 0.45.0 and X is a Pandas DataFrame.
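The squeeze call works because the Keras predict output leaves a trailing singleton axis on the SHAP values. A numpy sketch of the shape change (28 features as in the error message; the sample count is made up):

```python
import numpy as np

values = np.zeros((100, 28, 1))  # (n_samples, n_features, 1): extra Keras axis
values = values.squeeze()        # -> (100, 28), one row per explained sample

print(values.shape)
```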