BUG: TreeSHAP Interventional explanations segmentation fault

amir-rahnama commented 7 months ago

Issue Description

Explaining scikit-learn RandomForest models (and sometimes GradientBoosting models) with the TreeSHAP explainer in interventional mode crashes the Python process with a segmentation fault.

I had assumed this was fixed in "BUG: Interventional TreeSHAP failing for large depth tree-based models", but it still occurs in SHAP versions 0.44.1 and 0.43.0.

Colab link: https://colab.research.google.com/drive/1bAH-4WclIJQPvPwoY3cZ-UZu8MAccaTO?usp=sharing

Minimal Reproducible Example

# Only the imports the example actually uses
import numpy as np
import shap
from sklearn.datasets import fetch_openml
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

random_state = 10
np.random.seed(random_state)

dset_id = 293

ames = fetch_openml(data_id=dset_id, as_frame='auto')
ames.target[ames.target == '-1'] = 0
ames.target[ames.target == '1'] = 1

target = ames.target.astype(int)
ames.data = ames.data.toarray()

if np.isnan(ames.data).any():
    feat_col_means = np.nanmean(ames.data, axis=0)
    ames.data = np.where(np.isnan(ames.data), feat_col_means, ames.data)

X_train, X_test, y_train, y_test = train_test_split(ames.data, target, test_size=0.33, random_state=10)

model_name = 'rf'

if model_name == 'rf':
    model = RandomForestClassifier(random_state=random_state)
else: 
    model = GradientBoostingClassifier(random_state=random_state)

model.fit(X_train, y_train)

shap_explainer = shap.TreeExplainer(model, X_train, 
                                    feature_perturbation="interventional",
                                    model_output='probability')

def tree_shap_exp(instances, x_train, model_obj, x_test):
    shap_values = shap_explainer.shap_values(instances, check_additivity=False)
    shap_values = np.array(shap_values)

    return shap_values

res = tree_shap_exp(X_test[:10], X_train, model, X_test)

Traceback

Segmentation fault (core dumped)
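
A bare "Segmentation fault" carries no Python stack, so it can help to run the reproduction in a child interpreter with the standard library's faulthandler enabled. The following is a minimal sketch using only the standard library; `run_isolated` and `describe` are hypothetical helper names for illustration and are not part of shap.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=600.0):
    """Run `code` in a fresh interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, a negative return code -N
    means the child was killed by signal N.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a child return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with code {returncode}"


# Harmless stand-in for the real reproduction script
rc, err = run_isolated("import sys; sys.exit(3)")
print(describe(rc))
```

If the reproduction above is passed in as the `code` string, the parent process survives the crash, and the captured stderr should contain the Python frame that was active when the native code faulted, which is easier to attach to a report than the one-line message.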

Expected Behavior

I expect TreeSHAP to work, since neither the dataset nor the tree model is too large to fit into memory.

Bug report checklist

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest release of shap.
[X] I have confirmed this bug exists on the master branch of shap.
[ ] I'd be interested in making a PR to fix this bug

Installed Versions

SHAP == 0.44.1 or SHAP == 0.43.0
Sklearn == 1.4.0 or 1.3.0
OS: Ubuntu 20.04.6 LTS or macOS Monterey (Apple M1)
Available Memory on Server:
              total        used        free      shared  buff/cache   available
Mem:      196671072    20887784    62542380      123692   113240908   174204456
Swap:       8388604     4712112     3676492

amir-rahnama commented 7 months ago

Here are the crash logs on Google Colab:

amir-rahnama commented 7 months ago

UPDATE: If you set max_depth=30 when training the RF, the issue goes away.
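
The workaround can be sketched as follows. This is only an illustration: it uses a small synthetic dataset from `make_classification` as a stand-in for the real data, and `max_tree_depth` is a hypothetical helper, not a shap or scikit-learn function. The thread reports only that capping the depth avoids the crash; whether tree depth is the actual root cause is an assumption here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def max_tree_depth(forest):
    """Depth of the deepest tree in a fitted random forest."""
    return max(tree.get_depth() for tree in forest.estimators_)


# Synthetic stand-in data (assumption: not the dataset from the report)
X, y = make_classification(n_samples=500, n_features=20, random_state=10)

# Same model as in the report, but with the depth capped at 30
capped = RandomForestClassifier(
    n_estimators=20, max_depth=30, random_state=10
).fit(X, y)

print("deepest tree:", max_tree_depth(capped))
```

Running `max_tree_depth` on the uncapped model from the reproduction would show how deep its trees grow by default, which may help confirm or rule out depth as the trigger.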

Possible other issues related to this:

CloseChoice commented 6 months ago

Thanks for your report. I remember that we had the problem previously. Seems like a deep bug in our C code

stergioa commented 3 months ago

> Thanks for your report. I remember that we had the problem previously. Seems like a deep bug in our C code

Any fix?

CloseChoice commented 3 months ago

@stergioa thanks for the gentle push. AFAIK nobody is working on this. I just checked the example and it doesn't throw an error on my Windows PC. We would need a way to reproduce the crash before we can start working on this.
