Open: XavB64 opened this issue 2 months ago
Hi there,
Thanks for reaching out. Sorry to hear you are having issues. I am able to get auto-shap to work with classification models, including the one in the documentation example.
Could you provide more details about the dataframe you are passing in and exactly where the error occurs?
Best, Micah
Hello,
Thank you for your reply!
I'm using Python 3.9.13, auto-shap 0.3.2 (the latest version), and pandas 2.2.2.
I'm running the first documentation example:
>>> from auto_shap.auto_shap import generate_shap_values
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.ensemble import ExtraTreesClassifier
>>> x, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> model = ExtraTreesClassifier()
>>> model.fit(x, y)
>>> shap_values_df, shap_expected_value, global_shap_df = generate_shap_values(model, x)
Note that the second example, with regression, works without any issue.
The error in the first example occurs on the last line of code, when calling 'generate_shap_values'. Here is the complete error message:
{
"name": "ValueError",
"message": "Shape of passed values is (600, 2), indices imply (600, 30)",
"stack": "---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\\AppData\\Local\\Temp\\ipykernel\\74634.py in <module>
2 model = ExtraTreesClassifier()
3 model.fit(x, y)
----> 4 shap_values_df, shap_expected_value, global_shap_df = generate_shap_values(model, x)
c:\\Users\\anaconda3\\lib\\site-packages\\auto_shap\\auto_shap.py in generate_shap_values(model, x_df, linear_model, tree_model, boosting_model, calibrated_model, regression_model, voting_or_stacking_model, use_agnostic, n_jobs, sample_size, k)
312 voting_or_stacking_model
313 )
--> 314 shap_values_df, shap_expected_value, global_shap_df = produce_raw_shap_values(
315 model, x_df, use_agnostic, linear_model, tree_model, calibrated_model, boosting_model, regression_model,
316 voting_or_stacking_model, n_jobs, sample_size, k
c:\\Users\\anaconda3\\lib\\site-packages\\auto_shap\\auto_shap.py in produce_raw_shap_values(model, x_df, use_agnostic, linear_model, tree_model, calibrated_model, boosting_model, regression_model, voting_or_stacking_model, n_jobs, sample_size, k)
248 else:
249 if tree_model:
--> 250 return produce_shap_output_with_tree_explainer(model, x_df, boosting_model, regression_model, False,
251 n_jobs=n_jobs)
252 elif linear_model:
c:\\Users\\anaconda3\\lib\\site-packages\\auto_shap\\auto_shap.py in produce_shap_output_with_tree_explainer(model, x_df, boosting_model, regression_model, linear_model, return_df, n_jobs)
123 global_shap_df = generate_shap_global_values(shap_values, x_df)
124 if return_df:
--> 125 shap_values_df = make_shap_df(shap_values, x_df)
126 return shap_values_df, shap_expected_value, global_shap_df
127 else:
c:\\Users\\anaconda3\\lib\\site-packages\\auto_shap\\utilities.py in make_shap_df(shap_values, x_df)
152 :return: dataframe of SHAP values
153 \"\"\"
--> 154 return pd.DataFrame(shap_values, columns=list(x_df))
155
156
c:\\Users\\anaconda3\\lib\\site-packages\\pandas\\core\\frame.py in __init__(self, data, index, columns, dtype, copy)
825 )
826 else:
--> 827 mgr = ndarray_to_mgr(
828 data,
829 index,
c:\\Users\\anaconda3\\lib\\site-packages\\pandas\\core\\internals\\construction.py in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
334 )
335
--> 336 _check_values_indices_shape_match(values, index, columns)
337
338 if typ == \"array\":
c:\\Users\\anaconda3\\lib\\site-packages\\pandas\\core\\internals\\construction.py in _check_values_indices_shape_match(values, index, columns)
418 passed = values.shape
419 implied = (len(index), len(columns))
--> 420 raise ValueError(f\"Shape of passed values is {passed}, indices imply {implied}\")
421
422
ValueError: Shape of passed values is (600, 2), indices imply (600, 30)"
}
Thank you very much for your help :)
Sorry for the delay! I also had trouble under those package versions. I was able to get the example to run again with the library versions below.
I think the underlying issue is with changes in newer versions of NumPy and the underlying SHAP library. I plan to address it in an upcoming release I have on the docket (which should also give better support for multiclass classification problems).
auto-shap==0.3.2
cloudpickle==3.0.0
contourpy==1.3.0
cycler==0.12.1
fonttools==4.54.1
importlib_resources==6.4.5
joblib==1.4.2
kiwisolver==1.4.7
llvmlite==0.43.0
matplotlib==3.9.2
numba==0.60.0
numpy==1.26.4
packaging==24.1
pandas==2.2.2
pillow==10.4.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
pytz==2024.2
scikit-learn==1.5.2
scipy==1.13.1
shap==0.44.0
six==1.16.0
slicer==0.0.7
threadpoolctl==3.5.0
tqdm==4.66.5
tzdata==2024.2
zipp==3.20.2
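For anyone comparing their own environment against the pins above, here is a minimal sketch that uses only the standard library. The helper names `parse_pins` and `find_mismatches` are hypothetical and not part of auto-shap; the pins in the usage section are copied from the list above.

```python
from importlib import metadata


def parse_pins(text):
    """Parse whitespace-separated 'name==version' pins into a dict."""
    pins = {}
    for token in text.split():
        name, sep, version = token.partition("==")
        if sep:  # skip anything that is not a 'name==version' pin
            pins[name] = version
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {name: (pinned, installed)} for packages whose versions differ.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = installed_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    # A subset of the pins listed above; paste the full list to check everything.
    pins = parse_pins("numpy==1.26.4 pandas==2.2.2 scikit-learn==1.5.2 shap==0.44.0")
    for name, (pinned, installed) in find_mismatches(pins).items():
        print(f"{name}: pinned {pinned}, installed {installed}")
```

If the script prints nothing, the checked packages match the pinned versions. The NumPy and SHAP lines are probably the ones to look at first, given the explanation above.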
Hello,
I found that the first example, with ExtraTreesClassifier(), doesn't work: "ValueError: Shape of passed values is (30, 2), indices imply (30, 30)".
It seems the library works for regression but not for classification.
Do you have any recommendations for this?
Regards
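The `(30, 2)` shape in this error is consistent with one possible cause, sketched below under assumptions rather than confirmed from the auto-shap source. The assumption is that older SHAP releases return a list with one `(n_samples, n_features)` array per class for a classifier, while newer releases return a single `(n_samples, n_features, n_classes)` array. Indexing the newer array with `[1]`, as one would index a list, then yields a `(n_features, n_classes)` slice. The helper `positive_class_shap` is hypothetical and only illustrates how both layouts could be normalized before building a dataframe.

```python
import numpy as np


def positive_class_shap(shap_values, n_features, class_index=1):
    """Return a 2D (n_samples, n_features) array of SHAP values for one class.

    Hypothetical helper; not part of auto-shap or SHAP.
    """
    # Assumed older layout: a list with one (n_samples, n_features) array per class.
    if isinstance(shap_values, list):
        return np.asarray(shap_values[class_index])

    values = np.asarray(shap_values)
    # Assumed newer layout: one (n_samples, n_features, n_classes) array.
    if values.ndim == 3 and values.shape[1] == n_features:
        return values[:, :, class_index]
    # Already 2D, e.g. a regression model or a single-output explainer.
    if values.ndim == 2 and values.shape[1] == n_features:
        return values
    raise ValueError(f"Unexpected SHAP values shape: {values.shape}")


if __name__ == "__main__":
    # Synthetic stand-ins for the two layouts: 5 samples, 30 features, 2 classes.
    old_style = [np.zeros((5, 30)), np.ones((5, 30))]
    new_style = np.stack(old_style, axis=-1)

    print(new_style[1].shape)  # list-style indexing on the 3D array
    print(positive_class_shap(old_style, n_features=30).shape)
    print(positive_class_shap(new_style, n_features=30).shape)
```

With 30 feature columns, the list-style index on the 3D array gives a `(30, 2)` slice, which matches the shape pandas complains about. The helper returns `(5, 30)` for both layouts.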