uber / causalml

Uplift modeling and causal inference with machine learning algorithms

SHAP Explainer error #735

Open seyidcemkarakas opened 5 months ago

seyidcemkarakas commented 5 months ago

I have two questions:

Question 1:

I am trying to reproduce this tutorial on my own machine: https://causalml.readthedocs.io/en/latest/examples/causal_trees_interpretation.html

In the TreeExplainer section, this is the code that creates the tree_explainer object:

tree_explainer = shap.TreeExplainer(ctree)
# Expected values for treatment=0 and treatment=1. i.e. Y|X,T=0 and Y|X,T=1
tree_explainer.expected_value

When I run this code on my machine, I get this error:


TypeError                                 Traceback (most recent call last)
in ()
----> 1 tree_explainer = shap.Explainer(ctree)
      2 # Expected values for treatment=0 and treatment=1. i.e. Y|X,T=0 and Y|X,T=1
      3 tree_explainer.expected_value

/data/envs/berkere/lib/python3.8/site-packages/shap/explainers/_explainer.py in __init__(self, model, masker, link, algorithm, output_names, feature_names, linearize_link, seed, **kwargs)
    169         # if we get here then we don't know how to handle what was given to us
    170         else:
--> 171             raise TypeError("The passed model is not callable and cannot be analyzed directly with the given masker! Model: " + str(model))
    172
    173         # build the right subclass

TypeError: The passed model is not callable and cannot be analyzed directly with the given masker! Model: CausalTreeRegressor()

How can I handle this?
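For reference, the generic shap.Explainer entry point (which, as the traceback shows, is what my cell actually called) also accepts a plain callable plus a masker. A minimal fallback sketch, assuming ctree is the fitted CausalTreeRegressor from the tutorial and X_test is my feature matrix:

import shap

# Fallback sketch (my assumption, not from the tutorial): pass the model's
# predict method as a callable together with an explicit masker, since the
# Explainer dispatcher does not recognize CausalTreeRegressor itself.
masker = shap.maskers.Independent(X_test)
explainer = shap.Explainer(ctree.predict, masker)
shap_values = explainer(X_test)

As far as I understand, a callable makes shap fall back to a model-agnostic permutation explainer, so this would be slower than TreeExplainer, but it avoids the dispatch error above.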

Question 2:

In another implementation, I was able to get SHAP values like this:

from causalml.inference.meta import BaseTClassifier
from xgboost import XGBClassifier

uplift_model = BaseTClassifier(XGBClassifier(n_estimators=100,
                                             max_depth=5,
                                             learning_rate=0.1), control_name='Kontrol')

uplift_model.fit(df_train[x_names].values,
                 treatment=df_train['treatment_group_key'].values,
                 y=df_train['conversion'].values)

model_tau = uplift_model.predict(df_test[x_names].values)

uplift_model_shap_values = uplift_model.get_shap_values(X=df_test[x_names].values, tau=model_tau, features=x_names)
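For reference, here is how I inspect what get_shap_values returns; my assumption (which may be wrong) is that it is a dict with one SHAP-value array per treatment group:

# Inspection sketch (assumption: dict keyed by treatment group, each value
# an array of shape (n_samples, n_features)).
for group, sv in uplift_model_shap_values.items():
    print(group, sv.shape)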

Then I wanted to dive deeper by looking at a single prediction locally. I have a prediction where the control score is 0.000219 and the treatment score is 0.041069. In that case I can say that I should apply the treatment to this instance (because the treatment score is better than the control score, and I can see that the number in the recommended_treatment column is 1).

Then I plotted shap.waterfall_plot and saw that the most important features for this instance always decreased the SHAP value, no matter what the base_value is. So I would like an explanation of how to read a SHAP plot for uplift models, because uplift models are not like traditional ML models. I really want to know how an uplift model decides to say "you should apply treatment 1 (or 2, 3, whatever) to this instance".
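To make my reading of the recommendation concrete, here is a sketch of the decision rule as I understand it (my assumption, not the library's code), using the scores from my example row:

# My assumed T-learner recommendation rule: the predicted uplift is the
# difference between the treatment and control scores, and the treatment is
# recommended whenever that uplift is positive (argmax over groups in the
# multi-treatment case).
control_score = 0.000219     # score for my example row under control
treatment_score = 0.041069   # score for the same row under treatment 1

uplift = treatment_score - control_score
recommended_treatment = 1 if uplift > 0 else 0
print(round(uplift, 6), recommended_treatment)  # 0.04085 1

Is this reading correct? And does the waterfall plot then explain the predicted uplift itself, so that the base_value is the average predicted uplift and negative SHAP contributions mean features pushing this instance's uplift below that average?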