API for Shapley value estimation

jpmml / jpmml-evaluator

Java Evaluator API for PMML

GNU Affero General Public License v3.0


API for Shapley value estimation #249

Open popovstefan opened 2 years ago

popovstefan commented 2 years ago

I have a project where I would like to use a LightGBM model trained in Python to predict feature contributions (Shapley values), in the same manner as described in this StackOverflow answer:

https://stackoverflow.com/a/64793530/15052008

Is this possible in the current version of this library? I have gone through the documentation and various JPMML tutorials, and I couldn't figure out a way to do that. I have successfully trained, converted, and deployed a model in a Java app, but with it I can only predict probabilities (simple model inference).

vruusmann commented 2 years ago

> Is this possible in the current version of this library?

Shapley values are a model evaluation-time phenomenon, not a model training- or conversion-time phenomenon.

Therefore, the JPMML-LightGBM library needs no changes in this area.

Moving this issue to a more appropriate location.

vruusmann commented 2 years ago

There is a related project, which performs simple feature impact analysis with various tree ensemble methods (boosting, bagging): https://github.com/vruusmann/rf_feature_impact

What's the canonical algorithm for estimating Shapley values?

Ideally, the predicted value of the target field could implement some marker interface(s), which would trigger the computation of Shapley values in situ. The Pythonic approach, where every prediction aspect (e.g. predict, predict_proba, shap) involves running the whole prediction again from scratch, seems kind of wasteful.
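For reference on the question of a canonical algorithm: the textbook definition of Shapley values can be computed directly by enumerating feature coalitions. The sketch below is only that definition written out, not what JPMML, LightGBM, or the shap library actually run (tree libraries use faster tree-specific algorithms such as TreeSHAP). The names `shapley_values` and `toy_model`, and the choice to replace "absent" features with a single baseline row, are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a single baseline row.

    Features outside a coalition are replaced by their baseline value
    (an assumption of this sketch). Cost grows as 2^n in the number of
    features, so this is for illustration only.
    """
    n = len(x)

    def value(coalition):
        # Coalition features come from x, all others from the baseline.
        row = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(row)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model: a linear term plus an interaction of features 1 and 2.
def toy_model(row):
    return 2.0 * row[0] + row[1] * row[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Efficiency property: the contributions sum to f(x) - f(baseline).
total = toy_model(x) - toy_model(baseline)
print(phi, sum(phi), total)
```

In this toy case the interaction term is split evenly between the two features involved, and the contributions add up to the difference between the prediction and the baseline prediction. That additivity is the property an evaluation-time API would need to preserve.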

04pallav commented 2 months ago

@vruusmann If there is a PMML file (.xml) containing a preprocessor + model, is there a way to use it to produce only the preprocessed data and not the final prediction (i.e. only apply the transforms, similar to sklearn's pipeline.transform())?

More context (not necessary for you to read): I am trying to use PMML and the shap library together. TreeExplainer in the shap library needs the actual sklearn tree classes. If I can get the preprocessed data from the PMML file, I can pass it to the model object in the shap library. I was hoping there would be some way to convert PMML back to an sklearn Pipeline, but that's probably not possible.