jpmml / jpmml-evaluator

Java Evaluator API for PMML
GNU Affero General Public License v3.0

API for Shapley value estimation #249

Open popovstefan opened 1 year ago

popovstefan commented 1 year ago

I have a project where I would like to use a LightGBM model trained in Python to predict feature contributions (Shapley values), in the same manner as answered in this StackOverflow question:

Is this possible in the current version of this library? I have gone through the documentation and various JPMML tutorials, but I couldn't figure out how to do that. I have successfully trained, converted, and deployed a model in a Java app, but with it I can only predict probabilities (simple model inference).

vruusmann commented 1 year ago

> Is this possible in the current version of this library?

Shapley values are a model evaluation-time phenomenon, not a model training- or conversion-time phenomenon.

Therefore, the JPMML-LightGBM library needs no changes in this area.

Moving this issue to a more appropriate location.

vruusmann commented 1 year ago

There is a related project, which performs simple feature impact analysis with various tree ensemble methods (boosting, bagging): https://github.com/vruusmann/rf_feature_impact

What's the canonical algorithm for estimating Shapley values?
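For reference, the textbook definition of a feature's Shapley value is a weighted average of its marginal contribution over every subset of the other features. Evaluating that directly is exponential in the number of features, which is why tree-specific algorithms such as TreeSHAP exist. Below is a minimal brute-force sketch of the definition only, not of TreeSHAP and not of anything in JPMML-Evaluator. The class name `ExactShapley`, the `model` scoring function, and the choice to represent an "absent" feature by substituting a `baseline` value are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

/** Brute-force Shapley values by subset enumeration (illustrative; hypothetical class). */
final class ExactShapley {

    /**
     * @param model    hypothetical scoring function over a full feature vector
     * @param x        the instance to explain
     * @param baseline values substituted for "absent" features (an assumption)
     */
    static double[] shapleyValues(ToDoubleFunction<double[]> model, double[] x, double[] baseline) {
        int n = x.length; // must stay small: the loop below visits 2^n subsets
        double[] factorial = new double[n + 1];
        factorial[0] = 1.0;
        for (int k = 1; k <= n; k++) {
            factorial[k] = factorial[k - 1] * k;
        }

        double[] phi = new double[n];
        for (int i = 0; i < n; i++) {
            for (int mask = 0; mask < (1 << n); mask++) {
                if ((mask & (1 << i)) != 0) {
                    continue; // subset S must not contain feature i
                }
                int size = Integer.bitCount(mask);
                // Shapley weight: |S|! * (n - |S| - 1)! / n!
                double weight = factorial[size] * factorial[n - size - 1] / factorial[n];
                double withI = model.applyAsDouble(compose(x, baseline, mask | (1 << i)));
                double withoutI = model.applyAsDouble(compose(x, baseline, mask));
                phi[i] += weight * (withI - withoutI);
            }
        }
        return phi;
    }

    /** Takes x[j] for features present in the mask and baseline[j] otherwise. */
    private static double[] compose(double[] x, double[] baseline, int mask) {
        double[] z = new double[x.length];
        for (int j = 0; j < x.length; j++) {
            z[j] = ((mask & (1 << j)) != 0) ? x[j] : baseline[j];
        }
        return z;
    }

    public static void main(String[] args) {
        // Toy model with one additive term and one interaction term.
        ToDoubleFunction<double[]> model = v -> 2.0 * v[0] + v[1] * v[2];
        double[] phi = shapleyValues(model, new double[] {1.0, 2.0, 3.0}, new double[] {0.0, 0.0, 0.0});
        System.out.println(Arrays.toString(phi));
    }
}
```

For the toy model in `main`, the additive term is attributed entirely to the first feature and the interaction term is split evenly between the other two, so the values come out at approximately 2, 3 and 3. They sum to the difference between the prediction for `x` and the prediction for the baseline.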

Ideally, the predicted value of the target field could implement some marker interface(s), which would trigger the computation of Shapley values in situ. The Pythonic approach, where every prediction aspect (e.g. predict, predict_proba, shap) involves running the whole prediction again from scratch, seems kind of wasteful.
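The marker-interface idea could look roughly like the sketch below. It is similar in spirit to how JPMML-Evaluator result objects expose extra detail through interfaces such as `HasProbability`, but every name here (`HasFeatureContributions`, `ExplainedPrediction`, `LinearScorer`) is hypothetical and not part of the library. To stay self-contained, the sketch scores a plain additive model, where each term `coefficient * input` coincides with that feature's Shapley value against an all-zero baseline. A tree ensemble would need a proper algorithm in place of that shortcut.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical marker interface; NOT part of the JPMML-Evaluator API. */
interface HasFeatureContributions {
    Map<String, Double> getFeatureContributions();
}

/** Hypothetical target value that carries the contributions recorded during scoring. */
final class ExplainedPrediction implements HasFeatureContributions {
    private final double value;
    private final Map<String, Double> contributions;

    ExplainedPrediction(double value, Map<String, Double> contributions) {
        this.value = value;
        this.contributions = Collections.unmodifiableMap(new LinkedHashMap<>(contributions));
    }

    double getValue() {
        return value;
    }

    @Override
    public Map<String, Double> getFeatureContributions() {
        return contributions;
    }
}

/** Hypothetical scorer: one pass yields both the prediction and its breakdown. */
final class LinearScorer {
    static ExplainedPrediction score(double intercept,
                                     Map<String, Double> coefficients,
                                     Map<String, Double> inputs) {
        Map<String, Double> contributions = new LinkedHashMap<>();
        double value = intercept;
        for (Map.Entry<String, Double> e : coefficients.entrySet()) {
            // Missing inputs are treated as zero (an assumption of this sketch).
            double term = e.getValue() * inputs.getOrDefault(e.getKey(), 0.0);
            contributions.put(e.getKey(), term);
            value += term;
        }
        return new ExplainedPrediction(value, contributions);
    }

    public static void main(String[] args) {
        Map<String, Double> coefficients = new LinkedHashMap<>();
        coefficients.put("a", 2.0);
        coefficients.put("b", -1.0);
        Map<String, Double> inputs = new LinkedHashMap<>();
        inputs.put("a", 3.0);
        inputs.put("b", 4.0);

        Object target = score(0.5, coefficients, inputs);
        // Callers probe for the capability instead of re-running the prediction.
        if (target instanceof HasFeatureContributions) {
            System.out.println(((HasFeatureContributions) target).getFeatureContributions());
        }
    }
}
```

The point of the design is that the caller evaluates the model once and then checks with `instanceof` whether the result can explain itself, rather than invoking a separate explanation routine that repeats the scoring work.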