shap / shap

A game theoretic approach to explain the output of any machine learning model.
https://shap.readthedocs.io
MIT License

ENH: Limiting number of CPU cores used by shap #3548

Closed: hungngocphat01 closed this issue 3 months ago

hungngocphat01 commented 4 months ago

Problem Description

Hi,

I'm using shap to generate feature importances for XGBoost models using TreeExplainer. The workload runs on a shared production server with 144 cores, and every time it runs, shap hogs all of the cores by default.

Since the resources are shared with other users, it would be great if there were an option to limit the number of CPU cores used by shap. I could not find any information on this in the shap documentation. It seems that shap uses numba as the backend, but setting NUMBA_NUM_THREADS does not help either.

Alternative Solutions

Please provide a guide in the documentation on how to limit the number of cores if this is already possible, or add a new option to do so.

CloseChoice commented 3 months ago

XGBoost implements the calculation of SHAP values itself, and we basically just call that functionality. Hence it should be possible to limit the number of threads (not cores, though) with XGBoost's global config.

import xgboost as xgb

# Show all messages, including ones pertaining to debugging
xgb.set_config(nthread=10, verbosity=2)

# call shap ...

It would be great if you could report back on whether this helped. This is a bit of a tricky situation for us, since we support a number of backends and not all of them conform to the same interface, so a parameter that limits the thread count might not be available for every library. Maybe we need to do some digging for catboost, lightgbm and sklearn (sklearn definitely has such an option).

hungngocphat01 commented 3 months ago

The problem really lies in xgboost's implementation as you mentioned. Setting the OMP_NUM_THREADS environment variable solved my problem. Thank you very much.
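
For readers who cannot control the shell environment, the same variable can be set from inside the Python script. The following is only a sketch: `cap_openmp_threads` is a hypothetical helper name, and it assumes the variable is set before xgboost (or any other OpenMP-based library) is imported, since OpenMP runtimes typically read it once when they initialize.

```python
import os


def cap_openmp_threads(n_threads: int) -> None:
    """Set OMP_NUM_THREADS for the current process (hypothetical helper)."""
    if n_threads < 1:
        raise ValueError("n_threads must be a positive integer")
    os.environ["OMP_NUM_THREADS"] = str(n_threads)


# Apply the cap first; OpenMP runtimes typically read the variable only once,
# when the native library is initialized.
cap_openmp_threads(24)

# Import the heavy libraries only after the cap is in place, for example:
# import xgboost as xgb
# import shap
```

Setting the variable after xgboost has already been imported may have no effect, so the call belongs at the very top of the script.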

By the way, the nthread xgboost config cannot be set with xgb.set_config.

CloseChoice commented 3 months ago

Would you mind sharing the code you used to set the config? It would be great for future reference.

hungngocphat01 commented 3 months ago

I simply ran export OMP_NUM_THREADS=24 in the shell before executing the Python script that calculates the shap scores.
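
If the limit should apply to a single job rather than the whole shell session, the variable can be passed to one child process only. This is a minimal sketch: `run_with_thread_cap` is a hypothetical helper, and the child command here is a stand-in that just prints the variable; in practice it would be the script that computes the shap scores.

```python
import os
import subprocess
import sys


def run_with_thread_cap(cmd, n_threads):
    """Run cmd with OMP_NUM_THREADS set only for that child process."""
    # Copy the parent environment and override a single variable.
    env = dict(os.environ, OMP_NUM_THREADS=str(n_threads))
    return subprocess.run(cmd, env=env, check=True,
                          capture_output=True, text=True)


# Stand-in for the real job: a child interpreter that reports what it sees.
child = [sys.executable, "-c",
         "import os; print(os.environ['OMP_NUM_THREADS'])"]
result = run_with_thread_cap(child, 24)
print(result.stdout.strip())  # prints "24"
```

The parent process environment is left untouched, so other jobs started from the same session keep their own settings.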