rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] Unified Interface for Model Interpretability in cuML #5710

Open yihong1120 opened 8 months ago

yihong1120 commented 8 months ago

Is your feature request related to a problem? Please describe.
While cuML provides a powerful suite of GPU-accelerated machine learning algorithms, there is a growing need for comprehensive model interpretability within the same GPU-optimised environment. Currently, practitioners who require interpretability must often revert to CPU-based tools or separate packages, which can disrupt the workflow and reduce the efficiency gains achieved by cuML.

Describe the solution you'd like
I propose the development of a unified interface within cuML that offers model interpretability features, such as feature importance, partial dependence plots, and SHAP value computation. This interface should be compatible with existing cuML models and maintain the library's GPU acceleration advantage. Ideally, it would mirror the simplicity of scikit-learn's interpretability tools, allowing users to seamlessly transition between model training and interpretation without leaving the GPU ecosystem.
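To make the proposal concrete, here is a hypothetical sketch of what such a unified interface could look like. The `PermutationImportanceExplainer` class and its method names are invented for illustration, and the backend shown is a NumPy-only permutation-importance computation; a real cuML implementation would operate on device arrays (CuPy/cuDF) instead.

```python
import numpy as np

class PermutationImportanceExplainer:
    """Hypothetical sketch of a unified interpretability interface.

    The class name and API are invented for illustration only; a real
    cuML version would accept device arrays and run on the GPU.
    """

    def __init__(self, model_predict, scorer):
        self.model_predict = model_predict  # callable: X -> predictions
        self.scorer = scorer                # callable: (y_true, y_pred) -> score

    def feature_importance(self, X, y, n_repeats=5, random_state=0):
        """Mean drop in score when each feature is permuted."""
        rng = np.random.default_rng(random_state)
        baseline = self.scorer(y, self.model_predict(X))
        importances = np.zeros(X.shape[1])
        for j in range(X.shape[1]):
            drops = []
            for _ in range(n_repeats):
                Xp = X.copy()
                # Shuffle one column to break its link with the target.
                Xp[:, j] = rng.permutation(Xp[:, j])
                drops.append(baseline - self.scorer(y, self.model_predict(Xp)))
            importances[j] = np.mean(drops)
        return importances


# Usage: a toy linear "model" where only feature 0 matters.
X = np.random.default_rng(42).normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.1  # target depends only on the first feature
predict = lambda A: 3.0 * A[:, 0] + 0.1
r2 = lambda yt, yp: 1.0 - np.sum((yt - yp) ** 2) / np.sum((yt - yt.mean()) ** 2)

imp = PermutationImportanceExplainer(predict, r2).feature_importance(X, y)
# Permuting feature 0 degrades the score; features 1 and 2 are irrelevant,
# so feature 0 should dominate the importance ranking.
```

The same object could plausibly grow `partial_dependence` and `shap_values` methods behind one consistent entry point, which is the essence of the request.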

Describe alternatives you've considered
An alternative would be to continue using separate CPU-based tools for model interpretability, but this is suboptimal as it negates the performance benefits of cuML. Another option is to integrate third-party GPU-accelerated interpretability libraries, but this can lead to compatibility issues and a fragmented user experience.

Additional context
The addition of interpretability features would solidify cuML's position as a comprehensive machine learning library and greatly benefit users who need to explain their models' predictions, especially in domains where understanding model decisions is critical.

dantegd commented 8 months ago

Thanks for the issue and proposal @yihong1120. This is something I've been thinking about for quite some time, and I think there is a lot of potential in general for GPU involvement here. cuML already supports SHAP values, but extending that to a more comprehensive interpretability offering is something we would like to look into.