h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0
6.94k stars 2k forks

Add EBM #15728

Open wendycwong opened 1 year ago

wendycwong commented 1 year ago

Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like A clear and concise description of what you want to happen.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered, if applicable.

Additional context Add any other context or screenshots about the feature request here. If there's a reference (paper, book, etc) for this feature, please add that here.

H2O.ai Devs only If there is a support ticket associated with this issue, please post the link here.

IzuiT commented 1 year ago

EBM - Explainable Boosting Machine.
Overview: https://interpret.ml/docs/ebm.html
Paper: https://www.cs.cornell.edu/~yinlou/papers/lou-kdd12.pdf
Video overview: https://www.youtube.com/watch?v=MREiHgHgl0k
Implemented as a package: https://github.com/interpretml/interpret

First of all, why re-implement it if an open-source package is already available? Because right now EBM is part of a toolset package, and I'm not sure how scalable it is. H2O-3 seems like a great place for it.

2nd, H2O-3 has a Web UI, which could be used to edit the model directly in the browser, as shown in this package (https://github.com/interpretml/gam-changer), giving the user a smooth experience.

3rd, the resulting model is extremely lightweight and easy to implement in code. It can be written in pure Java/C/whatever, so it can be applied on edge devices.

4th, thanks to the interpretability, it is easier to spot treatment effects or any other data inconsistency using expert knowledge and correct it (see point 2). That could be a game changer for healthcare.
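To illustrate the "lightweight" point (3rd above): a trained EBM is an additive model, so scoring boils down to an intercept plus one table lookup per feature. Below is a minimal sketch of that idea in plain Python. The `MODEL` dict, its feature names, bin edges, and scores are all made up for illustration; this is not the interpret package's API or file format, and pairwise interaction terms are left out for brevity.

```python
from bisect import bisect_right

# Hypothetical, hand-written model (NOT produced by any real library).
# Each term is a lookup table: "cuts" are bin edges, and "scores" holds
# one additive contribution per bin, so len(scores) == len(cuts) + 1.
MODEL = {
    "intercept": -1.0,
    "terms": {
        "age": {"cuts": [30.0, 50.0], "scores": [-0.5, 0.0, 0.75]},
        "bmi": {"cuts": [25.0], "scores": [-0.25, 0.5]},
    },
}


def term_contributions(model, row):
    """Return each feature's additive contribution for a single row."""
    contributions = {}
    for name, term in model["terms"].items():
        # Find the bin that the feature value falls into.
        bin_index = bisect_right(term["cuts"], row[name])
        contributions[name] = term["scores"][bin_index]
    return contributions


def predict_logit(model, row):
    """Score one row: intercept plus the sum of per-feature lookups."""
    return model["intercept"] + sum(term_contributions(model, row).values())


row = {"age": 55.0, "bmi": 22.0}
print(term_contributions(MODEL, row))  # {'age': 0.75, 'bmi': -0.25}
print(predict_logit(MODEL, row))       # -0.5
```

Because the model is only tables, the per-feature contributions double as the explanation, and the browser editing from the 2nd point would amount to overwriting entries in a `scores` list.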

paulbkoch commented 1 year ago

Hi @IzuiT and @wendycwong -- Not sure if this resolves all of your concerns, but we do publish a zero-dependency package called interpret-core with the intent of exposing EBMs without the rest of the interpret toolset. More info at: https://interpret.ml/docs/deployment-guide.html

https://pypi.org/project/interpret-core/

wendycwong commented 1 year ago

Ambitious for 3.46.0.1. Will probably be delivered in 3.48.0.1.

wendycwong commented 1 year ago

Please break this into multiple issues if needed. Nobody wants to do a code review of 50 files! LOL.