h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

AutoML: Access Expanded DataFrame #8861

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Attachment: Dummy Example - Shap problem.py

AutoML uses default categorical encodings, which vary by model. For some models, categorical features are expanded into multiple columns (one-hot encoding). However, these new columns are created on the fly and cannot be accessed by the user.

This becomes a problem when working with SHAP values. If you try to match feature values to SHAP values, they will not align if any columns have been encoded (see attached example), because the TreeSHAP function outputs its values for the expanded frame instead.
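The mismatch, and one possible workaround, can be illustrated without a running cluster. The sketch below is plain Python, not H2O code. The column names (`color.red`, `color.missing(NA)`, `BiasTerm`, and the `<feature>.<level>` naming in general) are assumptions made for illustration, and `collapse_contributions` is a hypothetical helper. It sums the contributions of expanded columns back onto their source feature, which relies on SHAP values being additive.

```python
from collections import defaultdict


def collapse_contributions(contribs, categorical_features, bias_name="BiasTerm"):
    """Sum per-level contributions back onto their original feature.

    contribs: dict mapping contribution column name -> SHAP value for one row.
    categorical_features: names of original columns that were one-hot encoded.
    Assumes expanded columns are named "<feature>.<level>" (an assumption,
    not a documented guarantee).
    """
    collapsed = defaultdict(float)
    for column, value in contribs.items():
        if column == bias_name:
            collapsed[column] += value
            continue
        # Pick the longest categorical feature name that prefixes this column;
        # columns with no match (e.g. numeric ones) keep their own name.
        owner = max(
            (f for f in categorical_features if column.startswith(f + ".")),
            key=len,
            default=column,
        )
        collapsed[owner] += value
    return dict(collapsed)


# One row of the original frame: two features.
row = {"color": "red", "age": 42}

# Hypothetical contributions for that row: four feature columns plus a bias term.
row_contribs = {
    "color.red": 0.5,
    "color.blue": -0.25,
    "color.missing(NA)": 0.0,
    "age": 0.125,
    "BiasTerm": 1.0,
}

per_feature = collapse_contributions(row_contribs, categorical_features=["color"])
print(len(row), "features vs", len(row_contribs) - 1, "contribution columns")
print(per_feature)
```

After collapsing, there is one value per original feature, so it can be paired with the feature value in the original row. This only helps when per-level detail is not needed; it does not give access to the expanded frame itself.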

Worked with @michalkurka on Gitter. He understands the issue.

exalate-issue-sync[bot] commented 1 year ago

Erin LeDell commented: There is an older ticket (linked above) to allow access to the one-hot encoded version of a dataset (or rather, to generate it, since we don't store it). This is not an AutoML-specific issue; it is relevant to any H2O model that uses one-hot encoding. This method could probably just be an H2OFrame utility function: given a frame, create a one-hot encoded version (the same as what will be created on the fly inside an H2O algo).
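As a rough picture of what such a utility might return, here is a pure-Python sketch. It is not an H2O API: `one_hot_expand` is a hypothetical name, and the `<feature>.<level>` column naming, the sorted level order, and the extra missing-value column are all assumptions. A real implementation would have to reproduce exactly what the algorithm does internally.

```python
def one_hot_expand(rows, categorical_columns, missing_label="missing(NA)"):
    """Return (column_names, expanded_rows) with categoricals one-hot encoded.

    rows: non-empty list of dicts sharing the same keys; None marks a missing value.
    categorical_columns: keys to expand into one indicator column per level.
    """
    # Collect sorted levels per categorical column so the layout is deterministic.
    levels = {
        col: sorted({r[col] for r in rows if r[col] is not None})
        for col in categorical_columns
    }

    # Build the expanded header, preserving the original column order.
    names = []
    for col in rows[0]:
        if col in levels:
            names += [f"{col}.{lvl}" for lvl in levels[col]]
            names.append(f"{col}.{missing_label}")
        else:
            names.append(col)

    # Expand each row into indicator columns; other columns pass through as-is.
    expanded = []
    for r in rows:
        out = {}
        for col, value in r.items():
            if col in levels:
                for lvl in levels[col]:
                    out[f"{col}.{lvl}"] = 1 if value == lvl else 0
                out[f"{col}.{missing_label}"] = 1 if value is None else 0
            else:
                out[col] = value
        expanded.append(out)
    return names, expanded


rows = [
    {"color": "red", "age": 42},
    {"color": "blue", "age": 31},
    {"color": None, "age": 27},
]
names, expanded = one_hot_expand(rows, ["color"])
print(names)
for r in expanded:
    print(r)
```

With a frame like this in hand, each expanded column could be paired with the SHAP column of the same name, provided the utility and the model agree on naming and level order.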

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-6772
Assignee: UNASSIGNED
Reporter: Zak Raicik
State: Open
Fix Version: N/A
Attachments: Available (Count: 1)
Development PRs: N/A

Attachments From Jira

Attachment Name: Dummy Example - Shap problem.py
Attached By: Zak Raicik
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-6772/Dummy Example - Shap problem.py