Open exalate-issue-sync[bot] opened 1 year ago
Erin LeDell commented: There is an older ticket (linked above) to allow access to the one-hot encoded version of a dataset (or rather, to generate it, since we don't store it). This is not an AutoML-specific issue; it is relevant to any H2O model that uses one-hot encoding. This method could probably just be an H2OFrame utility function: given a frame, create a one-hot encoded version (the same as what is created on the fly inside an H2O algo).
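To make the proposed utility concrete, here is a minimal pure-Python sketch of "given a frame, create a one-hot encoded version". It is not H2O code: the function name `one_hot_encode`, the dict-of-lists frame, and the `column.level` / `column.missing(NA)` naming are all assumptions made for illustration, and the columns H2O generates internally may be named or ordered differently.

```python
from typing import Dict, List


def one_hot_encode(frame: Dict[str, list], categorical: List[str]) -> Dict[str, list]:
    """Return a new frame with each categorical column expanded into 0/1 columns.

    Hypothetical helper: `frame` maps column names to lists of values.
    """
    encoded: Dict[str, list] = {}
    for name, values in frame.items():
        if name not in categorical:
            # Non-categorical columns are copied through unchanged.
            encoded[name] = list(values)
            continue
        # One indicator column per observed level, in sorted order.
        levels = sorted({v for v in values if v is not None})
        for level in levels:
            encoded[f"{name}.{level}"] = [1 if v == level else 0 for v in values]
        # Extra indicator for missing values (naming is an assumption).
        encoded[f"{name}.missing(NA)"] = [1 if v is None else 0 for v in values]
    return encoded


frame = {"color": ["red", "blue", None, "red"], "size": [1.0, 2.5, 3.0, 0.5]}
encoded = one_hot_encode(frame, categorical=["color"])
print(list(encoded))
# ['color.blue', 'color.red', 'color.missing(NA)', 'size']
```

The point of exposing something like this is that the user could see exactly which expanded columns a model was trained on, instead of having to reconstruct them by hand.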
JIRA Issue Migration Info
Jira Issue: PUBDEV-6772
Assignee: UNASSIGNED
Reporter: Zak Raicik
State: Open
Fix Version: N/A
Attachments: Available (Count: 1)
Development PRs: N/A
Attachments From Jira
Attachment Name: Dummy Example - Shap problem.py
Attached By: Zak Raicik
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-6772/Dummy Example - Shap problem.py
AutoML uses default categorical encodings, which vary by model. For some models, categorical features are expanded into multiple columns (one-hot encoding). However, these new columns are created on the fly and cannot be accessed by the user.
This becomes a problem when working with SHAP values. If you try to match feature values to SHAP values, they will not align if any columns have been one-hot encoded (see attached example), because the TreeSHAP function outputs contributions for the expanded frame rather than for the original columns.
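The mismatch can be illustrated without H2O. The sketch below assumes the contribution columns are named `column.level` (plus a `BiasTerm` column) and shows one possible way to realign them with the original features by summing the per-level contributions. This is not something the attached example does; the helper `collapse_contributions`, the column names, and the numbers are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def collapse_contributions(
    contributions: Dict[str, float], original_columns: List[str]
) -> Dict[str, float]:
    """Sum per-level contributions back onto the original column names.

    Hypothetical helper: assumes expanded columns are named "<column>.<level>".
    Columns that match no original column (e.g. a bias term) are kept as-is.
    """
    collapsed: Dict[str, float] = defaultdict(float)
    # Check longer names first so "color_2" is not mistaken for "color".
    candidates = sorted(original_columns, key=len, reverse=True)
    for name, value in contributions.items():
        owner = name
        for col in candidates:
            if name == col or name.startswith(col + "."):
                owner = col
                break
        collapsed[owner] += value
    return dict(collapsed)


# One row of original features has 2 columns...
features = {"color": "red", "size": 1.0}
# ...but the contributions for that row have one column per expanded level.
shap_row = {
    "color.blue": 0.0,
    "color.red": 0.25,
    "color.missing(NA)": 0.0,
    "size": -0.5,
    "BiasTerm": 1.0,
}
print(collapse_contributions(shap_row, list(features)))
# {'color': 0.25, 'size': -0.5, 'BiasTerm': 1.0}
```

Summing relies on SHAP contributions being additive. It recovers one value per original feature, but it does not replace having access to the expanded frame itself.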
Worked with @michalkurka on Gitter. He understands the issue.