h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

H2O XGBoost: when using Enum encoding, variable importance still reports levels #8885

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

{code:java}

data = h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv")
data$survived = as.factor(data$survived)

model = h2o.xgboost(x = 4:7, y = 2, training_frame = data, model_id = "model", ntrees = 50, categorical_encoding = "Enum")
model@model$variable_importances
Variable Importances:
    variable relative_importance scaled_importance percentage
1 sex.female          757.713684          1.000000   0.548491
2        age          391.087708          0.516142   0.283099
3      sibsp          145.486938          0.192008   0.105315
4      parch           87.162399          0.115033   0.063095

model = h2o.gbm(x = 4:7, y = 2, training_frame = data, model_id = "model", ntrees = 50, categorical_encoding = "Enum")
  |======================================================================================================================| 100%
model@model$variable_importances
Variable Importances:
  variable relative_importance scaled_importance percentage
1      sex          456.504883          1.000000   0.648417
2      age          137.592926          0.301405   0.195436
3    sibsp           77.152267          0.169006   0.109587
4    parch           32.779179          0.071805   0.046559

{code}
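Until the two algorithms report importances the same way, one possible workaround is to roll the level-wise rows (such as `sex.female`) back up to their source column on the client side. The sketch below is only an illustration: `aggregate_varimp` is a hypothetical helper, not part of the h2o package. It assumes that level rows are named `<column>.<level>`, as in the XGBoost output above. It also assumes that summing `relative_importance` across levels is an acceptable convention, which the issue does not confirm and which need not match what GBM computes.

```r
# Hypothetical helper (not an h2o function): collapse level-wise variable
# importances such as "sex.female" onto their source column "sex".
aggregate_varimp <- function(varimp, predictors) {
  reported <- as.character(varimp$variable)

  # Map each reported name to a predictor: either an exact match (numeric
  # column) or a "<predictor>.<level>" match (one-hot encoded level).
  owner <- vapply(reported, function(v) {
    hit <- predictors[v == predictors | startsWith(v, paste0(predictors, "."))]
    if (length(hit) == 0) return(v)   # unknown name: keep it unchanged
    hit[which.max(nchar(hit))]        # prefer the longest matching predictor
  }, character(1), USE.NAMES = FALSE)

  # Assumption: summing importances over levels is a reasonable roll-up.
  agg <- aggregate(varimp$relative_importance,
                   by = list(variable = owner), FUN = sum)
  names(agg)[2] <- "relative_importance"
  agg <- agg[order(-agg$relative_importance), ]
  agg$scaled_importance <- agg$relative_importance / max(agg$relative_importance)
  agg$percentage <- agg$relative_importance / sum(agg$relative_importance)
  rownames(agg) <- NULL
  agg
}

# Values copied from the XGBoost output in this issue.
vi <- data.frame(
  variable = c("sex.female", "age", "sibsp", "parch"),
  relative_importance = c(757.713684, 391.087708, 145.486938, 87.162399),
  stringsAsFactors = FALSE
)
rolled <- aggregate_varimp(vi, predictors = c("sex", "age", "sibsp", "parch"))
print(rolled)
```

With a live model, the input would presumably come from `as.data.frame(h2o.varimp(model))`, with `predictors` being the column names passed as `x`. Matching against the known predictor names, rather than splitting on the first `.`, avoids mis-grouping columns whose own names contain dots.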

exalate-issue-sync[bot] commented 1 year ago

Lauren DiPerna commented: [~accountid:557058:04659f86-fbfe-4d01-90c9-146c34df6ee6] If we decide to allow categorical levels like {{sex.female}}, can we also make sure that {{.partial_plot()}} works on these one-hot encoded columns? For example, in this code snippet I can see the PDP of {{sex}} but not of {{sex.female}}. Thanks!

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-6748
Assignee: Michal Kurka
Reporter: Nidhi Mehta
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A