h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0
6.88k stars 1.99k forks

Expose one-hot encoding to H2OFrame operations #10848

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Requesting a method on an H2OFrame with one categorical column that outputs a multi-column frame: one column for each unique category, holding a 0/1 value that indicates whether the row belongs to that category.
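
To make the request concrete, here is a minimal sketch of the expected behavior in plain Python. It does not use the H2O API; the helper name `one_hot_encode`, the `prefix.level` column naming, and the all-zeros treatment of missing values are assumptions made for illustration only.

```python
from typing import Dict, List, Optional


def one_hot_encode(values: List[Optional[str]], prefix: str) -> Dict[str, List[int]]:
    """Expand one categorical column into one 0/1 column per unique category.

    Hypothetical helper for illustration; not part of H2O.
    """
    # Sort the levels so the output column order is deterministic.
    levels = sorted({v for v in values if v is not None})
    # Assumption: a missing value (None) gets 0 in every output column.
    return {
        f"{prefix}.{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


colors = ["red", "green", "red", "blue"]
encoded = one_hot_encode(colors, prefix="color")
for name, column in encoded.items():
    print(name, column)
```

Each input row ends up with exactly one 1 across the output columns, unless the value is missing.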

exalate-issue-sync[bot] commented 1 year ago

Vlad Patryshev commented: It's in branch vlad_PUBDEV_3955, waiting for Michal's approval.

exalate-issue-sync[bot] commented 1 year ago

Michal Malohlava commented: It will be closed when the change hits master.

exalate-issue-sync[bot] commented 1 year ago

Erin LeDell commented: Let’s revisit this. Lots of interest on Stack Overflow, especially by people using interpretability methods.

exalate-issue-sync[bot] commented 1 year ago

Erin LeDell commented: [~accountid:557058:04659f86-fbfe-4d01-90c9-146c34df6ee6] just pointed out that R has a (private, non-exported) function, `.getExpanded`, which already does this, FYI: https://github.com/h2oai/h2o-3/blob/master/h2o-r/h2o-package/R/frame.R#L312

exalate-issue-sync[bot] commented 1 year ago

Michal Kurka commented: [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] this function is GLM-specific. It might produce a different column order than the one XGBoost sees for the frame, which might matter in some cases.
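
The column-order concern can be illustrated with a small plain-Python sketch. The two orderings below (alphabetical and by descending frequency) are hypothetical stand-ins, not the orderings GLM or XGBoost actually use in H2O. The point is only that the same data yields different column positions under different orderings, so anything that refers to features by index has to be realigned by name.

```python
from collections import Counter
from typing import List


def alphabetical_order(values: List[str]) -> List[str]:
    # Hypothetical ordering 1: levels sorted alphabetically.
    return sorted(set(values))


def frequency_order(values: List[str]) -> List[str]:
    # Hypothetical ordering 2: most frequent level first, ties broken alphabetically.
    counts = Counter(values)
    return sorted(counts, key=lambda level: (-counts[level], level))


def encode_rows(values: List[str], levels: List[str]) -> List[List[int]]:
    # One 0/1 entry per level, in the given level order.
    return [[1 if v == level else 0 for level in levels] for v in values]


values = ["b", "a", "b", "c", "b", "a"]
order_a = alphabetical_order(values)
order_b = frequency_order(values)
rows_a = encode_rows(values, order_a)
rows_b = encode_rows(values, order_b)

# Realign the second encoding to the first by level name, not by position.
index = [order_b.index(level) for level in order_a]
realigned = [[row[i] for i in index] for row in rows_b]

print(order_a, rows_a[0])
print(order_b, rows_b[0])
print(realigned == rows_a)
```

Without the realignment step, column 0 means level "a" in one encoding and level "b" in the other, which is the kind of mismatch that would matter for interpretability methods that report per-column contributions.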

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-3955
Assignee: Vlad Patryshev
Reporter: Mark Chan
State: Reopened
Fix Version: N/A
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/898
https://github.com/h2oai/h2o-3/pull/957