h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Add an H2OFrame.astype() method to Python #11560

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Pandas supports a generic [astype()|http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.astype.html] method, and H2OFrame should as well. Although we already have the [asnumeric()|http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/_modules/h2o/frame.html#H2OFrame.asnumeric] and [asfactor()|http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/_modules/h2o/frame.html#H2OFrame.asfactor] methods, it would be better to have a generic one that works for all types. If applied to a whole frame, it should attempt to convert all the columns (which is consistent with what asnumeric() and asfactor() do).

{code}
import h2o
h2o.init()

df = h2o.H2OFrame([[1,2],[3,4]])
df[['C1']].astype("enum")
{code}

The list of allowable types should include all H2O frame/column types as well as aliases for the names Pandas uses (e.g. "object" and "category" could both map to "enum" in H2O -- or maybe "object" maps to "string" and "category" maps to "enum").
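To make the alias idea concrete, here is a rough sketch of how the lookup and dispatch could work. It is not an existing H2O API: `TYPE_ALIASES`, `CONVERTERS`, `resolve_type` and the free function `astype` are hypothetical names, the alias table is only one possible choice (it picks the "object" maps to "string" option), and the only real H2OFrame methods it assumes are asfactor(), asnumeric() and ascharacter().

```python
# Hypothetical alias table: Pandas-style names on the left, H2O column types on the right.
# "object" -> "string" is just one of the two options discussed above.
TYPE_ALIASES = {
    "category": "enum",
    "object": "string",
    "str": "string",
    "float": "real",
    "float64": "real",
    "int64": "int",
}

# H2O type name -> name of an existing H2OFrame conversion method.
# asnumeric() takes no target type, so this sketch treats "int" and "real" alike.
CONVERTERS = {
    "enum": "asfactor",
    "string": "ascharacter",
    "int": "asnumeric",
    "real": "asnumeric",
}


def resolve_type(name):
    """Translate an H2O type name or a Pandas alias into an H2O type name."""
    key = str(name).lower()
    target = TYPE_ALIASES.get(key, key)
    if target not in CONVERTERS:
        raise ValueError("unsupported type for astype(): %r" % (name,))
    return target


def astype(frame, dtype):
    """Convert `frame` (a whole frame or a column selection) to `dtype`.

    `frame` only needs to provide the conversion methods named in CONVERTERS,
    so the function can be exercised without a running H2O cluster.
    """
    method_name = CONVERTERS[resolve_type(dtype)]
    return getattr(frame, method_name)()
```

Under these assumptions, `astype(df[['C1']], "category")` and `astype(df[['C1']], "enum")` would both end up calling `df[['C1']].asfactor()`. Types without an obvious existing converter (e.g. "time") are deliberately left out and raise a ValueError.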

hasithjp commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-4681
Assignee: Navdeep Gill
Reporter: Erin LeDell
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A