h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Support for Polars DataFrames in AutoML #15825

Open bkowshik opened 1 year ago

bkowshik commented 1 year ago

Thank you for the H2O library. It has definitely inspired me, and I am more productive and effective with it. 🙇‍♂️


# NOTE: sample_df is a Polars DataFrame.
aml = H2OAutoML(max_models=3, seed=77);
aml.train(x=features, y=target, training_frame=h2o.H2OFrame(sample_df));
---------------------------------------------------------------------------
H2OTypeError                              Traceback (most recent call last)
Cell In[74], line 3
      1 aml = H2OAutoML(max_models=3, seed=77);
      2 # Finally converting to a Pandas DataFrame since H2O does not support Polars DataFrame.
----> 3 aml.train(x=features, y=target, training_frame=h2o.H2OFrame(sample_df));

File /opt/conda/lib/python3.10/site-packages/h2o/frame.py:97, in H2OFrame.__init__(self, python_obj, destination_frame, header, separator, column_names, column_types, na_strings, skipped_columns)
     92 def __init__(self, python_obj=None, destination_frame=None, header=0, separator=",",
     93              column_names=None, column_types=None, na_strings=None, skipped_columns=None):
     95     coltype = U(None, "unknown", "uuid", "string", "float", "real", "double", "int", "long", "numeric",
     96                 "categorical", "factor", "enum", "time")
---> 97     assert_is_type(python_obj, None, list, tuple, dict, numpy_ndarray, pandas_dataframe, scipy_sparse)
     98     assert_is_type(destination_frame, None, str)
     99     assert_is_type(header, -1, 0, 1)

File /opt/conda/lib/python3.10/site-packages/h2o/utils/typechecks.py:444, in assert_is_type(var, *types, **kwargs)
    442 etn = _get_type_name(expected_type, dump=", ".join(args[1:]))
    443 vtn = _get_type_name(type(var))
--> 444 raise H2OTypeError(var_name=vname, var_value=var, var_type_name=vtn, exp_type_name=etn, message=message,
    445                    skip_frames=skip_frames)

H2OTypeError: Argument `python_obj` should be a None | list | tuple | dict | numpy.ndarray | pandas.DataFrame | scipy.sparse.issparse, got DataFrame shape: (30_411, 6)

As a workaround for now, I convert the Polars DataFrame to a pandas DataFrame before building the H2OFrame:

aml = H2OAutoML(max_models=3, seed=77);
aml.train(x=features, y=target, training_frame=h2o.H2OFrame(sample_df.to_pandas()));

wendycwong commented 1 year ago

A workaround exists: convert the Polars frame to a pandas frame before passing it to H2O.
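
For anyone who hits this repeatedly, the conversion can be wrapped in a small helper. This is only a sketch: `to_h2o_compatible` is a hypothetical name, not part of H2O or Polars, and it assumes the Polars object exposes `to_pandas()` (which, as far as I know, needs `pandas` and usually `pyarrow` installed). It detects Polars by the type's module name so the helper itself imports neither library.

```python
from typing import Any


def to_h2o_compatible(obj: Any) -> Any:
    """Return `obj` unchanged unless it looks like a Polars DataFrame.

    Hypothetical helper, not an H2O API. A Polars DataFrame is detected by
    the top-level module of its type, then converted with `to_pandas()`.
    Objects without `to_pandas()` (for example a LazyFrame) pass through
    unchanged and will still be rejected by H2OFrame's own type check.
    """
    module = type(obj).__module__ or ""
    if module.split(".")[0] == "polars" and hasattr(obj, "to_pandas"):
        return obj.to_pandas()
    return obj
```

With this in place, the call from the original report would become `h2o.H2OFrame(to_h2o_compatible(sample_df))`, and lists, dicts, NumPy arrays, and pandas DataFrames would be passed through untouched.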