H2OGridSearch.train() should default to all columns, like H2OEstimator.train() and H2OAutoML.train()

h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Apache License 2.0


Our default behavior for all estimators (as well as AutoML) is that if x is not specified, it defaults to all columns in training_frame other than y. For example:

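{code:python}
from h2o.estimators.random_forest import H2ORandomForestEstimator

rf = H2ORandomForestEstimator(model_id="rf", ntrees=200)
# x is omitted: every column of train_hex except y_col is used as a predictor
rf.train(y=y_col, training_frame=train_hex, validation_frame=valid_hex)
{code}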
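The defaulting rule described above can be modeled in a few lines of plain Python. This is a minimal sketch of the rule as stated in this issue (all columns other than y), not the h2o-py implementation: default_predictors is a hypothetical helper, and the training frame is simplified to a list of column names.

```python
from typing import List, Optional, Sequence


def default_predictors(columns: Sequence[str], y: str,
                       x: Optional[Sequence[str]] = None) -> List[str]:
    """Hypothetical helper: return x if given, else every column except y."""
    if y not in columns:
        raise ValueError(f"response column {y!r} is not in the training frame")
    if x is not None:
        # An explicit predictor list is used as-is.
        return list(x)
    # Default rule: all training columns other than the response.
    return [c for c in columns if c != y]


if __name__ == "__main__":
    cols = ["age", "income", "region", "label"]
    print(default_predictors(cols, "label"))         # ['age', 'income', 'region']
    print(default_predictors(cols, "label", ["age"]))  # ['age']
```

Keeping the rule in one place like this is what lets estimators, AutoML and (per this issue) grid search share the same default.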

However, H2OGridSearch.train() requires x to be passed explicitly. It should apply the same default, for consistency and usability.

{code:python}
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
from h2o.grid.grid_search import H2OGridSearch

# hyper_parameters and criteria are defined earlier (not shown)
gs1 = H2OGridSearch(
    H2OGeneralizedLinearEstimator(family='binomial', nfolds=2, fold_assignment="modulo",
                                  keep_cross_validation_predictions=True),
    hyper_parameters, search_criteria=criteria)

gs1.train(y=y_col, training_frame=train_hex, validation_frame=valid_hex)

----> 7 gs1.train(y=y_col, training_frame=train_hex, validation_frame=valid_hex)
      8 auc_glm = gs1.auc(valid=True)

TypeError: train() takes at least 2 arguments (4 given)
{code}
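The error arises because x has no default in H2OGridSearch.train(). The sketch below reproduces the failure and the proposed behavior using two hypothetical stand-in classes, RequiredXGrid and DefaultXGrid (neither is part of h2o-py), again modeling the training frame as a list of column names. The message in the traceback above uses Python 2 wording; under Python 3 the same call instead reports the missing argument x by name.

```python
from typing import List, Optional, Sequence


class RequiredXGrid:
    """Hypothetical stand-in for the current behavior: x has no default."""

    def train(self, x, y=None, training_frame=None, validation_frame=None):
        return list(x)


class DefaultXGrid:
    """Hypothetical stand-in for the proposed behavior: x is optional."""

    def train(self, x: Optional[Sequence[str]] = None, y: Optional[str] = None,
              training_frame: Optional[Sequence[str]] = None,
              validation_frame: Optional[Sequence[str]] = None) -> List[str]:
        if y is None or training_frame is None:
            raise ValueError("y and training_frame are required")
        if x is None:
            # Same default the issue describes for estimators: every column except y.
            x = [c for c in training_frame if c != y]
        return list(x)


if __name__ == "__main__":
    cols = ["age", "income", "label"]
    try:
        RequiredXGrid().train(y="label", training_frame=cols, validation_frame=cols)
    except TypeError as err:
        print("current:", err)  # x was never supplied
    print("proposed:", DefaultXGrid().train(y="label", training_frame=cols))
    # proposed: ['age', 'income']
```

Until the default is changed, one workaround is to build x explicitly, e.g. x = [c for c in train_hex.columns if c != y_col], and call gs1.train(x=x, y=y_col, training_frame=train_hex, validation_frame=valid_hex).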


H2OGridSearch.train() should default to all columns, like H2OEstimator.train() and H2OAutoML.train() #12144