H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Improvements suggested for XGBoost default models inside AutoML:
xgBoostParameters._max_depth should not be 20 for the second default model (XGB def2). The deepest we should go here is probably 13 or 15. For reference, def1 uses depth 10 and def3 uses depth 5.
xgBoostParameters._min_rows looks too small. It should roughly scale with the number of rows: for a small dataset (~20K rows) it might be 3-15; for a 150K-row dataset, 20-60; for a 500K-row dataset, probably 100-300.
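The scaling suggested above can be sketched as a small heuristic. This is not H2O code; the function name and the log-scale interpolation between the three anchor points are my own assumptions, using roughly the middle of each suggested range:

```python
import math

def suggested_min_rows(n_rows: int) -> int:
    """Hypothetical heuristic: pick a min_rows value that grows with
    dataset size, anchored to the ranges suggested above
    (~20K rows -> 3-15, ~150K -> 20-60, ~500K -> 100-300).
    We take roughly the middle of each range and interpolate
    geometrically (on a log scale) between anchor points."""
    anchors = [(20_000, 7), (150_000, 35), (500_000, 175)]
    if n_rows <= anchors[0][0]:
        return anchors[0][1]
    if n_rows >= anchors[-1][0]:
        return anchors[-1][1]
    for (r0, m0), (r1, m1) in zip(anchors, anchors[1:]):
        if r0 <= n_rows <= r1:
            # fractional position between the two anchors, log scale
            t = (math.log(n_rows) - math.log(r0)) / (math.log(r1) - math.log(r0))
            return round(m0 * (m1 / m0) ** t)
    return anchors[-1][1]
```

For example, `suggested_min_rows(20_000)` returns 7 and `suggested_min_rows(500_000)` returns 175; anything larger is clamped rather than extrapolated.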
xgBoostParameters._sample_rate should probably be 0.8-1.0 by default. I frequently end up with 1.0, so I might start there; it's seldom as small as the current 0.6.
xgBoostParameters._col_sample_rate and xgBoostParameters._col_sample_rate_per_tree are multiplicative, so if they are both 0.8, each split sees only a 0.64 fraction of the features. I vaguely expect the cumulative value should be larger (0.8?) for a small number of columns (20?) and smaller (0.3?) for a larger number of columns (100?).
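The compounding and the column-count heuristic above can be sketched as follows. Again, this is not H2O code; the function names and the linear interpolation between the two suggested anchor points (20 columns -> 0.8, 100 columns -> 0.3) are my own assumptions:

```python
def effective_col_fraction(col_sample_rate: float,
                           col_sample_rate_per_tree: float) -> float:
    """The per-split and per-tree column sample rates compound:
    each split samples col_sample_rate of the columns that were
    already sampled for the tree."""
    return col_sample_rate * col_sample_rate_per_tree

def suggested_cumulative_rate(n_cols: int) -> float:
    """Hypothetical heuristic from the note above: aim for a larger
    cumulative fraction (~0.8) with few columns (~20) and a smaller
    one (~0.3) with many columns (~100), interpolating linearly in
    between and clamping outside that range."""
    lo_cols, hi_cols = 20, 100
    hi_rate, lo_rate = 0.8, 0.3
    if n_cols <= lo_cols:
        return hi_rate
    if n_cols >= hi_cols:
        return lo_rate
    t = (n_cols - lo_cols) / (hi_cols - lo_cols)
    return hi_rate + t * (lo_rate - hi_rate)
```

With both rates at 0.8, `effective_col_fraction(0.8, 0.8)` gives the 0.64 fraction mentioned above, which is why the two parameters should be tuned jointly rather than independently.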