H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Users find it difficult to estimate the H2O cluster size they need for binary file formats like Parquet.
Our guidance is to allocate memory equal to 4-5x the "raw" size of the dataset. This confuses users of highly compressed formats like Parquet, because the size of the file on disk can be much smaller than the raw size of the data it holds.
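The arithmetic behind this guidance can be sketched in a few lines of Python. This is only an illustration, not part of H2O: the function names are hypothetical, and the compression ratio is an assumed input that the user has to supply or measure for their own file.

```python
GIB = 1024 ** 3


def raw_size_from_compressed(file_bytes, compression_ratio):
    """Approximate the raw dataset size from the on-disk size.

    compression_ratio is an assumption (raw size / file size); it varies
    with the data and the codec, so measure it rather than guess.
    """
    if file_bytes < 0 or compression_ratio <= 0:
        raise ValueError("sizes must be non-negative and the ratio positive")
    return file_bytes * compression_ratio


def estimate_h2o_memory(raw_bytes, low=4.0, high=5.0):
    """Apply the 4-5x guidance to a raw dataset size, returning (low, high) in bytes."""
    if raw_bytes < 0:
        raise ValueError("raw_bytes must be non-negative")
    return raw_bytes * low, raw_bytes * high


# Example: a 2 GiB Parquet file with an assumed 5x compression ratio.
raw = raw_size_from_compressed(2 * GIB, compression_ratio=5.0)
low, high = estimate_h2o_memory(raw)
print(f"{low / GIB:.0f}-{high / GIB:.0f} GiB")  # prints "40-50 GiB"
```

The example shows why the file size alone misleads: applying 4-5x to the 2 GiB file would suggest 8-10 GiB, while the same rule applied to the estimated raw size gives 40-50 GiB.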