H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
We would like a mechanism/tool that can estimate how large an H2O cluster should be for a given dataset. This could be a wizard that asks which algorithms the user intends to run (AutoML, GBM, ...) and requests the input dataset(s). Based on this input and the environment's restrictions (maximum memory per node/container), it would provide guidance on how to configure the cluster.
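A minimal sketch of what such an estimator could look like, assuming the commonly cited rule of thumb that an H2O cluster needs roughly 3-5x the dataset's size in memory (the exact multipliers below are illustrative assumptions, not official H2O guidance, and the `estimate_cluster` function is hypothetical):

```python
import math

# Hypothetical sizing heuristic, not an official H2O tool.
# Per-algorithm memory multipliers are illustrative assumptions:
# heavier workloads like AutoML get more headroom.
MULTIPLIERS = {"glm": 3.0, "gbm": 4.0, "automl": 5.0}

def estimate_cluster(dataset_gb: float,
                     algo: str = "gbm",
                     max_node_gb: float = 32.0) -> dict:
    """Return a rough node count and per-node memory suggestion.

    dataset_gb  -- uncompressed dataset size in GB
    algo        -- intended algorithm family (automl/gbm/glm/...)
    max_node_gb -- environment limit: max memory per node/container
    """
    needed_gb = dataset_gb * MULTIPLIERS.get(algo, 4.0)
    # Split the required memory across nodes, respecting the per-node cap.
    nodes = max(1, math.ceil(needed_gb / max_node_gb))
    per_node_gb = math.ceil(needed_gb / nodes)
    return {"total_memory_gb": needed_gb,
            "nodes": nodes,
            "memory_per_node_gb": per_node_gb}

# Example: 100 GB dataset, AutoML, containers capped at 64 GB each.
print(estimate_cluster(100.0, algo="automl", max_node_gb=64.0))
```

A real wizard would additionally inspect the dataset (column types, compression ratio after parse) rather than rely on raw file size alone.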