H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
As data set sizes grow, we need a format that supports sparsity and different column types (SVMLight only supports floats), and that allows parallel parsing.
The other option is plain CSV plus general support for Snappy compression (PUBDEV-63).
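
To illustrate the SVMLight limitation mentioned above, here is a minimal sketch of a parser for one SVMLight row. It is not H2O's parser: the function name `parse_svmlight_line` is hypothetical, and the sketch assumes the common `label index:value ...` layout without `qid` tokens. It shows that the format handles sparsity (only non-zero columns are listed) but coerces every value to a float, so a categorical or string column has no representation.

```python
def parse_svmlight_line(line):
    """Parse one SVMLight row into (label, {column_index: value}).

    Hypothetical helper for illustration; assumes no `qid:` tokens.
    """
    # Anything after '#' is a trailing comment in SVMLight files.
    line = line.split("#", 1)[0].strip()
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        # Every value is coerced to float; other column types cannot be expressed.
        features[int(index)] = float(value)
    return label, features


# Sparse row: only columns 3, 10 and 42 are present, all others are implicitly zero.
row = parse_svmlight_line("1 3:0.5 10:2 42:1.0")
print(row)  # (1.0, {3: 0.5, 10: 2.0, 42: 1.0})
```

A row such as `1 3:red` fails with a `ValueError` in this sketch, which is the gap a typed sparse format would need to close.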