benchmarking the difference in memory usage and training computation time for sparse vs. dense train data

h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Apache License 2.0


I was wondering whether you could benchmark the training time of H2O GLM models (logistic regression classifiers) on a sparse input matrix (one with lots of zeros) versus a dense one. With the R package "glmnet", training is much faster when I pass the sparse version of the same training matrix instead of the dense one, and I would like to know whether the same applies to the H2O GLM trainer. I would also expect memory usage to be much lower for sparse input than for dense input. I know that sparse input matrices are supported: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/faq/data.html
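As a rough illustration of why sparse input can save memory and time, here is a minimal pure-Python sketch. It does not use H2O or glmnet and says nothing about how either stores data internally. It assumes a CSR (compressed sparse row) layout with 8-byte float values and 4-byte indices, and compares (a) the storage cost of a dense matrix versus its CSR form and (b) one logistic-loss gradient computed by scanning every cell versus scanning only the nonzeros. All function names (`dense_bytes`, `csr_bytes`, `to_csr`, `grad_dense`, `grad_csr`) are hypothetical helpers written for this example.

```python
import math
import random
import time


def dense_bytes(n_rows, n_cols, value_bytes=8):
    # Dense float64 storage: every cell is stored, zeros included.
    return n_rows * n_cols * value_bytes


def csr_bytes(n_rows, nnz, value_bytes=8, index_bytes=4):
    # Assumed CSR layout: one value + one column index per nonzero,
    # plus n_rows + 1 row pointers.
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes


def to_csr(dense):
    # Convert a list-of-rows matrix into (values, col_idx, row_ptr).
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def grad_dense(X, y, beta):
    # Logistic-loss gradient that visits every cell: work ~ n_rows * n_cols.
    g = [0.0] * len(beta)
    for row, yi in zip(X, y):
        r = sigmoid(sum(x * b for x, b in zip(row, beta))) - yi
        for j, x in enumerate(row):
            g[j] += r * x
    return g


def grad_csr(csr, y, beta):
    # Same gradient, visiting only the stored nonzeros: work ~ nnz.
    values, col_idx, row_ptr = csr
    g = [0.0] * len(beta)
    for i, yi in enumerate(y):
        start, end = row_ptr[i], row_ptr[i + 1]
        z = sum(values[k] * beta[col_idx[k]] for k in range(start, end))
        r = sigmoid(z) - yi
        for k in range(start, end):
            g[col_idx[k]] += r * values[k]
    return g


if __name__ == "__main__":
    random.seed(0)
    n, p, density = 1000, 200, 0.01  # illustrative sizes, not real data
    X = [[random.random() if random.random() < density else 0.0 for _ in range(p)]
         for _ in range(n)]
    y = [random.randint(0, 1) for _ in range(n)]
    beta = [random.uniform(-0.1, 0.1) for _ in range(p)]
    csr = to_csr(X)

    print("dense bytes:", dense_bytes(n, p))
    print("CSR bytes:  ", csr_bytes(n, len(csr[0])))

    t0 = time.perf_counter()
    gd = grad_dense(X, y, beta)
    t1 = time.perf_counter()
    gs = grad_csr(csr, y, beta)
    t2 = time.perf_counter()
    print(f"dense gradient: {t1 - t0:.4f}s, CSR gradient: {t2 - t1:.4f}s")
```

The gain depends on the fraction of nonzeros: CSR pays 12 bytes per nonzero instead of 8 bytes per cell, so under these assumptions it only saves memory when fewer than about two thirds of the cells are nonzero. Real solvers (glmnet's coordinate descent, H2O's GLM solvers) use different algorithms and storage, so the actual difference should be measured with those tools on the real data.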


benchmarking the difference in memory usage and training computation time for sparse vs. dense train data #9396