h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

benchmarking the difference in memory usage and training computation time for sparse vs. dense train data #9396

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

I was wondering if you could benchmark the training time of H2O GLM models (logistic regression classifier) on a sparse (mostly zeros) versus a dense input matrix. With the R package "glmnet", there is a big difference in training time when I pass the sparse version of the same training matrix instead of the dense one, and I would like to know whether the same applies to the H2O GLM trainer. I also expect memory usage to be much lower for sparse input than for dense input. I know that sparse input matrices are supported: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/faq/data.html
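A minimal, library-free sketch of why sparse input is expected to save memory and compute, assuming a compressed sparse row (CSR) layout. This is only an illustration of the general idea: it does not reflect how H2O or glmnet store data internally, and the function names (`to_csr`, `dense_bytes`, `csr_bytes`, `csr_matvec`) are hypothetical. Dense storage costs 8 bytes per cell, zeros included, while CSR stores only the non-zero values plus their column indices and one row pointer per row. The matrix-vector product `X @ w`, which dominates each gradient step of a logistic regression fit, then only touches the stored non-zeros.

```python
from array import array
import random


def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values = array("d")        # non-zero values, float64
    col_idx = array("i")       # column index of each stored value
    row_ptr = array("i", [0])  # row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0.0:
                values.append(x)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def dense_bytes(n_rows, n_cols):
    """Bytes needed to hold every cell as float64, zeros included."""
    return n_rows * n_cols * array("d").itemsize


def csr_bytes(values, col_idx, row_ptr):
    """Bytes needed for the three CSR arrays."""
    return sum(a.itemsize * len(a) for a in (values, col_idx, row_ptr))


def csr_matvec(values, col_idx, row_ptr, w):
    """Compute X @ w, doing work proportional to the number of non-zeros."""
    out = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * w[col_idx[k]]
        out.append(s)
    return out


if __name__ == "__main__":
    random.seed(0)
    n_rows, n_cols, density = 1000, 500, 0.01  # illustrative sizes, not real data
    dense = [
        [random.random() if random.random() < density else 0.0 for _ in range(n_cols)]
        for _ in range(n_rows)
    ]
    vals, cols, ptr = to_csr(dense)
    print("non-zeros:", len(vals))
    print("dense bytes:", dense_bytes(n_rows, n_cols))
    print("CSR bytes:", csr_bytes(vals, cols, ptr))
```

A real benchmark for this issue would time the H2O GLM fit itself on both representations of the same data; this sketch only shows the storage and arithmetic argument behind the expectation.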

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-6224
Assignee: New H2O Bugs
Reporter: Ehsan Jahangiri
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A