h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

GLRM Slow on Multinode due to Serialization #14507

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

GLRM slows down when run on a multi-node cluster. The suspected culprit is the amount of serialization required in each MRTask: the DataInfo and GLRMParameters objects in particular must be sent between nodes, which adds a lot of communication overhead. [~accountid:557058:e393304e-df0f-4e4f-a4bf-cb0cdf121b88] is currently testing this hypothesis.

On large data (e.g., BigCross), the speedup from distributed computation outweighs the network communication overhead, so GLRM still runs faster on multiple nodes than on a single node, as desired.
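To make the trade-off concrete, here is a rough back-of-the-envelope model, not a measurement. It assumes compute time divides evenly across nodes while each iteration ships one serialized task payload to every other node. The function name, the parameters, and all numbers are hypothetical.

```python
def estimated_run_time(work_s, nodes, iterations, payload_bytes, bandwidth_bytes_per_s):
    """Toy estimate of wall-clock time in seconds (assumed model, not H2O's)."""
    # Compute is assumed to parallelize perfectly across nodes.
    compute = work_s / nodes
    # A single node sends nothing over the network; otherwise every iteration
    # ships the payload to each of the other nodes.
    if nodes == 1:
        comm = 0.0
    else:
        comm = iterations * payload_bytes * (nodes - 1) / bandwidth_bytes_per_s
    return compute + comm


# Hypothetical settings: 100 iterations, 1 MB payload, 100 MB/s network.
settings = dict(iterations=100, payload_bytes=1_000_000, bandwidth_bytes_per_s=100_000_000)

# Small job (2 s of compute): communication dominates, multi-node is slower.
small_single = estimated_run_time(2.0, nodes=1, **settings)
small_multi = estimated_run_time(2.0, nodes=4, **settings)

# Large job (200 s of compute): communication is amortized, multi-node wins.
large_single = estimated_run_time(200.0, nodes=1, **settings)
large_multi = estimated_run_time(200.0, nodes=4, **settings)

print(small_single, small_multi, large_single, large_multi)
```

Under these assumed numbers the communication term is fixed per iteration, so it only matters when the compute term is small, which matches the behavior described above.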

exalate-issue-sync[bot] commented 1 year ago

Former user commented: Partially fixed. I no longer pass the full DataInfo object, but all of GLRMParameters is still being serialized.
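The idea behind the partial fix can be illustrated outside H2O. This is a minimal Python sketch, not the actual H2O code: it compares the serialized size of a whole parameter object against a payload holding only the fields a task reads. The dictionary, its field names, and the bulky `unused_state` entry are all hypothetical stand-ins.

```python
import pickle

# Hypothetical stand-in for a large parameter object. Field names are
# illustrative only and do not mirror GLRMParameters.
full_params = {
    "k": 10,
    "regularization_x": 0.1,
    "regularization_y": 0.1,
    # Bulky state that the per-row work never reads.
    "unused_state": list(range(100_000)),
}

# Fields the task is assumed to need.
needed = ("k", "regularization_x", "regularization_y")

# Option 1: ship the whole object with every task.
full_payload = pickle.dumps(full_params)

# Option 2: copy out only the needed fields before serializing.
slim_params = {name: full_params[name] for name in needed}
slim_payload = pickle.dumps(slim_params)

full_bytes = len(full_payload)
slim_bytes = len(slim_payload)
print(f"full: {full_bytes} bytes, slim: {slim_bytes} bytes")
```

The same reasoning would apply to GLRMParameters if only a few of its fields are read inside the task, though which fields those are is not stated in this thread.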

DinukaH2O commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-1535
Assignee: Former user
Reporter: Former user
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A