Request for help running a CoxPH model on a large data set (6 GB, 300 columns)

h2oai / h2o-3

Request for help running a CoxPH model on a large data set (6 GB, 300 columns) #8532

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

We are trying to run a CoxPH model using H2O and RSparkling on a large data set (6 GB, 300 columns). Whatever Spark configuration we use, we run into memory issues.

According to H2O's guidance, the cluster should only need about four times the data size in memory. We went well beyond that, with four 128 GB worker nodes and a 128 GB master node, but we still get memory errors.
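For reference, the sizing arithmetic behind that statement can be written out as a quick check. This is only a sketch: the factor of 4 is the rule of thumb quoted above, the function names are made up for illustration, and it assumes all node memory is actually handed to H2O, which in a Spark deployment depends on the executor settings.

```python
def required_memory_gb(data_gb, factor=4):
    """Memory suggested by the 'cluster = factor x data size' rule of thumb."""
    return data_gb * factor


def cluster_memory_gb(workers, memory_per_worker_gb):
    """Total physical memory across the worker nodes."""
    return workers * memory_per_worker_gb


# Figures from this issue: 6 GB of data, four worker nodes with 128 GB each.
needed = required_memory_gb(6)
available = cluster_memory_gb(4, 128)

print(f"needed by rule of thumb: {needed} GB")
print(f"physical worker memory:  {available} GB")
print(f"rule satisfied on paper: {available >= needed}")
```

On paper the hardware exceeds the rule of thumb by a wide margin, which suggests the limit being hit is the memory granted to the Spark executors rather than the memory installed on the nodes.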

Please help us choose the Spark configuration needed to run H2O on our current data set.

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-7108
Assignee: New H2O Bugs
Reporter: Divya Mereddy
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A