h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Java: Memory Leak in Jobs #14169

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Jobs are holding on to references to data, as witnessed in Parse.

After parsing a large dataset (7 million rows, 4,501 columns) and then removing it, the jobs still hold on to about 0.5 GB of memory under water.parser.ParseDataset$MultiFileParseTask. The same leak is likely present across all jobs in H2O, so we need to reevaluate how jobs are set up.

exalate-issue-sync[bot] commented 1 year ago

Cliff Click commented: Need to split Job from ModelBuilder - so that ModelBuilder "has-a" Job and not "is-a" Job. Job objects should remain tiny; just state & start/stop timing info (and maybe a final exception object for crashes).
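A minimal sketch of the "has-a" split described above, assuming nothing about H2O's real classes: `TinyJob` and `SketchModelBuilder` are hypothetical names, and the "model" is just the mean of an array. The point is only that the job carries state, timing, and a final exception, while the builder alone holds the data and drops it when done.

```java
// Hypothetical sketch; these are NOT the H2O Job/ModelBuilder classes.
final class TinyJob {
    enum State { CREATED, RUNNING, DONE, FAILED }

    // The job holds only state, timing, and a final exception: no data references.
    private volatile State state = State.CREATED;
    private volatile long startMillis;
    private volatile long endMillis;
    private volatile Throwable failure;

    void start() { startMillis = System.currentTimeMillis(); state = State.RUNNING; }
    void done() { endMillis = System.currentTimeMillis(); state = State.DONE; }
    void fail(Throwable t) { failure = t; endMillis = System.currentTimeMillis(); state = State.FAILED; }

    State state() { return state; }
    Throwable failure() { return failure; }
    long elapsedMillis() { return endMillis - startMillis; }
}

final class SketchModelBuilder {
    private final TinyJob job = new TinyJob(); // "has-a" Job rather than "is-a" Job
    private double[] trainingData;             // large data is owned by the builder only

    SketchModelBuilder(double[] trainingData) { this.trainingData = trainingData; }

    TinyJob job() { return job; }

    /** Stand-in for model building: returns the mean of the training data. */
    double build() {
        job.start();
        try {
            double sum = 0;
            for (double d : trainingData) sum += d;
            double mean = sum / trainingData.length;
            job.done();
            return mean;
        } catch (RuntimeException e) {
            job.fail(e);
            throw e;
        } finally {
            trainingData = null; // the builder releases the data; the job never referenced it
        }
    }
}
```

With this shape, anything that keeps the `TinyJob` around after the build (a job list, a progress UI) retains only a few fields rather than the dataset.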

Grid and CrossVal can use the same Job object for all sub-models being built, which then also shares the progress-bar and cancel notification.
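As a rough illustration of one job shared by several sub-model builds, the sketch below uses hypothetical `SharedJob` and `SubModelBuilder` classes (not H2O's API). Every sub-builder reports progress to the same counter and checks the same cancel flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; names and structure are assumptions, not H2O code.
final class SharedJob {
    private final long totalWork;
    private final AtomicLong workDone = new AtomicLong();
    private final AtomicBoolean cancelRequested = new AtomicBoolean();

    SharedJob(long totalWork) { this.totalWork = totalWork; }

    void update(long units) { workDone.addAndGet(units); }

    /** Fraction of total work completed, in [0, 1]. */
    double progress() {
        return totalWork == 0 ? 1.0 : Math.min(1.0, (double) workDone.get() / totalWork);
    }

    void cancel() { cancelRequested.set(true); }
    boolean isCancelRequested() { return cancelRequested.get(); }
}

final class SubModelBuilder {
    private final SharedJob job;   // the same instance is handed to every sub-model
    private final int iterations;

    SubModelBuilder(SharedJob job, int iterations) {
        this.job = job;
        this.iterations = iterations;
    }

    /** Runs up to `iterations` steps; returns how many actually completed. */
    int build() {
        int completed = 0;
        for (int i = 0; i < iterations; i++) {
            if (job.isCancelRequested()) break; // one cancel stops all sub-models
            completed++;
            job.update(1);                      // one progress bar for the whole grid
        }
        return completed;
    }
}
```

A grid of two sub-models with two iterations each would create `new SharedJob(4)` and pass it to both builders; calling `cancel()` once stops whichever sub-models have not yet finished.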

Job needs a "CANCEL_PENDING" state. MRTask should accept a Job, and then auto-abort if that Job fails.
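One way the "CANCEL_PENDING" state and the auto-abort could fit together is sketched below. `CancellableJob` and `ChunkedTask` are hypothetical stand-ins (the latter is not MRTask), and the extra states and transitions are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntConsumer;

// Hypothetical sketch; the state set and transitions are assumptions, not H2O's.
final class CancellableJob {
    enum State { RUNNING, CANCEL_PENDING, CANCELLED, FAILED, DONE }

    private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);

    /** A cancel request only marks the job; workers notice it at their next check. */
    boolean requestCancel() { return state.compareAndSet(State.RUNNING, State.CANCEL_PENDING); }

    /** Called by a worker once it has actually stopped. */
    boolean acknowledgeCancel() { return state.compareAndSet(State.CANCEL_PENDING, State.CANCELLED); }

    boolean finish() { return state.compareAndSet(State.RUNNING, State.DONE); }
    void fail() { state.set(State.FAILED); }

    State state() { return state.get(); }

    boolean shouldStop() {
        State s = state.get();
        return s == State.CANCEL_PENDING || s == State.CANCELLED || s == State.FAILED;
    }
}

final class ChunkedTask {
    private final CancellableJob job;

    ChunkedTask(CancellableJob job) { this.job = job; }

    /** Processes chunks in order, aborting as soon as the job says to stop. */
    int run(int chunkCount, IntConsumer work) {
        int processed = 0;
        for (int i = 0; i < chunkCount; i++) {
            if (job.shouldStop()) {      // auto-abort on cancel or failure
                job.acknowledgeCancel(); // no-op unless the state is CANCEL_PENDING
                return processed;
            }
            work.accept(i);
            processed++;
        }
        // If a cancel arrived during the last chunk, acknowledge it instead of finishing.
        if (!job.finish()) job.acknowledgeCancel();
        return processed;
    }
}
```

The intermediate state separates "someone asked to cancel" from "the work has stopped", so a caller polling the job can tell whether resources may still be in use.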

Cliff

DinukaH2O commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-1191
Assignee: Cliff Click
Reporter: Amy Wang
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A