h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

DRF checkpoint with crossvalidation fails with error "ERRR: _weights_column: Weights column '__internal_cv_weights__' not found in the training frame" #8155

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Tried on versions 28.0.2 and the latest, 30.0.1.

Create first DRF:

{noformat}
rf1 <- h2o.randomForest(
  model_id = "first_drf1_x1",
  x = f2,
  y = r1,
  training_frame = train1,
  validation_frame = valid1,
  ntrees = 49,
  nfolds = 5,
  seed = 1
)
{noformat}

Train it, and then try to continue training from this model like this:

{noformat}
rf2 <- h2o.randomForest(
  model_id = "second_drf1_x2",
  x = f2,
  y = r1,
  training_frame = train2,
  validation_frame = valid2,
  ntrees = (49 + 50),
  nfolds = 5,
  checkpoint = "first_drf1_x1",
  seed = 1
)
{noformat}

This appears in the logs immediately:

{noformat}
POST /3/ModelBuilders/drf, parms: {model_id=second_drf1_x2, validation_frame=RTMP_sid_aea1_16, response_column=pcs7_e_dt_4010u, training_frame=RTMP_sid_aea1_14, seed=1, nfolds=5, ntrees=99, ignored_columns=["ts","leve_batch_nbr"], checkpoint=first_drf1_x1}
04-30 10:20:34.601 127.0.0.1:54321 55804 FJ-1-5 INFO: Creating 5 cross-validation splits with random number seed: 1
04-30 10:20:34.612 127.0.0.1:54321 55804 FJ-1-5 ERRR: _weights_column: Weights column '__internal_cv_weights__' not found in the training frame
{noformat}

When the first model is created, 5 CV models are created along with it, and each has that internal field set like this:

{noformat}"_weights_column":"__internal_cv_weights__",{noformat}

but when the main first model is trained, the field is null:

{noformat}Building main model. ... "_weights_column":null,{noformat}

If nfolds is set to 0, which disables cross-validation, everything works fine.
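
The nfolds = 0 workaround can be sketched as a small self-contained script. This is only an illustration under stated assumptions: it assumes a local H2O cluster can be started with h2o.init(), it substitutes the built-in iris dataset for the data in this report, and the model IDs and frame names are placeholders rather than the ones used above.

```r
library(h2o)
h2o.init()  # assumes a local H2O cluster can be started

# Assumption: iris stands in for the original training data.
data <- as.h2o(iris)
splits <- h2o.splitFrame(data, ratios = 0.75, seed = 1)
train <- splits[[1]]
valid <- splits[[2]]

predictors <- setdiff(names(iris), "Species")
response <- "Species"

# First model: nfolds = 0 disables cross-validation.
rf1 <- h2o.randomForest(
  model_id = "drf_base",  # hypothetical model ID
  x = predictors,
  y = response,
  training_frame = train,
  validation_frame = valid,
  ntrees = 49,
  nfolds = 0,
  seed = 1
)

# Second model: continue from the checkpoint with more trees,
# again with cross-validation disabled.
rf2 <- h2o.randomForest(
  model_id = "drf_continued",  # hypothetical model ID
  x = predictors,
  y = response,
  training_frame = train,
  validation_frame = valid,
  ntrees = 49 + 50,
  nfolds = 0,
  checkpoint = "drf_base",
  seed = 1
)
```

With cross-validation disabled, no cross-validation metrics are produced, so the validation frame is the only held-out estimate available for the continued model.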

Full log is attached.

exalate-issue-sync[bot] commented 1 year ago

bb bb commented: Sorry, first draft was a mistake, corrected.

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-7483
Assignee: New H2O Bugs
Reporter: bb bb
State: Open
Fix Version: N/A
Attachments: Available (Count: 1)
Development PRs: N/A

Attachments From Jira

Attachment Name: h2o_127.0.0.1_54321-3-info.log
Attached By: bb bb
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-7483/h2o_127.0.0.1_54321-3-info.log