h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

checkpoint: different training-set metrics for a 5+5-tree checkpointed model vs. a 10-tree model with the same params #14795

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

On the prostate data: when building a 5+5-tree model with checkpointing, I expect it to have the same scoring history as a 10-tree model, since all other parameters (including the seed) are kept constant. However, the logloss/MSE/AUC for the training set look quite different (screenshots attached).

buildModel 'drf', {"model_id":"drf-aeff3634-24f2-4fbb-9cfa-44f25558f4f0","training_frame":"Key_Frameprostate.hex","validation_frame":"Key_Frameprostate.hex","nfolds":0,"response_column":"CAPSULE","ignored_columns":[],"ignore_const_cols":true,"ntrees":"5","max_depth":20,"min_rows":1,"nbins":20,"nbins_cats":1024,"seed":-6804836485597913000,"mtries":-1,"sample_rate":0.632,"score_each_iteration":false,"balance_classes":false,"r2_stopping":0.999999,"build_tree_one_node":false,"binomial_double_trees":false,"checkpoint":""}

buildModel 'drf', {"model_id":"drf-5+5","training_frame":"Key_Frameprostate.hex","validation_frame":"Key_Frameprostate.hex","nfolds":0,"response_column":"CAPSULE","ignored_columns":[],"ignore_const_cols":true,"ntrees":"10","max_depth":20,"min_rows":1,"nbins":20,"nbins_cats":1024,"seed":-6804836485597913000,"mtries":-1,"sample_rate":0.632,"score_each_iteration":false,"balance_classes":false,"r2_stopping":0.999999,"build_tree_one_node":false,"binomial_double_trees":false,"checkpoint":"drf-aeff3634-24f2-4fbb-9cfa-44f25558f4f0"}

############################

buildModel 'drf', {"model_id":"drf10","training_frame":"Key_Frameprostate.hex","validation_frame":"Key_Frameprostate.hex","nfolds":0,"response_column":"CAPSULE","ignored_columns":[],"ignore_const_cols":true,"ntrees":"10","max_depth":20,"min_rows":1,"nbins":20,"nbins_cats":1024,"seed":-6804836485597913000,"mtries":-1,"sample_rate":0.632,"score_each_iteration":false,"balance_classes":false,"r2_stopping":0.999999,"build_tree_one_node":false,"binomial_double_trees":false,"checkpoint":""}
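To check the report's premise that only the checkpoint setting differs between the two 10-tree runs, the two parameter sets can be diffed key by key. This Python sketch copies the JSON objects verbatim from the `drf-5+5` and `drf10` buildModel calls; `param_diff` is a hypothetical helper for illustration, not part of H2O.

```python
import json

# JSON parameter objects copied verbatim from the two 10-tree buildModel calls.
CHECKPOINTED = json.loads('{"model_id":"drf-5+5","training_frame":"Key_Frameprostate.hex","validation_frame":"Key_Frameprostate.hex","nfolds":0,"response_column":"CAPSULE","ignored_columns":[],"ignore_const_cols":true,"ntrees":"10","max_depth":20,"min_rows":1,"nbins":20,"nbins_cats":1024,"seed":-6804836485597913000,"mtries":-1,"sample_rate":0.632,"score_each_iteration":false,"balance_classes":false,"r2_stopping":0.999999,"build_tree_one_node":false,"binomial_double_trees":false,"checkpoint":"drf-aeff3634-24f2-4fbb-9cfa-44f25558f4f0"}')
FRESH = json.loads('{"model_id":"drf10","training_frame":"Key_Frameprostate.hex","validation_frame":"Key_Frameprostate.hex","nfolds":0,"response_column":"CAPSULE","ignored_columns":[],"ignore_const_cols":true,"ntrees":"10","max_depth":20,"min_rows":1,"nbins":20,"nbins_cats":1024,"seed":-6804836485597913000,"mtries":-1,"sample_rate":0.632,"score_each_iteration":false,"balance_classes":false,"r2_stopping":0.999999,"build_tree_one_node":false,"binomial_double_trees":false,"checkpoint":""}')


def param_diff(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(a.keys() | b.keys())
        if a.get(k) != b.get(k)
    }


for key, (ckpt, fresh) in param_diff(CHECKPOINTED, FRESH).items():
    print(f"{key}: {ckpt!r} vs {fresh!r}")
# checkpoint: 'drf-aeff3634-24f2-4fbb-9cfa-44f25558f4f0' vs ''
# model_id: 'drf-5+5' vs 'drf10'
```

Only `model_id` and `checkpoint` differ. `ntrees` is "10" in both requests, consistent with the checkpointed run's `ntrees` counting the total number of trees (5 from the checkpoint plus 5 new) rather than the number of trees to add.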

exalate-issue-sync[bot] commented 1 year ago

Michal Malohlava commented: OOB re-computation for DRF is currently broken (OOB metrics are what is reported for the training data).

The recommendation is to use a validation dataset for now.
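The comment attributes the discrepancy to out-of-bag (OOB) bookkeeping. The toy sketch below is a hypothetical model of that bookkeeping, not H2O's DRF implementation: each row's OOB prediction is the average over the trees for which the row was not sampled, so a resumed model only reproduces the 10-tree OOB metrics if the per-row OOB accumulators from the first 5 trees are carried over. The per-tree seeding scheme, `tree_predict`, and `grow` are all illustrative assumptions.

```python
import random

# Toy illustration only: NOT H2O's DRF code. Trees are replaced by a
# hypothetical deterministic prediction function so the bookkeeping is visible.


def in_bag_rows(seed, tree_idx, n_rows, sample_rate):
    """Rows sampled (in-bag) for one tree; seeded per tree so it is reproducible."""
    rng = random.Random(seed * 1_000_003 + tree_idx)
    return {i for i in range(n_rows) if rng.random() < sample_rate}


def tree_predict(tree_idx, row):
    """Hypothetical stand-in for tree `tree_idx`'s prediction for `row`."""
    return ((tree_idx * 31 + row * 17) % 100) / 100.0


def grow(n_rows, first_tree, last_tree, seed, sample_rate, state=None):
    """Add trees [first_tree, last_tree) and accumulate out-of-bag votes.

    `state` is (sums, counts) carried over from a previous run; None starts
    from empty accumulators. The lists in `state` are updated in place.
    """
    sums, counts = state if state is not None else ([0.0] * n_rows, [0] * n_rows)
    for t in range(first_tree, last_tree):
        in_bag = in_bag_rows(seed, t, n_rows, sample_rate)
        for i in range(n_rows):
            if i not in in_bag:  # row i is out-of-bag for tree t
                sums[i] += tree_predict(t, i)
                counts[i] += 1
    return sums, counts


def oob_predictions(state):
    """Average OOB prediction per row (None if a row was never out-of-bag)."""
    sums, counts = state
    return [s / c if c else None for s, c in zip(sums, counts)]


N, SEED, RATE = 100, 1234, 0.632
full = grow(N, 0, 10, SEED, RATE)  # one 10-tree run
resumed = grow(N, 5, 10, SEED, RATE, state=grow(N, 0, 5, SEED, RATE))  # 5 + 5, OOB state kept
reset = grow(N, 5, 10, SEED, RATE)  # 5 + 5, OOB state from the first 5 trees lost

print(oob_predictions(full) == oob_predictions(resumed))  # True
print(oob_predictions(full) == oob_predictions(reset))
```

In this toy model, losing the accumulated OOB state changes every training-set (OOB) metric, while scoring a separate validation frame with all 10 trees would not depend on that state at all, which matches the suggested workaround.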

DinukaH2O commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-1837 Assignee: Michal Malohlava Reporter: Nidhi Mehta State: Open Fix Version: N/A Attachments: Available (Count: 2) Development PRs: N/A

Attachments From Jira

Attachment Name: tree_10.png Attached By: Nidhi Mehta File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-1837/tree_10.png

Attachment Name: tree5+5.png Attached By: Nidhi Mehta File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-1837/tree5+5.png