h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

add an optional test dataset to hyperparameter search #9811

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

The way we do grid search right now, we use the same dataset (either xval or a validation set) both for early stopping on the models and for model selection. This overfits the hyperparameters.

Add an optional test set (third dataset / second holdout) for model selection.
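Until the grid API grows a third frame, here is a rough sketch (in the Python client) of the workflow this issue is asking for; the dataset path, predictors, and GBM hyperparameters below are placeholders, not anything specified in this ticket. The grid early-stops on the validation frame, and model selection is then done by re-scoring every grid model on an untouched test frame:

```python
import h2o
from h2o.estimators import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()
df = h2o.import_file("my_data.csv")              # placeholder dataset
response = df.columns[-1]
df[response] = df[response].asfactor()           # assume a binary classification target
predictors = [c for c in df.columns if c != response]

# Three-way split: train / valid (early stopping) / test (model selection)
train, valid, test = df.split_frame(ratios=[0.6, 0.2], seed=1234)

grid = H2OGridSearch(
    model=H2OGradientBoostingEstimator(ntrees=500, stopping_rounds=5,
                                       stopping_metric="AUC", seed=1234),
    hyper_params={"max_depth": [3, 5, 7], "learn_rate": [0.01, 0.1]},
)
grid.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

# Select on the second holdout instead of the metrics used for early stopping
best_model = max(grid.models,
                 key=lambda m: m.model_performance(test_data=test).auc())
```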

[6:08 PM] rpeck: So, I’m writing about hyperparameter search.

It feels to me like using the same cross-validation metrics as you use during training to choose your hyperparameters is overfitting the hyperparameters, and that the model should actually be chosen based on scoring a holdout.

But I’m tired and my brain is a bit fuzzy.

Thoughts?

[6:08 PM] erin: i’d recommend using a train, valid and test set
[6:08 PM] erin: and passing along the valid set
[6:09 PM] erin: actually i’m writing this code right now for chicago
[6:09 PM] rpeck: Yeah, 3-way split.
[6:09 PM] erin: that way you won’t overfit
[6:09 PM] rpeck: That’s exactly what the ghost of Andrew Ng has been whispering in my ear.
[6:09 PM] erin: yeah it’s amazing how many times i trick myself into overfitting
[6:09 PM] erin: i did it on a notebook i made recently
[6:11 PM] rpeck: I’m glad the thought was not a complete hallucination. :simple_smile:

So we should be able to pass an optional test set to the grid search and have it score on that.

We don’t have that ability now.

[6:11 PM] erin: that’s true, i was just thinking that
[6:12 PM] erin: or maybe better… h2o.grid_performance
[6:12 PM] erin: or something like that
[6:12 PM] erin: ehh.. maybe your
[6:12 PM] erin: you’re right
[6:12 PM] erin: i don’t want to confuse people though, this is a good topic for #api-updates
[6:12 PM] rpeck: Yeah.
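An `h2o.grid_performance`-style helper, as floated above, might do little more than collect per-model metrics on a supplied test frame. The sketch below is purely illustrative; no such function exists in the client today, and the name and return shape are invented here:

```python
# Hypothetical helper in the spirit of the "h2o.grid_performance" idea above;
# H2O does not ship this. It scores every model in a grid on a test frame.
def grid_performance(grid, test_frame, metric="auc"):
    scores = {}
    for m in grid.models:
        perf = m.model_performance(test_data=test_frame)
        scores[m.model_id] = getattr(perf, metric)()   # e.g. perf.auc()
    return scores

# Usage sketch:
# scores = grid_performance(grid, test)
# best_id = max(scores, key=scores.get)
```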

Here’s my text:

During the process of tuning the hyperparameters you need to avoid overfitting them to your training data. Otherwise, the hyperparameter values that you choose will be too highly tuned to your training data and will not generalize well to new data. Note that this is the same principle as, but subtly different from, overfitting during model training: ideally you should use a holdout (validation) dataset for model selection.

[6:13 PM] erin: yes exactly
[6:14 PM] rpeck: Yay! I understand something!
[6:14 PM] erin: maybe i’ll send you my grid search tutorial after i’m done today and we can compare notes
[6:14 PM] erin: to double check ourselves
[6:14 PM] rpeck: OH
[6:15 PM] erin: the caveat to what i’m doing in my code / your paragraph is that DL uses the validation set by default in training
[6:15 PM] rpeck: Double yay! Except that I didn’t realize we were both writing about the same topic. . .
[6:15 PM] rpeck: Ya.
[6:15 PM] erin: but within a single grid, if you are doing all DL
[6:15 PM] erin: and they are all using the validation set for training
[6:15 PM] rpeck: I’m writing that the user should do something that we don’t easily support: they have to kind of roll their own.
[6:15 PM] erin: then they are all overfitting
[6:15 PM] rpeck: Ya.
[6:15 PM] erin: so maybe it doesn’t really matter
[6:16 PM] rpeck: Except that I’ll add “test set” to grid search, so in a week or two we’ll have this problem solved.
[6:17 PM] erin: cool… newdata or testing_frame?
[6:17 PM] erin: will grid search have a place to store test set metrics similar to the validation metrics?
[6:18 PM] rpeck: It would have to, yes.
[6:18 PM] erin: so when we h2o.getGrid, we can choose which one we want
[6:18 PM] erin: ok cool
[6:18 PM] rpeck: I’d return both, and if the user doesn’t specify a test set I’ll print out a message saying they are an idiot and return null for the test metrics. :wink:
[6:18 PM] rpeck: Updated text:

During the process of tuning the hyperparameters you need to avoid overfitting them to your training data. Otherwise, the hyperparameter values that you choose will be too highly tuned to your training data and will not generalize well to new data. Note that this is the same principle as, but subtly different from, overfitting during model training: ideally you should use a holdout test dataset for model selection, on top of the cross-validation or validation set you use during training.

[6:19 PM] rpeck: When will you be in this week?
[6:26 PM] erin: so now i’m wondering, why do people do CV at all during model selection
[6:26 PM] rpeck: They get more stable metrics, for one thing.
[6:26 PM] erin: oh i see what you are saying, so choose the best model based on the validation set, but report CV metrics
[6:26 PM] erin: because they are a better estimate
[6:27 PM] rpeck: Exactly.
[6:27 PM] erin: ok, i’d make that a bit more clear
[6:29 PM] rpeck: :thumbsup:
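Erin’s last point (choose the best model on the validation set, but report CV metrics because they are a better estimate) could look like the following sketch, assuming the grid’s estimator was built with both `nfolds` and a `validation_frame` so that both sets of metrics exist:

```python
# Sketch only: select on validation AUC, report the cross-validated AUC
# as the more stable estimate. Assumes nfolds was set on the estimator
# and a validation_frame was passed to grid.train().
selected = max(grid.models, key=lambda m: m.auc(valid=True))
print("selected on validation AUC:", selected.auc(valid=True))
print("reported cross-validated AUC:", selected.auc(xval=True))
```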

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-2881
Assignee: Raymond Peck
Reporter: Raymond Peck
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A