exalate-issue-sync[bot] opened 1 year ago
Sebastien Poirier commented: [~accountid:5cc0b0886fbf5a10040d2945] can you please tell us which version you’re using? We made some changes related to this in the last couple of major releases.
Note that the documentation at https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html#required-stopping-parameters also says:
{noformat}When both options are set, then the AutoML run will stop as soon as it hits either of these limits.{noformat}
This is what happens in your examples above: you’re setting both {{max_runtime_secs}} and {{max_models}}.
To give you better insight, I’ll explain roughly how this works:
{quote}Some production code used for training models broke when no Stacked Ensembles were trained{quote}
You should not expect to always have SEs. Training the SEs could raise an error (for one reason or another) even if several base models were trained, and the resulting AutoML will still behave normally. Also, as mentioned, if we were not able to train at least 2 base models within the given time budget, then we can’t train any SE.
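For the production code, a more robust approach is to check the leaderboard rather than assume the SEs exist. A minimal R sketch (illustrative only, assuming {{aml}} is a finished AutoML run):
{code:r}
library(h2o)

# Assuming `aml` is the result of a previous h2o.automl() call.
lb <- as.data.frame(aml@leaderboard)

# Stacked Ensembles are identifiable by their model_id prefix.
se_ids <- lb$model_id[grepl("StackedEnsemble", lb$model_id)]

if (length(se_ids) == 0) {
  # No SE was trained (e.g. fewer than 2 base models fit in the budget):
  # fall back to the leader model instead of erroring out.
  best <- aml@leader
} else {
  best <- h2o.getModel(se_ids[[1]])
}
{code}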
Kunal Mishra commented: Hey Sebastien,
{quote}Some production code used for training models broke when no Stacked Ensembles were trained{quote}
As for next steps, I think it’d probably make sense to test a dev version on your end with the reprexes above to see if the issue persists (the last bullet describes what I’ve come to expect from past H2O versions, and that behavior would still make sense to me even when both args are specified). If the issue can be reproduced, there might be some edge-case logic to build that ensures building the 2 SEs is at least attempted once the {{max_runtime_secs}} budget expires (in H2O’s unit testing, I’d then {{assert}} for future releases that every {{aml}} object in the reprex has an SE as long as it has at least 2 base models).
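Roughly the kind of assertion I have in mind (just a sketch; the exact helpers in H2O’s test suite would obviously differ):
{code:r}
# Hypothetical post-run check for an AutoML object `aml` from the reprex.
lb <- as.data.frame(aml@leaderboard)
n_base <- sum(!grepl("StackedEnsemble", lb$model_id))
n_se   <- sum(grepl("StackedEnsemble", lb$model_id))

# Whenever enough base models exist to stack (at least 2),
# at least one Stacked Ensemble should be on the leaderboard.
if (n_base >= 2) stopifnot(n_se >= 1)
{code}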
Sebastien Poirier commented: OK, I think I get it: you expect the combined {{max_models}}+{{max_runtime_secs}} to work as it did up to and including {{3.32.1.x}}, in the sense that even if {{max_runtime_secs}} expired before {{max_models}} models had been trained, AutoML should still train 2 SEs on top, regardless of how long that takes, which makes {{max_runtime_secs}} harder for the average user to reason about.
The problem here is that the old behaviour regarding SEs, although partly consistent with the old documentation, was inconsistent with the {{max_runtime_secs}} semantics and expectations, and you usually don’t fix an inconsistency by keeping it only in some cases.
[~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] what do you think of this use-case?
I see 4 ways to handle this:
Personally, I’m not a fan at all of #2: it falls back into past confusion and reinforces the distinction between “base models” and SEs, defining H2O-AutoML as a tool whose main goal is to produce SE models (instead of producing the most accurate, interpretable, or fair models, depending on the use case). I also find that #3 adds unnecessary complexity for almost no benefit.
To be honest, I still struggle to understand users’ expectations when using both parameters. If reproducibility is important, {{max_models}} should be used; otherwise I’d avoid it and just specify a time budget. If, on top of this, the total runtime must be capped to avoid wasting resources, then I don’t see why AutoML should by default keep training past this cap just because there are still some SEs to train (for how long? no one knows!).
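In other words, the two configurations I’d typically recommend look roughly like this (illustrative snippet only, reusing the {{iris_df_h2o}} frame from the reprex; the time budget is just an example):
{code:r}
# Reproducible run: fix the number of models and the seed,
# and don't cap the wall-clock time.
aml_repro <- h2o.automl(y = "Species", training_frame = iris_df_h2o,
                        max_models = 50, seed = 1)

# Time-budgeted run: cap only the total runtime and let AutoML decide
# how many models (including SEs) fit in the budget.
aml_budget <- h2o.automl(y = "Species", training_frame = iris_df_h2o,
                         max_runtime_secs = 600)
{code}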
Thoughts?
Kunal Mishra commented: So yes, I agree that our expectations, formed on the previous version of H2O (3.32.x), weren’t quite met here. However, regardless of the rest of this debate, the reason they weren’t still feels like a bug: it’s unclear why the SEs are not being trained at any point in the reprex. Even when ample time and max models are specified, they’re still not being trained, which feels like a flaw somewhere in the new logic that’s worth investigating.
I also agree for the most part with your evaluation of the options. An option to force SE training (enabled by default?) is probably the easiest and most commonsense option moving forward. One possible complication, which doesn’t affect my use case or thinking, is the possibility of specifying non-GLM metalearners in the AutoML call (otherwise deciding how much time to “save” for training the SEs at the end would be relatively simple… right? GLMs train nearly instantaneously in non-resource-starved scenarios).
As for the user expectation when using both parameters: on our end, at least, the intent is to use the budgeted time to train 50 “deeper” models rather than potentially hundreds of “shallower” ones, given the significant resources being thrown at the problem (at least that was the intent when the AutoML call was originally specified for this problem against a 3.32.x version). We always use the Best of Family Stacked Ensemble, exported as a MOJO, in production (an SE over all models on the leaderboard was too inefficient at prediction time, so we use the lighter SE). The incremental benefit of a suite of individually better models, with more time and resources poured into each of them, was higher this way than training a huge variety of lighter, shallower models and taking the best ~6 of them. And the reason we specified max_runtime_secs at all (again, at that time; I haven’t verified this assumption recently) was that training never seemed to complete otherwise. I know that’s probably modifiable via early-stopping behavior or metrics, but max_runtime_secs is by far the easier one to use, manipulate, talk about, and document for future use.
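For context, our production extraction step looks roughly like this (simplified sketch; the output path is a placeholder):
{code:r}
# Assuming `aml` is a finished AutoML run.
lb <- as.data.frame(aml@leaderboard)

# Pick the Best of Family Stacked Ensemble (much lighter at prediction
# time than the "AllModels" ensemble).
bof_id <- lb$model_id[grepl("StackedEnsemble_BestOfFamily", lb$model_id)][[1]]
bof <- h2o.getModel(bof_id)

# Export it as a MOJO for production scoring ("/tmp/models" is a placeholder).
h2o.download_mojo(bof, path = "/tmp/models")
{code}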
Kunal Mishra commented: It… looks like this was fixed in a more recent version of H2O? When specifying max_models = 5, for example, within a few minutes 5 base models were trained, followed by the 2 Stacked Ensembles (an AllModels and a Best of Family), before the AutoML call completed, which is the behavior we had expected.
Increasing max_runtime_secs while keeping max_models the same led to the desired behavior of “deeper” individual models with more compute time per model, while still retaining the SEs, which we are happy about.
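For reference, the verification call was along these lines (same iris frame as in the reprex; the exact time budget isn’t important):
{code:r}
# With max_models = 5 the run finished with 5 base models plus the two
# Stacked Ensembles (AllModels and BestOfFamily) on the leaderboard.
aml_check <- h2o.automl(y = "Species", training_frame = iris_df_h2o,
                        max_runtime_secs = 600, max_models = 5, seed = 1)
print(aml_check@leaderboard)
{code}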
JIRA Issue Details
Jira Issue: PUBDEV-8844 Assignee: Sebastien Poirier Reporter: Kunal Mishra State: Open Fix Version: N/A Attachments: N/A Development PRs: N/A
Hey there,
Some production code used for training models broke when no Stacked Ensembles were trained during several AutoML runs using the latest R version of H2O. Upon further inspection, we were able to reproduce the issue with the following code… it appears that specifying {{max_models}} creates a situation that contradicts the documentation’s statement that Stacked Ensembles are always trained as part of AutoML, so we’re reporting it here with a reprex:
{code:r}
library(tidyverse)
library(h2o)

data(iris)

h2o.init()

iris_df <- iris %>% as_tibble()
iris_df_h2o <- iris_df %>% as.h2o()

# Stacked Ensemble does generate
aml <- h2o.automl(
  y = 'Species',
  training_frame = iris_df_h2o,
  max_runtime_secs = 60
)

# Stacked Ensemble does not generate
aml2 <- h2o.automl(
  y = 'Species',
  training_frame = iris_df_h2o,
  max_runtime_secs = 60,
  max_models = 50,
  seed = 1,
  exploitation_ratio = .05
)

# Stacked Ensemble does not generate
aml3 <- h2o.automl(
  y = 'Species',
  training_frame = iris_df_h2o,
  max_runtime_secs = 60,
  max_models = 50,
  seed = 1#,
  # exploitation_ratio = .05
)

# Stacked Ensemble does not generate
aml4 <- h2o.automl(
  y = 'Species',
  training_frame = iris_df_h2o,
  max_runtime_secs = 60,
  max_models = 50#,
  # seed = 1,
)
# Stacked Ensemble DOES generate
aml5 <- h2o.automl(
  y = 'Species',
  training_frame = iris_df_h2o,
  max_runtime_secs = 60,
  # max_models = 50,
  seed = 1,
  exploitation_ratio = .05
)
{code}