ledell opened 1 year ago
I'm not sure that's a problem: `max_models` is used to limit the maximum number of models. Should we always end up with exactly `max_models` models? I'm not even sure whether we count SEs (Stacked Ensembles) in the model count. I think `max_models` should be interpreted more like the maximum number of models that AutoML *tries* to train. Imagine a situation where we set `max_models` to some non-zero number and you also set some parameters that cause every model to fail to train. We have a failsafe mechanism that watches for successive failures and stops AutoML after more than N of them, but even so we would not end up with `max_models` models.

There's also the question of whether AutoML should care *why* a model failed. Did XGBoost fail because of the processor architecture, or because it handles data differently and fails when the response contains NAs? (I'm not sure the latter still happens, but it suffices for the example.)
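To make the failsafe argument concrete, here is a hypothetical pure-Python sketch of the scenario described above. This is not H2O's actual implementation; the function name, the candidate representation, and the consecutive-failure threshold are all made up for illustration.

```python
# Hypothetical sketch (NOT H2O's real code) of why a max_models budget
# can end without max_models trained models: each candidate either
# trains or fails, and a failsafe aborts the whole run after N
# consecutive failures.

def run_automl(candidates, max_models, max_consecutive_failures=3):
    """candidates: list of booleans, True = model trains successfully."""
    trained = 0
    consecutive_failures = 0
    for ok in candidates:
        if trained >= max_models:
            break  # budget reached
        if ok:
            trained += 1
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= max_consecutive_failures:
                break  # failsafe: stop AutoML entirely
    return trained

# A parameter set that makes every model fail never reaches max_models:
print(run_automl([False] * 10, max_models=5))  # 0: failsafe stops the run
print(run_automl([True] * 10, max_models=5))   # 5: budget reached
```

In this toy version, `max_models` is clearly an upper bound on what gets trained, not a guarantee, which matches the interpretation argued for above.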
Yeah, it's up to us to decide what "max models" means, after all... :-) You're right that it's a maximum; however, I was thinking that it should know that XGBoost is not available on certain systems and not count that toward the total models it tries (skipping an entire framework is slightly different from a regular failed model, IMO). This way, if you set the same `max_models` number on different systems (one that supports XGBoost, one that does not), you would still get the same number of total models at the end.

Let's also discuss with @sebhrusen later to get his input, since this is not urgent.
If I set `max_models = 15` and run on my M1 MacBook (which skips XGBoost), it only returns 11 models total, which includes 2 Stacked Ensembles. It seems to get the count wrong when it skips over XGBoost. I'm not sure whether this can be replicated on a "normal" machine by skipping XGBoost with `exclude_algos`, but that's another thing to check to make sure it's working properly.
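To illustrate the counting discrepancy being reported, here is a hypothetical pure-Python sketch. It is not H2O's real scheduling code; the training plan, the function, and the `skip_counts_toward_budget` flag are invented to show how a skipped framework could silently consume slots from the `max_models` budget.

```python
# Hypothetical sketch (NOT H2O's real scheduler): a planned sequence of
# candidate models, some belonging to a framework that is unavailable
# on this machine. If a skipped model still consumes a budget slot,
# the final model count comes up short of max_models.

PLAN = ["GLM", "XGBoost", "GBM", "XGBoost", "DRF", "XGBoost", "GBM",
        "XGBoost", "DeepLearning", "GBM", "GLM", "DRF", "GBM", "GLM",
        "GBM", "DRF", "GBM", "GLM", "DRF", "GBM"]

def count_trained(plan, max_models, unavailable=(),
                  skip_counts_toward_budget=True):
    trained, budget_used = 0, 0
    for algo in plan:
        if budget_used >= max_models:
            break
        if algo in unavailable:
            if skip_counts_toward_budget:
                budget_used += 1  # bug-like behavior: a skipped model
                                  # still uses up a slot
            continue
        trained += 1
        budget_used += 1
    return trained

# On a machine without XGBoost, 4 planned XGBoost models are skipped:
print(count_trained(PLAN, 15, unavailable={"XGBoost"}))  # 11, not 15
print(count_trained(PLAN, 15, unavailable={"XGBoost"},
                    skip_counts_toward_budget=False))    # 15
```

If something like the first behavior is what's happening, a real check would be to run `H2OAutoML` with `max_models=15` and `exclude_algos=["XGBoost"]` on a machine where XGBoost *is* supported and see whether the leaderboard also comes up short (assuming `exclude_algos` takes the same code path as an unsupported platform, which is exactly the open question above).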