leelew opened this issue 1 year ago
By default, "r2" is used as the optimization metric for regression tasks. Looking at your plots, the model doesn't overfit the r2 or KGE metric; it overfits RMSE. If you'd like to optimize RMSE instead, please set metric="rmse".
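To check which metric the train/test gap actually shows up in, one can score the same predictions with RMSE, r2, and KGE on both the training and test data and compare the gaps side by side. The sketch below is plain Python, not FLAML code; the function names are hypothetical, and KGE is assumed to be the common Kling-Gupta form KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), with r the Pearson correlation, alpha the ratio of standard deviations, and beta the ratio of means. The plots may have used a different variant.

```python
import math
import statistics


def rmse(obs, sim):
    """Root-mean-square error; lower is better, in the units of the target."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r2(obs, sim):
    """Coefficient of determination 1 - SS_res / SS_tot; higher is better."""
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed variant); 1.0 is a perfect fit."""
    r = pearson(obs, sim)
    alpha = statistics.pstdev(sim) / statistics.pstdev(obs)  # variability ratio
    beta = statistics.fmean(sim) / statistics.fmean(obs)  # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def report_gap(train_obs, train_sim, test_obs, test_sim):
    """Print train vs. test scores for each metric so the gaps can be compared."""
    for label, fn in (("rmse", rmse), ("r2", r2), ("kge", kge)):
        tr, te = fn(train_obs, train_sim), fn(test_obs, test_sim)
        print(f"{label}: train={tr:.3f} test={te:.3f} gap={abs(tr - te):.3f}")
```

Calling `report_gap` with observed and predicted values for both splits prints one line per metric. Because r2 and KGE are normalized while RMSE is in target units, the same predictions can show a large RMSE gap alongside small r2 and KGE gaps, which is the distinction drawn above.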
Hi Chi,
Thanks for your reply.
I think our model overfits not only RMSE but also R2 and KGE (i.e., its performance on the training data is much better than on the test data). We will try setting metric="rmse" and split_ratio=0.2.
The code is:
from flaml import AutoML
from sklearn.neural_network import MLPRegressor
automl = AutoML()
automl.fit(x_train, y_train, task="regression", metric="rmse", split_ratio=0.2, ensemble={"final_estimator": MLPRegressor(), "passthrough": True}, time_budget=3600)
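With the holdout strategy, `split_ratio=0.2` means roughly 20% of the training data is held out as the validation set on which candidate configurations are scored. As we understand FLAML's documentation, `split_ratio` only takes effect when holdout is used (`eval_method="holdout"`, or `"auto"` when FLAML picks holdout). The sketch below just illustrates the idea with a seeded random shuffle; `holdout_split` is a hypothetical helper, and FLAML's own splitting (shuffling, stratification, ordering) may differ.

```python
import random


def holdout_split(X, y, split_ratio=0.2, seed=0):
    """Randomly hold out `split_ratio` of the rows as a validation set.

    Hypothetical helper for illustration; not FLAML's internal splitter.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # reproducible shuffle of row indices
    n_val = int(round(len(X) * split_ratio))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    X_tr = [X[i] for i in train_idx]
    y_tr = [y[i] for i in train_idx]
    X_val = [X[i] for i in val_idx]
    y_val = [y[i] for i in val_idx]
    return X_tr, y_tr, X_val, y_val


# Toy data: 10 rows with y = 2 * x.
X = [[float(i)] for i in range(10)]
y = [2.0 * i for i in range(10)]
X_tr, y_tr, X_val, y_val = holdout_split(X, y, split_ratio=0.2)
print(len(X_tr), len(X_val))  # 8 2
```

Note that the holdout rows only guide model selection; a low training error can still coexist with a higher validation or test error.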
We will contact you again if this does not work. Thanks again for your help!
Best, Lu Li
Hi Chi,
We set metric="rmse" and used the holdout strategy (split_ratio=0.2). However, we still see overfitting. Although AutoML performs better than the other ML models on the test data, its training performance is much better than its test performance.
Do you have any further suggestions for avoiding overfitting when using AutoML?
Best, Lu
The code is:
from flaml import AutoML
from lightgbm import LGBMRegressor
automl = AutoML()
automl.fit(x_train, y_train, task="regression", metric="rmse", split_ratio=0.2, ensemble={"final_estimator": LGBMRegressor(), "passthrough": True}, time_budget=3600)
The train performance is: [plot not preserved]
The test performance is: [plot not preserved]
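One option is to make the train/validation gap part of the objective itself. FLAML's documentation describes passing a callable as `metric` that receives the validation data, the fitted estimator, and the training data, and returns a value to minimize plus a dict of metrics to log; its example penalizes the difference between validation and training loss. The sketch below adapts that idea to RMSE. The function name and the weight `alpha = 0.5` are assumptions, and the exact callback signature should be checked against the installed FLAML version (extra arguments are absorbed by `*args, **kwargs` here).

```python
import math


def _rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def gap_penalized_rmse(
    X_val, y_val, estimator, labels, X_train, y_train,
    weight_val=None, weight_train=None, *args, **kwargs,
):
    """Hypothetical custom metric: validation RMSE plus a penalty on the gap.

    Returns (value_to_minimize, metrics_to_log). Sample weights are ignored
    in this sketch.
    """
    val_loss = _rmse(list(y_val), list(estimator.predict(X_val)))
    train_loss = _rmse(list(y_train), list(estimator.predict(X_train)))
    alpha = 0.5  # assumed penalty weight; larger values punish overfitting more
    # Equivalent to val_loss + alpha * (val_loss - train_loss).
    objective = val_loss * (1 + alpha) - alpha * train_loss
    return objective, {"val_rmse": val_loss, "train_rmse": train_loss}
```

Under these assumptions it would be passed as `metric=gap_penalized_rmse` in place of `metric="rmse"` in the `automl.fit(...)` call. A configuration that fits the training data much better than the validation data then receives a worse score than one with the same validation RMSE and a smaller gap.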
Hi,
We used FLAML for a regression task and found that the AutoML model overfits easily. On the same task, other ML models (e.g., LightGBM and random forest) avoid overfitting when their hyperparameters are chosen by grid search. We tried adding 'cv=5' to the AutoML model, but it did not help in our case.
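As far as we know, FLAML's `fit` does not take a `cv` argument; cross-validation is selected with `eval_method="cv"`, and the number of folds is set with `n_splits` (e.g. `n_splits=5`). To illustrate what 5-fold cross-validation does, the hypothetical helper below partitions row indices into five folds, each used once as the validation set while the remaining rows are used for training. It does not shuffle, and FLAML's own splitter may differ.

```python
def kfold_indices(n_rows, n_splits=5):
    """Yield (train_idx, val_idx) pairs for contiguous k-fold cross-validation.

    Hypothetical helper for illustration only.
    """
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // n_splits + (1 if i < n_rows % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, val_idx
        start += size


folds = list(kfold_indices(12, n_splits=5))
print([len(v) for _, v in folds])  # [3, 3, 2, 2, 2]
```

Averaging the validation score over the folds gives a less noisy estimate than a single holdout split, at roughly `n_splits` times the training cost per configuration.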
Could you suggest how to avoid overfitting when using FLAML AutoML models?
BTW: we also used flaml.default.LGBMRegressor() to automatically choose the hyperparameters of a LightGBM model, but this model still overfits, whereas a LightGBM model tuned by grid search does not. So I think I may be misusing FLAML.
Lu Li
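What lets grid search guard against overfitting is that each candidate is scored on data it was not trained on, so the chosen configuration has the best held-out error, not the best training error. The sketch below illustrates this with a hypothetical 1-D k-nearest-neighbour regressor in plain Python, used as a stand-in for LightGBM: with k=1 every training point predicts itself, so its training RMSE is exactly 0, yet the search picks k by validation RMSE. The data are synthetic and illustrative only.

```python
import math
import random


def knn_predict(x_train, y_train, x, k):
    """Average the targets of the k training points closest to x (1-D)."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x))[:k]
    return sum(y_train[i] for i in nearest) / k


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def grid_search_k(x_tr, y_tr, x_val, y_val, grid=(1, 3, 5, 9)):
    """Return (best_k, scores), choosing k by validation RMSE, not training RMSE."""
    scores = {}
    for k in grid:
        train_rmse = rmse(y_tr, [knn_predict(x_tr, y_tr, x, k) for x in x_tr])
        val_rmse = rmse(y_val, [knn_predict(x_tr, y_tr, x, k) for x in x_val])
        scores[k] = (train_rmse, val_rmse)
    best_k = min(grid, key=lambda k: scores[k][1])
    return best_k, scores


# Synthetic noisy data: y = x plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(60)]
ys = [x + rng.gauss(0, 0.5) for x in xs]
x_tr, y_tr = xs[::2], ys[::2]      # even positions for training
x_val, y_val = xs[1::2], ys[1::2]  # odd positions for validation
best_k, scores = grid_search_k(x_tr, y_tr, x_val, y_val)
for k, (tr, va) in scores.items():
    print(f"k={k}: train RMSE={tr:.3f}, val RMSE={va:.3f}")
print("chosen k:", best_k)
```

Comparing the printed training and validation columns shows how a configuration can look perfect on training data while generalizing worse, which is why the train/test gap alone does not say whether the selected model is the best available one.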
Our FLAML AutoML code:
from flaml import AutoML
am = AutoML()
am.fit(x_train, y_train, task="regression")
The performance on training data: [plot not preserved]
The performance on test data: [plot not preserved]