microsoft / FLAML

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
https://microsoft.github.io/FLAML/
MIT License

n_estimators value on automl.model differs from value in logs (for CatBoost models) #1317

Open dannycg1996 opened 5 days ago

dannycg1996 commented 5 days ago

Hi all,

The n_estimators value on the best model (automl.model) provided by FLAML does not seem to be set correctly for CatBoostClassifiers.

Example code here:

from flaml import AutoML
from sklearn import datasets

dic_data = datasets.load_iris(as_frame=True)  # Bunch holding pandas objects
iris_data = dic_data["frame"]  # pandas DataFrame: features + target
automl = AutoML()
automl_settings = {
    "max_iter": 2,
    "metric": 'accuracy',
    "task": 'classification',
    "log_file_name": "catboost_error.log",
    "log_type": "all",
    "estimator_list": ['catboost'],
    "eval_method": "cv",
}
x_train = iris_data[["sepal length (cm)","sepal width (cm)", "petal length (cm)","petal width (cm)"]].to_numpy()
y_train = iris_data['target']
automl.fit(x_train, y_train, **automl_settings)
print(automl.model.get_params())

The print statement logs the following for me: {'early_stopping_rounds': 10, 'learning_rate': 0.09999999999999996, 'n_estimators': 33, 'thread_count': -1, 'verbose': False, 'random_seed': 10242048, 'task': <flaml.automl.task.generic_task.GenericTask object at 0x7f895f2b3830>, '_estimator_type': 'classifier'}

However, if I look at the actual log file (catboost_error.log), I can see that neither of the two estimators attempted had n_estimators = 33. They actually had n_estimators = 35 and n_estimators = 57. Replicating the FLAML folds myself has shown that this n_estimators value should be 35, meaning that the logs are correct and automl.model is incorrect.
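For anyone wanting to repeat the log check, here is a minimal sketch of how the n_estimators values could be pulled out of the log. It assumes the log is written as one JSON object per line and that each trial record stores its hyperparameters under a "config" key; both are assumptions about the log layout, and the helper name n_estimators_from_log is made up for illustration.

```python
import json
import os


def n_estimators_from_log(lines):
    """Return the n_estimators value of each trial record, in log order."""
    values = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumption: trial records keep their hyperparameters under "config".
        # Lines without that key (for example a trailing summary line) are skipped.
        config = record.get("config") if isinstance(record, dict) else None
        if isinstance(config, dict) and "n_estimators" in config:
            values.append(config["n_estimators"])
    return values


if __name__ == "__main__":
    log_path = "catboost_error.log"  # the log_file_name from the example above
    if os.path.exists(log_path):
        with open(log_path) as f:
            print(n_estimators_from_log(f))
```

If the layout assumption holds, none of the printed values should equal the n_estimators reported by automl.model.get_params().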

Furthermore, if I run print(automl.model.model.get_all_params()) I get a dictionary which includes iterations=35. The CatBoost documentation shows that iterations is an alias of n_estimators, and whilst I haven't managed to pin down the exact cause of this issue, I believe it is tied to this aliasing somewhere.
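To make the comparison between the two parameter dictionaries repeatable, a small sketch like the following could be used. It only works on plain dicts, so it runs without FLAML or CatBoost installed; the helper names tree_count and tree_counts_differ are hypothetical, and the alias list reflects the names CatBoost documents for iterations.

```python
# CatBoost documents these names as aliases for the same parameter.
TREE_COUNT_ALIASES = ("iterations", "n_estimators", "num_boost_round", "num_trees")


def tree_count(params):
    """Return the first tree-count value found under any alias, or None."""
    for name in TREE_COUNT_ALIASES:
        if params.get(name) is not None:
            return params[name]
    return None


def tree_counts_differ(wrapper_params, native_params):
    """Compare the tree count seen by the wrapper with the native model's.

    Returns (differ, wrapper_value, native_value).
    """
    wrapper_value = tree_count(wrapper_params)
    native_value = tree_count(native_params)
    return wrapper_value != native_value, wrapper_value, native_value


# With a fitted AutoML object, the two dicts would come from the calls used above:
#   tree_counts_differ(automl.model.get_params(),
#                      automl.model.model.get_all_params())
```

With the values reported in this issue (n_estimators of 33 on the wrapper, iterations of 35 on the underlying model), the first element of the result would be True.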

In terms of package versions, I'm using FLAML 2.1.2, catboost 1.2.5, scikit-learn 1.5.0 and Python 3.12.0.

Programmer-RD-AI commented 5 days ago

Hi, I will look into this later, but please check the discussion in #1275 as well; it seems they have come across the same issue. I will try to work out what the problem is :) If anyone else can contribute or help out, please do. Thanks!