
Learning with custom_hp in flaml does not limit the range of n_estimators in LightGBM? #1208

Closed: zuttonetetai closed this issue 1 year ago

zuttonetetai commented 1 year ago

I am trying to build a model on the Boston housing dataset with custom_hp set so that LightGBM's n_estimators is restricted to a range. I specified lgbm as the only estimator, but when I check the hyperparameters via automl.model.estimator.get_params, the n_estimators value differs from the value in automl.best_config["n_estimators"]. (For num_leaves, the value from automl.model.estimator.get_params and the value in automl.best_config["num_leaves"] do match.) Why is this happening? Is it a bug?

# Example (imports added; X_train and y_train hold the Boston housing
# features and target, prepared beforehand as described above)
import flaml
import lightgbm
from flaml import AutoML, tune

automl = AutoML()

settings = {
    "time_budget": 20,             # total running time in seconds
    "metric": 'rmse',               # metric
    "task": 'regression',           # task type
    "estimator_list": ['lgbm'],  # list of ML learners
    "log_file_name": 'automl.log',  # log file name
    "log_training_metric": False,    # whether to log training metric
    "seed": 1,                      # random seed
    "split_ratio": 0.1,
    "eval_method": "holdout",
    "retrain_full": False,        # whether to retrain the selected model on the full training data when using holdout
    "custom_hp": {
        "lgbm": {
            "n_estimators": {
                "domain": tune.randint(lower=100, upper=112),
                "init_value": 100
            },
            "num_leaves": {
                "domain": tune.randint(lower=30, upper=50),
                "init_value": 36
            },
        },
    } 
}

automl.fit(X_train=X_train, y_train=y_train, **settings)
print(automl.best_config["n_estimators"], automl.best_config["num_leaves"])
# 107 38

print(automl.model.estimator.get_params)
# <bound method LGBMModel.get_params of
# LGBMRegressor(learning_rate=0.10527927213108579, max_bin=255,
#               min_child_samples=37, n_estimators=1, n_jobs=-1, num_leaves=38,
#               reg_alpha=0.010525908112405348,
#               reg_lambda=0.0025820331500317215, verbose=-1)>

print(flaml.__version__)  # 2.0.2
print(lightgbm.__version__)  # 4.0.0

sonichi commented 1 year ago

This is because of early stopping during training to meet the time constraint.
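
A quick way to see this in the run above is to compare the sampled configuration with the parameters of the fitted estimator. A minimal sketch using only the objects already created in the example:

# If training was cut short by the time budget, the fitted tree count is
# smaller than the sampled hyperparameter value.
sampled = automl.best_config["n_estimators"]                  # 107 in the run above
fitted = automl.model.estimator.get_params()["n_estimators"]  # 1 in the run above
if fitted < sampled:
    print(f"training stopped early: {fitted} of {sampled} sampled trees were fitted")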

zuttonetetai commented 1 year ago

@sonichi

Thank you for your response.

This is because of early stopping during training to meet the time constraint.

Is it not possible to limit the range of n_estimators in flaml when a time_budget is set?

sonichi commented 1 year ago

You can specify it, and the configuration is indeed sampled from the specified range. But early stopping during training can still happen to meet the time constraint.
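
In other words, the custom_hp domain constrains what the tuner samples, not how many trees the final fit has time to grow. A minimal check on the run above (assuming tune.randint samples from the half-open interval [lower, upper), as in Ray Tune):

# The sampled value respects the custom_hp domain...
assert 100 <= automl.best_config["n_estimators"] < 112
# ...while the fitted estimator may hold fewer trees if training
# stopped early to meet the time budget.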

zuttonetetai commented 1 year ago

@sonichi

You can specify it, and the configuration is indeed sampled from the specified range. But early stopping during training can still happen to meet the time constraint.

Understood. I have tried it and confirmed that by increasing the time_budget, the n_estimators in automl.model.estimator.get_params and the value in automl.best_config["n_estimators"] match. Thank you very much. I will close this issue.
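
For reference, a sketch of that confirmation (the larger time_budget value below is illustrative; any budget large enough to avoid early stopping works):

# With a larger budget, training is no longer cut short, so the fitted
# estimator keeps the sampled n_estimators.
settings["time_budget"] = 120  # illustrative value, not from the original run
automl.fit(X_train=X_train, y_train=y_train, **settings)
assert (automl.best_config["n_estimators"]
        == automl.model.estimator.get_params()["n_estimators"])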