mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

Optuna mode gives the following error #397

Closed: ijeffking closed this issue 3 years ago

ijeffking commented 3 years ago

After running:

# training in Optuna mode, heavily tuning the selected algorithms

automl = AutoML(
    mode="Optuna", 
    algorithms=["CatBoost", "LightGBM", "Xgboost"],
    optuna_time_budget=3*3600,
    eval_metric="rmse"
)
automl.fit(X_train, y_train)

I got

2021-05-17 11:18:05,397 supervised.exceptions ERROR No models produced. 
Please check your data or submit a Github issue at https://github.com/mljar/mljar-supervised/issues/new.

1_Optuna_LightGBM not trained. Stop training after the first fold. Time needed to train on the first fold 30.0 seconds. The time estimate for training on all folds is larger than total_time_limit.
There was an error during 2_Optuna_Xgboost training.
Please check AutoML_2/errors.md for details.
There was an error during 3_Optuna_CatBoost training.
Please check AutoML_2/errors.md for details.

---------------------------------------------------------------------------

AutoMLException                           Traceback (most recent call last)

<ipython-input-14-e19533f68cf2> in <module>()
      6     eval_metric="rmse"
      7 )
----> 8 automl.fit(X_train, y_train)

/usr/local/lib/python3.7/dist-packages/supervised/base_automl.py in _fit(self, X, y, sample_weight, cv)
   1047                     if len(self._models) == 0:
   1048                         raise AutoMLException(
-> 1049                             "No models produced. \nPlease check your data or"
   1050                             " submit a Github issue at https://github.com/mljar/mljar-supervised/issues/new."
   1051                         )

AutoMLException: No models produced. 
Please check your data or submit a Github issue at https://github.com/mljar/mljar-supervised/issues/new.

pplonski commented 3 years ago

@ijeffking thank you for reporting the issue. Two things to check:

  1. Can you run AutoML with a small optuna_time_budget, for example optuna_time_budget=120? Does it work?
  2. Can you try adding one more parameter, total_time_limit=24*3600, to the AutoML() constructor? (See the sketch below.)

The problem looks like it is caused by some time limit on training ...
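
A minimal sketch combining both suggestions, assuming the same X_train and y_train as in the original snippet:

from supervised.automl import AutoML

# Sketch of the suggested workaround: verify with a small Optuna budget first,
# and give the final training an explicit, generous total_time_limit.
automl = AutoML(
    mode="Optuna",
    algorithms=["CatBoost", "LightGBM", "Xgboost"],
    optuna_time_budget=120,      # small budget to confirm training runs at all
    total_time_limit=24 * 3600,  # generous limit so final models are not cut off
    eval_metric="rmse",
)
automl.fit(X_train, y_train)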

ijeffking commented 3 years ago

Hello Piotr,

I appreciate the prompt response.

Yes, it works like a charm. There is a remarkable improvement in performance using Optuna mode, and I suggest increasing optuna_time_budget to get a dramatic improvement in scores.

So kindly edit the code on https://mljar.com/blog/next-generation-automl/ to include both optuna_time_budget and total_time_limit.

Thank you again

Best, Jeff

pplonski commented 3 years ago

@ijeffking thank you for the feedback; I've fixed the code example in the blog post and am updating the server now.

You can also try enabling golden features in AutoML during Optuna mode; that very often gives nice improvements as well.
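
A minimal sketch of that suggestion, assuming the same regression setup as in the first snippet (golden_features is an AutoML constructor parameter; the time budgets here are illustrative):

from supervised.automl import AutoML

# Sketch: Optuna mode with golden features enabled.
automl = AutoML(
    mode="Optuna",
    algorithms=["CatBoost", "LightGBM", "Xgboost"],
    optuna_time_budget=3600,
    total_time_limit=24 * 3600,
    golden_features=True,  # search for new features built from pairs of original ones
    eval_metric="rmse",
)
automl.fit(X_train, y_train)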

diggee commented 3 years ago

Hi @pplonski, I ran into the same error last night. I passed the following arguments to AutoML():

automl = AutoML(mode = 'Optuna', algorithms=["CatBoost", "Xgboost", "LightGBM"], 
                ml_task = 'multiclass_classification', optuna_time_budget = 7200, 
                features_selection = True, golden_features = True, 
                total_time_limit = 7200)
  1. According to the documentation here, each of the chosen 3 algorithms will be tuned for 7200s, and after that each algorithm will be trained with 10-fold CV using the optimal parameters. Is my understanding correct?

  2. Further, the following error showed up:

1_Optuna_LightGBM not trained. Stop training after the first fold. Time needed to train on the first fold 22.0 seconds. The time estimate for training on all folds is larger than total_time_limit.

So if training on one fold takes 22s, training on all 10 folds will take 220s for one algorithm, and training all folds on 3 algorithms should take 220*3 = 660s. I have given total_time_limit as 7200s, so why does it throw this error? (A quick arithmetic check follows after this list.)

  3. I also tried your suggestion of reducing optuna_time_budget to 60s as follows:

automl = AutoML(mode = 'Optuna', algorithms=["CatBoost", "Xgboost", "LightGBM"], 
                ml_task = 'multiclass_classification', optuna_time_budget = 60, 
                features_selection = True, golden_features = True, 
                total_time_limit = 7200)

This works, although I cannot understand why it should, given that nothing has changed in the function parameters except optuna_time_budget. The training time on one fold is the same, and total_time_limit is still the same, so why does this work?
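
A quick back-of-envelope check of the arithmetic in point 2, using only numbers quoted from the error message above:

# Rough estimate of the final training time from the reported numbers.
time_per_fold = 22         # seconds, from the 1_Optuna_LightGBM message
folds = 10                 # 10-fold CV, as described in the documentation
n_algorithms = 3           # CatBoost, Xgboost, LightGBM

estimated_training = time_per_fold * folds * n_algorithms
print(estimated_training)  # 660 seconds, well under total_time_limit=7200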

pplonski commented 3 years ago

@diggee what version are you using? I'm pretty sure I fixed this bug. Please try the current 0.10.4 version.

It should work as you are describing.

Optuna mode was added most recently, and all the other modes are tightly restricted by the total_time_limit parameter. There was a bug where the Optuna tuning time was counted as training time; it should have been fixed in https://github.com/mljar/mljar-supervised/issues/347. Please check with the latest version; otherwise it is still a bug.

@diggee do you get good models with MLJAR with other modes?

diggee commented 3 years ago

@pplonski Thanks for getting back so soon. I am using version 0.10.4; I only installed the mljar library the day before :D

Maybe it is still a bug then. At the start of training, the times are printed out correctly. So if I use the following function call

automl = AutoML(mode = 'Optuna', algorithms=["CatBoost", "Xgboost", "LightGBM"], 
                ml_task = 'multiclass_classification', optuna_time_budget = 3600, 
                features_selection = True, golden_features = True, 
                total_time_limit = 7200)

I see the following output at the start, which is correct.

AutoML directory: AutoML_1
Expected computing time:
Total training time: Optuna + ML training = 18000 seconds
Total Optuna time: len(algorithms) * optuna_time_budget = 10800 seconds
Total ML model training time: 7200 seconds
The task is multiclass_classification with evaluation metric logloss
AutoML will use algorithms: ['CatBoost', 'Xgboost', 'LightGBM']
AutoML will stack models
AutoML will ensemble availabe models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'golden_features', 'insert_random_feature', 'features_selection', 'ensemble', 'stack', 'ensemble_stacked']
Skip simple_algorithms because no parameters were generated.
* Step default_algorithms will try to check up to 3 models
[I 2021-05-18 15:22:04,581] A new study created in memory with name: no-name-9bcd8efc-3ab5-440e-99b4-2326e47f6f7f
Optuna optimizes LightGBM with time budget 3600 seconds eval_metric logloss (minimize)
[I 2021-05-18 15:22:21,111] Trial 0 finished with value: 1.1061558713211852 and parameters: {'learning_rate': 0.1, 'num_leaves': 1598, 'lambda_l1': 2.840098794801191e-06, 'lambda_l2': 3.0773599420974e-06, 'feature_fraction': 0.8613105322932351, 'bagging_fraction': 0.970697557159987, 'bagging_freq': 7, 'min_data_in_leaf': 36, 'extra_trees': False}. Best is trial 0 with value: 1.1061558713211852.

Perhaps it is not implementing the time variables correctly?

I have not tried the other modes because I specifically downloaded mljar for its automatic optimization capability using Optuna.

pplonski commented 3 years ago

@diggee thank you, so it is a bug.

Maybe there should be no time limit for training the final model after Optuna tuning? Just a limit on the hyperparameter search with Optuna? @ijeffking @diggee what do you think?

diggee commented 3 years ago

Yeah, I guess the total time limit shouldn't really be a factor when you are prepared to spend a lot of time tuning the hyperparameters anyway. Also, the tuning time will be much higher than the time to just run the algorithms with the tuned parameters, so specifying total_time_limit would not behave much differently from optuna_time_budget.

BTW, this is some great work man, really appreciate the library :)

pplonski commented 3 years ago

@ijeffking @diggee I've added a fix so that there is no time limit for model training after Optuna tuning. You just set the time limit for the Optuna optimization (the optuna_time_budget parameter). After Optuna tuning, models are fully trained without interruption. A usage sketch follows below the install command.

The changes are in the master branch, please install them directly from GitHub:

pip install -q -U git+https://github.com/mljar/mljar-supervised.git@master
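
A usage sketch under the fixed behavior described above: only optuna_time_budget bounds the search, and total_time_limit can simply be omitted.

from supervised.automl import AutoML

# Post-fix sketch: the Optuna search is bounded by optuna_time_budget,
# and the final models then train to completion without a time cut-off.
automl = AutoML(
    mode="Optuna",
    algorithms=["CatBoost", "Xgboost", "LightGBM"],
    optuna_time_budget=3600,
)
automl.fit(X_train, y_train)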

pplonski commented 3 years ago

Added fixes in the tuner's code.