kathrinse / TabSurvey

Experiments on Tabular Data Models
MIT License

n_trials parameter not obeyed #10

Closed: parsifal9 closed this issue 2 years ago

parsifal9 commented 2 years ago

Hi Kathrin

For one of my datasets I am setting the parameters as follows:

Namespace(config='config/wheat_anthesis.yml', model_name='XGBoost', dataset='wheat_anthesis', objective='regression', 
use_gpu=False, gpu_ids=[0, 1, 2, 3], data_parallel=True, optimize_hyperparameters=True, n_trials=20, direction='minimize', 
num_splits=5, shuffle=True, seed=221, scale=True, target_encode=False, one_hot_encode=False, batch_size=251, 
val_batch_size=251, early_stopping_rounds=20, epochs=500, logging_period=100, num_features=44567, num_classes=1, 
cat_idx=None, cat_dims=None)

This is a very time-consuming process, so I set n_trials=20. However, the process runs out of time and gives this output for XGBoost:

 Trial 40 finished with value: 123.61018808245896 and parameters: {'max_depth': 3, 'alpha': 1.0864626779404396e-06, 'lambda': 
0.022185113413679115, 'eta': 0.08421066233419264}. Best is trial 29 with value: 114.67435838448253.

Why does it get to a "Trial 40" when I set n_trials=20? Do these refer to different things?

Bye, R

Oops, the title should be "n_trials", not "n_trails". Sounds like I was thinking of "entrails". R

kathrinse commented 2 years ago

Hey, did you run it multiple times?

I configured Optuna (the library used for the hyperparameter optimization) so that it saves the state of the trials. If you rerun it with n_trials=20, it will run 20 additional trials, continuing from the last saved trial number.

You can remove the saved states by deleting the database file. In your case it should be called XGBoost_wheat_anthesis.db.
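Resetting the saved state therefore amounts to deleting that file before the next run. The sketch below assumes the file follows the <model_name>_<dataset>.db pattern mentioned above and sits in the directory the script was launched from; reset_study is a hypothetical helper, not part of TabSurvey.

```python
from pathlib import Path


def reset_study(model_name: str, dataset: str, directory: str = ".") -> bool:
    """Hypothetical helper: delete the saved Optuna database, if present.

    Assumes the file is named <model_name>_<dataset>.db and lives in
    `directory`. Returns True if a file was removed, False otherwise.
    """
    db_file = Path(directory) / f"{model_name}_{dataset}.db"
    if db_file.exists():
        db_file.unlink()
        return True
    return False


if __name__ == "__main__":
    removed = reset_study("XGBoost", "wheat_anthesis")
    print("deleted saved trials" if removed else "no saved trials found")
```

Optuna also provides optuna.delete_study(study_name=..., storage=...), which removes a single study from the storage without deleting the whole database file.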

parsifal9 commented 2 years ago

Hi Kathrin, that is great. I was thinking that a restart facility would be very useful, and there was one there the whole time. Thanks, R