keras-team / keras-tuner

A Hyperparameter Tuning Library for Keras
https://keras.io/keras_tuner/
Apache License 2.0

Tuner's Oracle parameter "run_times" differs from user input "executions_per_trial" #1022

Open jsaladich opened 3 months ago

jsaladich commented 3 months ago

Hi KerasTuner team!

Describe the bug: I ran an experiment with keras_tuner.BayesianOptimization in which executions_per_trial=3. When I check the file ./oracle.json, I see that the field run_times is always equal to 1.

Moreover, the ./../trial.json files of each trial only contain one best score and a single value per metric.
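For reference, a minimal sketch of the kind of setup I used (the hypermodel and data below are placeholders, not my actual experiment):

```python
import numpy as np
import keras
import keras_tuner


def build_model(hp):
    # Placeholder hypermodel, not the one from my actual experiment.
    model = keras.Sequential([
        keras.layers.Input(shape=(28, 28)),
        keras.layers.Flatten(),
        keras.layers.Dense(hp.Int("units", 32, 128, step=32), activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


tuner = keras_tuner.BayesianOptimization(
    build_model,
    objective="val_loss",
    max_trials=5,
    executions_per_trial=3,  # each set of hyperparameters is trained 3 times
    directory="kt_results",
    project_name="repro",
)

# Dummy data just to make the snippet runnable.
x = np.random.rand(256, 28, 28).astype("float32")
y = np.random.randint(0, 10, size=(256,))
tuner.search(x, y, epochs=3, validation_split=0.2)

# After the search, kt_results/repro/oracle.json and
# kt_results/repro/trial_*/trial.json are the files discussed above.
```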

Expected behavior: I would expect two things to behave differently:

  1. The oracle.json file should record each trial with run_times=3 if the user requested executions_per_trial=3 in the configuration.
  2. Each trial.json file should contain a list of length executions_per_trial with the scores / metrics of each execution, so the user can better analyze the algorithm (see the sketch after this list).
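For point 2, something along these lines is what I have in mind (purely illustrative structure and values, not what KerasTuner currently writes):

```python
# Hypothetical trial.json content I would expect with executions_per_trial=3
# (the values are made up; this is NOT the current KerasTuner output).
expected_trial = {
    "trial_id": "00",
    "metrics": {
        "val_loss": {
            "direction": "min",
            # one best score per execution instead of a single averaged value
            "observations_per_execution": [0.412, 0.398, 0.405],
        },
    },
    "score": 0.405,  # however the tuner then chooses to reduce the three runs
}
```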

Am I missing something, or is this how it is supposed to work? Thanks!

ghsanti commented 2 months ago

Hi @jsaladich,

  1. run_times increases if a trial fails. The oracle does self._run_times[trial.trial_id] += 1 when a trial fails. Maybe it should be called retries. Any ideas here?

  2. For multiple executions of a trial, the values are averaged (a simplified sketch of the averaging is below).
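Roughly, the averaging amounts to something like this (a simplified illustration, not the library's exact code):

```python
# Simplified illustration of how multiple executions of one trial collapse
# into a single reported objective value; the per-execution scores are not kept.
per_execution_best_val_loss = [0.412, 0.398, 0.405]  # best epoch of each execution
trial_score = sum(per_execution_best_val_loss) / len(per_execution_best_val_loss)
print(trial_score)  # only this averaged number ends up as the trial's score
```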

The current code returns (I added comments):

```python
{
    "trial_id": "00",
    "metrics": {  # <--- note that this key is duplicated, is it a bug?
        "metrics": {  # <--- I'm including loss only, but all look the same.
            "loss": {
                "direction": "min",
                "observations": [  # <--- before it was a single object
                    {"value": [2.3045783042907715], "step": 0},
                    # (...)
                    {"value": [2.3027408123016357], "step": 2},
                ],
            },
        },
        # (...)
    },
}
```
jsaladich commented 2 months ago

Hi @ghsanti, thanks a lot for your thorough response, and sorry for the delayed reply.

Before answering, I need to check one thing: the JSON you just posted shows the loss metric per step (i.e. per epoch). That is great, but as far as I remember (I haven't used KT much recently) the KT engine selects the best step.

My concern is about executions_per_trial: I need the best score value for each execution of a trial.

Of course, having full traceability (i.e. metrics for each step and each execution) would be best.
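Concretely, the kind of record I am after could probably be produced today with a rough, untested sketch like the one below, which overrides run_trial and saves each execution's best score (get_trial_dir and the val_loss "min" objective are assumptions on my side):

```python
import json
import os

import keras_tuner


class PerExecutionLoggingTuner(keras_tuner.BayesianOptimization):
    """Hypothetical tuner that also records each execution's best val_loss."""

    def run_trial(self, trial, *args, **kwargs):
        per_execution_scores = []
        histories = []
        for _ in range(self.executions_per_trial):
            model = self.hypermodel.build(trial.hyperparameters)
            history = model.fit(*args, **kwargs)
            histories.append(history)
            # Assumes a "min" objective on val_loss and that validation data
            # is passed to search().
            per_execution_scores.append(min(history.history["val_loss"]))

        # Keep a separate record of the individual executions next to trial.json.
        # get_trial_dir should point at the trial folder; adjust if needed.
        trial_dir = self.get_trial_dir(trial.trial_id)
        os.makedirs(trial_dir, exist_ok=True)
        with open(os.path.join(trial_dir, "per_execution_scores.json"), "w") as f:
            json.dump(per_execution_scores, f)

        # Returning the list of histories should let KerasTuner average them as usual.
        return histories
```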

Please let me know if we are on the same page.

Thanks a lot!

ghsanti commented 2 months ago

@jsaladich

Working on a PR (see below), feel free to comment.

jsaladich commented 2 months ago

Hi @ghsanti, amazing job, and sorry for not following up (but believe me, I read you). I would need a week to run an experiment again so I can describe my user experience and expectations accurately. Would you mind waiting, or do you need an answer ASAP?

ghsanti commented 2 months ago

Thanks! No rush @jsaladich; do it whenever you can, and if you can't, that's fine as well.

jsaladich commented 2 months ago

@ghsanti I would never miss such an opportunity!!

jsaladich commented 1 month ago

Hi @ghsanti, sorry for the delay, I just ran some dummy KT optimizations. As of version 1.4.7 we have the score as a single value, which is duplicated from the metrics value (depending on the user's selection, val_loss or loss).

Assuming the user asked for executions_per_trial = 10, the metrics are missing the 10 val_loss and loss values, which could be very useful. Adding the whole training history per epoch (which I believe is what you suggested in your JSON with the observations structure) would also be a nice-to-have feature.

In oracle.json, I have seen a confusing key, run_times. At first it seemed related to executions_per_trial, but then I saw there is another parameter, max_retries_per_trial. If I understand correctly, run_times is the actual number of retries (max_retries_per_trial) used for each trial, not the number of executions_per_trial. It would be nice to have a record of executions_per_trial in the oracle too.
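For context, if I read the recent docs right, both knobs can be set on the tuner, which makes the naming overlap easy to trip over (a hedged sketch, reusing the placeholder build_model from my earlier snippet):

```python
import keras_tuner

# Hedged sketch: executions_per_trial repeats training of the SAME hyperparameter
# values (results get averaged), while max_retries_per_trial only controls how
# often a FAILED trial is re-attempted, which seems to be what run_times counts.
tuner = keras_tuner.BayesianOptimization(
    build_model,              # placeholder hypermodel, as in the earlier sketch
    objective="val_loss",
    max_trials=5,
    executions_per_trial=10,  # 10 fits per trial
    max_retries_per_trial=2,  # up to 2 retries when a trial errors out
    directory="kt_results",
    project_name="param_check",
)
```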

Finally, I understand that executions_per_trial is the number of re-fits / re-trainings in a given trial. But there is also the number of re-predictions: for each trial and for each execution, how many times the algorithm should predict the output. With that we could have a full benchmark of the uncertainty of the network being optimized.
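To make the "re-predict" idea concrete, this is roughly what I mean (a rough sketch, assuming the model has some stochastic component such as dropout kept active at inference, otherwise repeated predictions are identical):

```python
import numpy as np


def repeated_predictions(model, x, n_repeats=10):
    """Hypothetical helper (not part of KerasTuner): repeat inference and
    summarize the spread. training=True keeps layers such as Dropout active,
    so repeated forward passes actually differ."""
    preds = np.stack(
        [np.asarray(model(x, training=True)) for _ in range(n_repeats)], axis=0
    )
    return preds.mean(axis=0), preds.std(axis=0)


# Example: mean_pred, std_pred = repeated_predictions(best_model, x_test)
# std_pred would be the per-sample uncertainty to report per execution.
```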

Let me know if my explanation is understandable!

Thanks a lot for your time and patience!

ghsanti commented 1 month ago

Hi, the changes are in my fork only; they won't be wanted here because I removed all backwards compatibility.

They may still want to support it here (but I don't see anyone replying).

The fork aims for keras>=3.5 and tf>=2.17. (Note that it's not finished, but it may work for simple projects.)

Here I included sample outputs. (I think some of the issues you mentioned are fixed.)

jsaladich commented 1 month ago

Yes, I believe that will help any KT user a lot. Perhaps another topic (that might require more dev work) is the number of times that model.predict() should be run in the same execution for a given trial. But, long story short, this is a nice implementation given the current state of KT! Thanks a lot @ghsanti!!! Kudos to you!

P.S.: Quick question, shouldn't the score in your sample JSON (i.e. https://github.com/ghsanti/keras-tuner/blob/main/example-results/test/trial_05/trial.json) be the trial's average over the 3 executions instead of the trial's best execution (as it is now)?

ghsanti commented 1 month ago

You are welcome 🤗

> P.S.: Quick question, shouldn't the score in your sample JSON (i.e. https://github.com/ghsanti/keras-tuner/blob/main/example-results/test/trial_05/trial.json) be the trial's average over the 3 executions instead of the trial's best execution (as it is now)?

That's a valid point. Currently it's logged that way for simplicity, i.e. it just keeps one best-overall value (until I get the rest working reliably); I'll take a closer look at it during the next week, once I fix some failing tests.

Feel free to open a discussion or issue in my fork as well, for any other changes.