keras-team / keras-tuner

A Hyperparameter Tuning Library for Keras
https://keras.io/keras_tuner/
Apache License 2.0

Error while trying to load model from tuner.get_best_models() #697

Open iaioanno opened 2 years ago

iaioanno commented 2 years ago

Hello, I have an issue when trying to create a custom tuner by subclassing the Tuner class, in order to log each model's hyperparameters to the Weights & Biases site.

Even though everything seems to work fine with the upload of each model, I get an error when I call get_best_models().

I would appreciate your help a lot!

Here is the tuner class I'm trying to create:

class MyTuner(kt.Tuner):

  def run_trial(self, trial, trainX, trainY, batch_size, objective, epochs, validation_data):

      hp = trial.hyperparameters
      objective_name_str = objective

      ## create the model with the current trial hyperparameters
      model = self.hypermodel.build(hp)

      ## Initiates new run for each trial on the dashboard of Weights & Biases
      run = wandb.init(project="Help_please", config=hp.values)

      checkpoint_filepath=f'./Aggr_{int(agro_id)}_{model_str}/checkpoint'

      model_checkpoint_callback = ModelCheckpoint(filepath=checkpoint_filepath,
                              monitor='val_mae',    
                              save_best_only=True)

      my_callbacks = [
                      EarlyStopping(patience=20),
                      model_checkpoint_callback,
                      WandbCallback(), 
                     ]

      history = model.fit(trainX,
                trainY,
                batch_size=batch_size,
                epochs=epochs,
                validation_data=validation_data,
                callbacks=my_callbacks)  

      val_mae = history.history['val_mae'][-1]  

      self.oracle.update_trial(trial.trial_id, {objective_name_str:val_mae})

      # I tried calling save_model after the trial, but I get an error and the tuning stops after the first trial
      # self.save_model(trial.trial_id, model)

      ## ends the run on the Weights & Biases dashboard
      run.finish()    

Now, tuner.search() itself runs fine, but when I get to get_best_models() I get an error.

D = X_train.shape[1]    # number of features
no_model = 0    # initialize the number of models to save
parent_dir = f"{int(agro_id)}_Aggr_4_models_{model_str}"

os.makedirs(parent_dir)  # create the path to save the model

for number_of_layers in range(3, 8):
  # create the model from the class MyHyperModel
  my_model =  MyHyperModel(number_of_layers)

  tuner = MyTuner(
      hypermodel=my_model,
      oracle=kt.oracles.BayesianOptimization(
        objective=objective,
        max_trials=4),
      executions_per_trial=1,
      overwrite=True,
      directory=f"DAM_Aggr_tuning_{int(agro_id)}_{model_str}",
      project_name=f"DAM_Aggr_tuning_{int(agro_id)}_{model_str}_project",
  )

  tuner.search(X_train, y_train, 
                batch_size=BATCH_SIZE,
                epochs=EPOCHS, 
                validation_data=(X_val, y_val),
                objective=objective,
                )

  best_models = tuner.get_best_models(num_models=7)

  for best_model in best_models:
      best_model.build(input_shape=(None, D))   
      saved_model_path = os.path.join(parent_dir, f"{no_model}")
      best_model.save(filepath=saved_model_path)    
      no_model += 1 

The error message is:

    NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for DAM_Aggr_tuning_129_cc\DAM_Aggr_tuning_129_cc_project\trial_3\checkpoint

Here is also my GitHub folder, in case you want to see the full code: https://github.com/iaioanno/WandB/tree/main/HUB

If anyone could help me it would be amazing, thank you.

morganmcg1 commented 2 years ago

Flagged with Ayush + Soumik

Andyjdv commented 2 years ago

I have the exact same issue. I'm using Google Drive as the directory and I can see it creates files, but not a "checkpoint" folder. Did you resolve this?

Andyjdv commented 2 years ago

Would a workaround be to just use get_best_hyperparameters(num_trials=1)[0] and build a model with those values?

JamminBreeze commented 2 years ago

The issue, as best I can tell, comes from the current versions of TensorFlow and Keras Tuner. Somewhere between TF 2.3 and 2.9 something changed (I've been looking, but I'm not good enough to find it) that causes an incompatibility when calling get_best_models().

haifeng-jin commented 1 year ago

> Would a workaround be to just use get_best_hyperparameters(num_trials=1)[0] and build a model with those values?

That is essentially what get_best_models() does under the hood.

javaLobster commented 1 year ago

I was able to work around this issue with Keras Tuner v1.3.5 and TensorFlow v2.12.0, though unfortunately without fixing the missing checkpoints themselves. I subclassed keras_tuner.Tuner and overrode Tuner.run_trial() as shown in the documentation, adding to it the tuner_utils.SaveBestEpoch() logic from keras_tuner.Tuner's own implementation.

The relevant part that makes the call of MyTuner.get_best_models() work is the following, inside the run_trial() of the Tuner subclass "MyTuner":

    checkpoint = tuner_utils.SaveBestEpoch(
        objective=self.oracle.objective,
        filepath=self._get_checkpoint_fname(trial.trial_id),
    )
    checkpoint.set_model(yourBuiltmodel)
    checkpoint._save_model()

I hope this might help you.