lululxvi / deepxde

A library for scientific machine learning and physics-informed learning
https://deepxde.readthedocs.io
GNU Lesser General Public License v2.1

The passed save_path is not a valid checkpoint ; while using `model.restore` #925

Open anshumansinha16 opened 1 year ago

anshumansinha16 commented 1 year ago

I am using TensorFlow (the error comes from site-packages/tensorflow/python/training/saver.py) through the checker callback and model.restore. Both have been given a path to the folder 'results', and the files are being saved in that folder, as shown in the code below.

I think the code line model.restore("/Users/anshumansinha/Desktop/Project/model/"+save_str+"model.ckpt-" + str(np.argmin(model.losshistory.loss_test)*100), verbose=0) is not able to get the correct path.

My save_str is as follows: save_str = func_str+'_Seed_'+str(seed)+'_Samples_'+str(samples)+'_X_'+str(exponent_truth)+'_'+str(exponent_approx)+'_epochs_'+str(epochs)+'_blayers_'+str(b_layers)+'_neurons_'+str(neurons)

The files saved in the model/ folder are named like this (I have added a picture of the folder as well):
Levin1_Seed_1_Samples_10_X_13_4_epochs_100_blayers_3_neurons_125model.ckpt-100.ckpt.data-00000-of-00001

As far as I can tell, the code before model.restore saves the files in the model folder, but model.restore somehow cannot access them. Could someone help me with this? Thanks.
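The filename quoted above ends in `.data-00000-of-00001`, which is the suffix TensorFlow adds to a checkpoint data shard; a restore call expects the part before that suffix. A minimal sketch of recovering that prefix from a filename follows. The helper name `checkpoint_prefix` is hypothetical, and the suffix list (`.index`, `.meta`, `.data-NNNNN-of-NNNNN`) is an assumption about what the TF1-style saver writes.

```python
import re

# Assumed suffixes that the saver appends to a checkpoint prefix on disk.
_SUFFIX = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(filename):
    """Return the restore prefix for a checkpoint file, or None if it is not one."""
    match = _SUFFIX.search(filename)
    return filename[: match.start()] if match else None


name = (
    "Levin1_Seed_1_Samples_10_X_13_4_epochs_100_blayers_3_neurons_125"
    "model.ckpt-100.ckpt.data-00000-of-00001"
)
print(checkpoint_prefix(name))
```

Comparing the printed prefix with the string passed to model.restore shows whether the two differ, for example by a trailing `.ckpt`.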

Code

model = dde.Model(data, net)
model.compile("adam", lr=lr, metrics=[mean_squared_error])
checker = dde.callbacks.ModelCheckpoint(
    "/Users/anshumansinha/Desktop/Project/model/"+save_str+"model.ckpt", save_better_only=False, period=100
)
# Training model; batch_size = 10000
losshistory, train_state = model.train(epochs=epochs, callbacks=[checker])
# For plotting the residuals and the training history: isplot=True will plot
dde.saveplot(losshistory, train_state, issave=False, isplot=False)

# Restore the best test loss model
model.restore("/Users/anshumansinha/Desktop/Project/model/"+save_str+"model.ckpt-" + str(np.argmin(model.losshistory.loss_test)*100), verbose=0)
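The restore call above turns the position of the smallest test loss into a step number by multiplying by 100, which only matches if the loss history is recorded every 100 steps. A hedged sketch of looking the step up instead: it assumes the history exposes parallel lists of recorded steps and test losses (in DeepXDE these are believed to be `model.losshistory.steps` and `model.losshistory.loss_test`, with one sequence of loss terms per record). The helper `best_step` and the sample numbers are hypothetical.

```python
def best_step(steps, loss_test):
    """Return the recorded step whose summed test loss is smallest."""
    if not steps or len(steps) != len(loss_test):
        raise ValueError("steps and loss_test must be non-empty and equal in length")
    # Each record may hold several loss terms; compare their sums.
    totals = [sum(terms) for terms in loss_test]
    best_index = min(range(len(totals)), key=totals.__getitem__)
    return steps[best_index]


# Hypothetical history recorded at steps 0, 1000, and 2000.
steps = [0, 1000, 2000]
loss_test = [[0.9, 0.3], [0.2, 0.1], [0.25, 0.2]]
print(best_step(steps, loss_test))
```

A checkpoint exists only for steps at which the callback actually saved, so the returned step still has to coincide with a multiple of the callback's period.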

The error is as follows:

Traceback (most recent call last):
  File "/Users/anshumansinha/Desktop/Project/file/./main.py", line 302, in <module>
    NN_MSEs_test, NN_MSEs_train = DeepONet(samples, split, y/np.max(np.abs(y)) , I, inds, neurons, epochs, b_layers)
  File "/Users/anshumansinha/Desktop/Project/file/./main.py", line 282, in DeepONet
    model.restore("/Users/anshumansinha/Desktop/Project/model/"+save_str+"model.ckpt-" + str(np.argmin(model.losshistory.loss_test)*100), verbose=0)
  File "/Users/anshumansinha/venv/lib/python3.10/site-packages/deepxde/model.py", line 914, in restore
    self.saver.restore(self.sess, save_path)
  File "/Users/anshumansinha/venv/lib/python3.10/site-packages/tensorflow/python/training/saver.py", line 1409, in restore
    raise ValueError("The passed save_path is not a valid checkpoint: " +
ValueError: The passed save_path is not a valid checkpoint: /Users/anshumansinha/Desktop/Project/model/Levin1_Seed_1_Samples_100_X_13_10_epochs_100_blayers_7_neurons_500model.ckpt-100

The folder "results" contains all these files, including the 'checkpoint' file!

[Image: contents of the folder 'results']

praksharma commented 1 year ago

Maybe use relative paths (spaces sometimes cause problems). I am not sure whether it will work without the .ckpt extension. Try renaming .ckpt-100 to .ckpt.

123new-net commented 1 year ago

@praksharma I also had a problem with save and reload. I looked at some of the answers in the FAQ on this and had trouble using both methods.

After running the save code, I looked at the folder where the files are written. For example, in the first case, the files shown in the following figure were added. [Image: files in the save folder]

I hope you can help me. Thank you very much.

lululxvi commented 1 year ago

This could be a TensorFlow issue on Windows. You could try a Linux system.

alabaykazakh commented 1 year ago

@praksharma I also had a problem with save and reload. I looked at some of the answers in the FAQ on this and had trouble using both methods.

  • The first case is to save the model only in the last step, using the following code.
      model = dde.Model(data, net)
      model.compile("adam", lr=0.001)
      losshistory, train_state = model.train(epochs=10000, model_save_path=r"C:\Users\Administrator\model2\model.ckpt")
      dde.saveplot(losshistory, train_state, issave=True, isplot=True)
    Then use the following code to load in the new file.
      model.restore(r"C:\Users\Administrator\model2\model.ckpt-10000", verbose=1)
    The following error message is displayed.
      ValueError: The passed save_path is not a valid checkpoint: C:\Users\Administrator\model2\model.ckpt-10000
  • The second case is to use ModelCheckpoint and save every certain number of steps.
      model = dde.Model(data, net)
      model.compile("adam", lr=0.001)
      checkpointer = dde.callbacks.ModelCheckpoint("./model/model.ckpt", save_better_only=True, period=1000)
      model.train(epochs=10000, callbacks=[checkpointer])
      dde.saveplot(losshistory, train_state, issave=True, isplot=True)
    Then make the call with restore, and the same error occurs.
      model.restore("./model/model.ckpt-7000")
      ValueError: The passed save_path is not a valid checkpoint: ./model/model.ckpt-7000

After running the save code, I looked at the folder where the files are written. For example, in the first case, the files shown in the following figure were added. [Image: files in the save folder]

I hope you can help me. Thank you very much.

I had the same issue but managed to solve it. Instead of model.restore("./model/model.ckpt-7000"), try model.restore("./model/model.ckpt-7000.ckpt").
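The fix above implies that the files on disk carry an extra `.ckpt` after the step number. A minimal sketch that checks which form exists before restoring, assuming each checkpoint has an `.index` file next to its data shards; `resolve_restore_path` is a hypothetical helper, not part of DeepXDE.

```python
import os


def resolve_restore_path(save_path, step):
    """Return the first candidate prefix that has an .index file on disk."""
    base = f"{save_path}-{step}"
    # Try the plain prefix first, then the variant with a trailing ".ckpt".
    for prefix in (base, base + ".ckpt"):
        if os.path.exists(prefix + ".index"):
            return prefix
    raise FileNotFoundError(f"no checkpoint index found for {base}")
```

With this in place, the restore call could be written as model.restore(resolve_restore_path("./model/model.ckpt", 7000)), which fails with a clear message when neither form is present.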