lbortolotti opened 1 month ago
Hi @lbortolotti -
Thanks for reporting the issue. In your code, the model whose weights are saved is created with use_custom_lstm=True (model = make_model(use_custom_lstm=True)), but the model the weights are loaded into is created with use_custom_lstm=False; this mismatch is what causes the error.
Calling model.summary() before saving the weights gives a better idea of which layers and parameters the model was created with. The following change resolves the error:
```python
new_model = make_model(use_custom_lstm=True)
new_model.load_weights(weights_file)
```
Attached gist for your reference as well.
Hi. I'm perfectly aware that the model I'm loading the weights into has a "different" LSTM layer. However, the two models have 1) identical weight structures/dimensions and 2) identically named layers. In this situation, Keras has always supported loading weights, even when the layer class has changed. This is something I've used all the time, and it continues to work with tf-keras.
My example is particularly extreme as MyCustomLSTM is absolutely identical to a vanilla LSTM layer - normally I'd have some custom logic in there (but nothing that affects the weight structure).
The official docs confirm that my example should work, I think, as they say that "Weights are loaded based on the network's topology":
https://keras.io/api/models/model_saving_apis/weights_saving_and_loading/#loadweights-method
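For what it's worth, because the two models have identical weight structures, the weights can also be transferred in memory without touching the file format at all. A minimal sketch of what I mean (make_model and MyCustomLSTM are stand-ins for my code; the input shape and unit count are assumed):

```python
import keras

class MyCustomLSTM(keras.layers.LSTM):
    # Identical to the stock LSTM, as in my example; real custom logic
    # would go here (nothing that changes the weight structure).
    pass

def make_model(use_custom_lstm):
    lstm_cls = MyCustomLSTM if use_custom_lstm else keras.layers.LSTM
    inputs = keras.Input(shape=(10, 4))        # shape assumed for the sketch
    outputs = lstm_cls(8, name="lstm")(inputs)
    return keras.Model(inputs, outputs)

src = make_model(use_custom_lstm=True)   # the model whose weights were saved
dst = make_model(use_custom_lstm=False)  # the vanilla-LSTM model

# The two graphs expose identically shaped variables in the same order,
# so a direct in-memory copy side-steps the file format entirely.
dst.set_weights(src.get_weights())
```

This works regardless of the layer class, which is exactly why I'd expect the file-based path to work too.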
Hi @lbortolotti -
As per the load_weights documentation, weights are loaded based on the network's topology, but the architecture should be the same as when the weights were saved. When the weights were saved, the model architecture consisted of an InputLayer plus a MyCustomLSTM layer (per the model summary), created with the use_custom_lstm=True argument. So when loading the weights, the model should be created with the same use_custom_lstm=True argument to get the same architecture.
I'm also attaching a gist run with Keras 2 (tf-keras); it gives the same error with new_model = make_model(use_custom_lstm=False).
Interesting. I've dug a bit deeper, and have found that to restore the weight-loading behaviour I only have to change the weights file name. To replicate, you just need to take your last gist, save the weights as weights_file = "this_is_a_test.h5", and you'll find that they then load correctly.
Which behaviour is the expected behaviour? I was definitely using the functionality...
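Concretely, the replication I mean looks like this (make_model and MyCustomLSTM are minimal stand-ins for the gist code, with assumed shapes; the only change that matters is the extension of weights_file). In my environment (TF 2.17 + Keras 3.5) the plain ".h5" suffix makes the cross-class load succeed, presumably via the legacy HDF5 name/topology code path, though I can't say whether newer Keras versions still accept it:

```python
import keras

class MyCustomLSTM(keras.layers.LSTM):
    # Identical to the stock LSTM, as in the original example.
    pass

def make_model(use_custom_lstm):
    lstm_cls = MyCustomLSTM if use_custom_lstm else keras.layers.LSTM
    inputs = keras.Input(shape=(10, 4))        # shape assumed for the sketch
    return keras.Model(inputs, lstm_cls(8, name="lstm")(inputs))

model = make_model(use_custom_lstm=True)

# Plain ".h5" instead of the native ".weights.h5" suffix.
weights_file = "this_is_a_test.h5"
try:
    model.save_weights(weights_file)
    new_model = make_model(use_custom_lstm=False)
    new_model.load_weights(weights_file)  # succeeds for me with ".h5"
    result = "loaded"
except Exception as err:
    # Newer Keras versions may reject the non-".weights.h5" suffix.
    result = f"not loaded: {err}"
```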
I'm using the official TF 2.17 container (tensorflow/tensorflow:2.17.0-gpu-jupyter) + keras==3.5.0.
The following code saves a model which contains a (dummy) custom LSTM layer, then inits a new copy of the model (with a vanilla LSTM) and tries to load the weights from the first model into the second.
Code:
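(Reconstruction, not the original snippet: a minimal version matching the description above, with the layer names, shapes, and weights_file path all assumed.)

```python
import keras

class MyCustomLSTM(keras.layers.LSTM):
    # Dummy wrapper: behaves exactly like the stock LSTM.
    pass

def make_model(use_custom_lstm):
    # InputLayer + (custom or vanilla) LSTM, per the description above.
    lstm_cls = MyCustomLSTM if use_custom_lstm else keras.layers.LSTM
    inputs = keras.Input(shape=(10, 4))        # shape is assumed
    outputs = lstm_cls(8, name="lstm")(inputs)
    return keras.Model(inputs, outputs)

# Save weights from the custom-LSTM variant...
model = make_model(use_custom_lstm=True)
weights_file = "model.weights.h5"              # path is assumed
model.save_weights(weights_file)

# ...then try to load them into the vanilla-LSTM variant.
new_model = make_model(use_custom_lstm=False)
try:
    new_model.load_weights(weights_file)
    outcome = "loaded"
except Exception as err:
    # In the reported setup (TF 2.17 + Keras 3.5) this raises.
    outcome = f"failed: {err}"
```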
Output:
Considering that the custom layer in this case does absolutely nothing of interest, I assume this is a bug. If not, please let me know how one is meant to wrap an LSTM layer to avoid this issue.
Thanks!