mrdbourke / tensorflow-deep-learning

All course materials for the Zero to Mastery Deep Learning with TensorFlow course.
https://dbourke.link/ZTMTFcourse
MIT License

Notebook 05: load_weights results in "Incompatible tensor with shape (1280, 10)..." #544

Open ivanthecrazy opened 1 year ago

ivanthecrazy commented 1 year ago

When creating model_2 and trying to load the weights with

model_2.load_weights(checkpoint_path)

I'm getting the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-55-d2e3006b884f> in <cell line: 2>()
      1 # Load model from checkpoint, that way we can fine-tune from the same stage the 10 percent data model was fine-tuned from
----> 2 model_2.load_weights(checkpoint_path) # revert model back to saved weights

1 frames
/usr/local/lib/python3.9/dist-packages/tensorflow/python/ops/resource_variable_ops.py in _restore_from_tensors(self, restored_tensors)
    718             self.handle, self.shape, restored_tensor)
    719       except ValueError as e:
--> 720         raise ValueError(
    721             f"Received incompatible tensor with shape {restored_tensor.shape} "
    722             f"when attempting to restore variable with shape {self.shape} "

ValueError: Received incompatible tensor with shape (1280, 10) when attempting to restore variable with shape (1, 1, 1152, 48) and name Adam/m/block7a_se_reduce/kernel:0.

I also tried downloading the notebook from this repo and got the same result.
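One way to see where the (1280, 10) tensor in the error comes from is to list what the checkpoint actually stores. This is a diagnostic sketch the thread itself didn't try: it relies on tf.train.list_variables, which reads a TF-format checkpoint prefix (or a directory containing a checkpoint state file), and checkpoint_shapes and names_with_shape are hypothetical helper names.

```python
from __future__ import annotations

import tensorflow as tf


def checkpoint_shapes(checkpoint_path: str) -> dict[str, list[int]]:
    """Map every variable stored in a TF-format checkpoint to its shape."""
    # list_variables returns (name, shape) pairs without building any model.
    return {name: list(shape) for name, shape in tf.train.list_variables(checkpoint_path)}


def names_with_shape(shapes: dict[str, list[int]], shape: tuple[int, ...]) -> list[str]:
    """Return the sorted checkpoint keys whose stored shape equals `shape`."""
    return sorted(name for name, s in shapes.items() if s == list(shape))
```

With the notebook's checkpoint_path, names_with_shape(checkpoint_shapes(checkpoint_path), (1280, 10)) should list the saved classifier kernel plus any Adam slot variables saved with it, since Adam's slots have the same shape as the variable they track.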

mrdbourke commented 1 year ago

Hi @ivanthecrazy,

Investigating this issue myself.

I'm going through the following resources:

It looks like an issue with newer versions of TensorFlow when using the load_weights() method with tf.keras.applications.efficientnet models.

My current solution is installing TensorFlow 2.9.0 (as suggested by the links above) and running it from there.

For example:

# Install TensorFlow 2.9.0 to avoid issues (later versions may work)
# -U stands for "upgrade" and "-q" stands for "quiet"
!pip install -U -q tensorflow==2.9.0
import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")

I will make sure this works and investigate it further if something is wrong.

I'll post another comment here once I've fixed the notebook: https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/05_transfer_learning_in_tensorflow_part_2_fine_tuning.ipynb

mrdbourke commented 1 year ago

Update: I've confirmed that running notebook 05 works end-to-end with TensorFlow 2.9.0 (as per the links above).

Install TensorFlow 2.9.0 with:

# Install TensorFlow 2.9.0 to avoid issues (later versions may work)
# -U stands for "upgrade" and "-q" stands for "quiet"
!pip install -U -q tensorflow==2.9.0
import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")

I'm not quite sure what's happening with later versions (e.g. 2.10.0+); the issues above seem to be long-standing.

The notebook code has been updated to reflect installing TensorFlow 2.9.0 at the start.

See the updated code here and let me know how it goes: https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/05_transfer_learning_in_tensorflow_part_2_fine_tuning.ipynb

filipposkar commented 1 year ago

I had the same problem. I can confirm that Daniel's new notebook (TF version 2.9.0) works fine. Furthermore, all the “Model failed to serialize as JSON” warnings that appeared while fitting the various models have disappeared.

mrdbourke commented 1 year ago

Hi @filipposkar, glad to hear you got it fixed!

It looks like this should also be fixed in upcoming versions of TensorFlow (e.g. 2.13+).

For now, it looks like TensorFlow 2.9.0 works.

See this comment here: https://github.com/keras-team/tf-keras/issues/383

mrdbourke commented 1 year ago

Update: looks like TensorFlow 2.9.0 is still the most stable here, see: https://github.com/mrdbourke/tensorflow-deep-learning/issues/553

TL;DR: I tried tf-nightly (2.14.0-dev20230520) and it still broke.

ivanthecrazy commented 1 year ago

Thank you @mrdbourke

OFALOFAL commented 1 year ago

I have an issue with changing the version of TensorFlow: '!pip install -U -q tensorflow==2.9.0' doesn't work.

VuduVations commented 1 year ago

@OFALOFAL

Here is my temporary workaround.

The problem seems to stem from the line of code in block [29] of @mrdbourke's 05_transfer_learning notebook, where the TF install upgrades to the latest version of TF; however, some of the dependencies are deprecated in the latest version of TF, since we are working with tensorflow 2.9.0.

  1. Removed import tensorflow as tf from block [29] of 05_transfer_learning from GitHub.
  2. Scrolled to the top of the code, to block [1], and used
  • !pip uninstall -y tensorflow to remove the 2.12.x version
  3. Inserted the tensorflow==2.9.0 install and import in block [2]:
  • !pip install -U -q tensorflow==2.9.0
  • import tensorflow as tf
  • print(tf.__version__)
  • from tensorflow import keras

(Notes on protobuf below, as the dependency is incompatible; however, the results came out the same as expected.)

  4. Cleared all outputs and ran the code from the beginning.

The Protobuf dependency in TensorFlow is used to serialize and deserialize data. This means it can be used to convert data from one format to another, such as from a Python object to a binary file. This is useful for TensorFlow because it allows models to be saved and loaded easily, and it also allows communication between different TensorFlow components.

Specifically, Protobuf is used in TensorFlow for the following purposes:

  • Serializing and deserializing TensorFlow models: When a TensorFlow model is saved, it is serialized into a Protobuf file. This file can then be loaded back into TensorFlow to restore the model.
  • Communicating between different TensorFlow components: TensorFlow components, such as the TensorFlow Serving server and the TensorFlow Lite library, use Protobuf to communicate with each other. This allows them to exchange data in a format that is both efficient and easy to understand.
  • Providing a common data format for TensorFlow and other libraries: Protobuf is a widely used data format, so it can also be used to communicate with other libraries that use Protobuf. This makes it easier to integrate TensorFlow with other libraries, such as the gRPC framework.

Overall, the Protobuf dependency in TensorFlow is a valuable tool that allows TensorFlow models to be saved, loaded, and communicated with other components. It is a versatile data format that is widely used in the industry, and it makes TensorFlow more accessible to other libraries and frameworks. - Source: Bard

arpadikuma commented 1 year ago

Your version of protobuf will most likely cause errors with tensorflow-datasets, which requires a much more recent version. The issue is that tensorflow-datasets needs a module called builder.py that isn't present in protobuf 3.19.x. The best workaround so far is to force-reinstall protobuf 3.20.3 using pip install --force-reinstall "protobuf==3.20.3". Pip will complain about incompatibilities left and right, but I've found it to work without issues so far with TF 2.9 to 2.12, tensorflow-datasets, and other libraries.
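After forcing pins like these, pip's warnings make it hard to tell what actually ended up installed. A small standard-library check with importlib.metadata can confirm it; this is a sketch, mismatched_pins is a hypothetical helper, and the pins in the usage line are simply the versions mentioned in this thread.

```python
from __future__ import annotations

import importlib.metadata


def mismatched_pins(pins: dict[str, str]) -> dict[str, str | None]:
    """Compare installed distribution versions against exact pins.

    Returns {name: installed_version_or_None} for every package that is
    missing or not at the pinned version; an empty dict means all pins hold.
    """
    mismatches: dict[str, str | None] = {}
    for name, wanted in pins.items():
        try:
            installed = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches
```

For example, print(mismatched_pins({"tensorflow": "2.9.0", "protobuf": "3.20.3"})) prints {} when both pins are in place.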

Ammar-Azman commented 11 months ago

Hi @mrdbourke.

I ran the suggested lines:

!pip uninstall -y tensorflow
!pip install -U -q tensorflow==2.9.0
import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")

But it printed this:

Found existing installation: tensorflow 2.9.0
Uninstalling tensorflow-2.9.0:
  Successfully uninstalled tensorflow-2.9.0
WARNING: Ignoring invalid distribution -ensorflow (/usr/local/lib/python3.10/dist-packages)
WARNING: Ignoring invalid distribution -ensorflow (/usr/local/lib/python3.10/dist-packages)
TensorFlow version: 2.12.0
arpadikuma commented 11 months ago

Did you restart the runtime? IIRC, it tells you the change will only take effect after a restart.
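Why the restart matters: pip replaces the package on disk, but a module that has already been imported stays loaded in the running Python process, so import tensorflow keeps returning the old version until the runtime restarts. A minimal sketch that detects this mismatch; restart_needed is a hypothetical helper, not a pip or TensorFlow API.

```python
import importlib.metadata
import sys


def restart_needed(dist_name: str = "tensorflow", module_name: str = "tensorflow") -> bool:
    """Return True if `module_name` is already imported but no longer matches
    the version of `dist_name` that is installed on disk."""
    module = sys.modules.get(module_name)
    if module is None:
        # Not imported yet: the next `import` will load whatever is installed.
        return False
    loaded_version = getattr(module, "__version__", None)
    try:
        installed_version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        # The module is still in memory but its distribution is gone (e.g. uninstalled).
        return True
    return loaded_version != installed_version
```

Running it right after the pip cell and restarting the runtime whenever it returns True avoids the "installed 2.9.0, imported 2.12.0" situation above.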

mrdbourke commented 11 months ago

Hi all,

After much troubleshooting, I've found the best fix for tf.keras.applications.EfficientNetB0 problems is to simply upgrade to tf.keras.applications.efficientnet_v2.EfficientNetV2B0.

You can see a full write-up of the fix here: https://github.com/mrdbourke/tensorflow-deep-learning/discussions/575
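A minimal sketch of the swap, assuming the same feature-extraction setup as notebook 05 (frozen base model, global average pooling, softmax head). build_feature_extractor and its parameters are illustrative names, not taken from the notebook or the linked write-up. Like EfficientNetB0, the V2 models include their own rescaling layer by default (include_preprocessing=True), so they take raw 0-255 pixel values.

```python
import tensorflow as tf


def build_feature_extractor(num_classes: int, img_size: int = 224, weights="imagenet") -> tf.keras.Model:
    """Feature-extraction model with a frozen EfficientNetV2B0 base (hypothetical helper)."""
    # EfficientNetV2B0 in place of tf.keras.applications.EfficientNetB0
    base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
        include_top=False, weights=weights
    )
    base_model.trainable = False  # freeze the base for feature extraction

    inputs = tf.keras.layers.Input(shape=(img_size, img_size, 3), name="input_layer")
    # training=False keeps BatchNorm layers in inference mode, even if unfrozen later
    x = base_model(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax", name="output_layer")(x)
    return tf.keras.Model(inputs, outputs)
```

For the 10-class notebook data, model_2 = build_feature_extractor(num_classes=10) would then be compiled and fit as before; weights=None is only useful for quick offline checks.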

talha-0 commented 11 months ago

It worked for me when I recompiled the model before loading the weights. It may be because the model had been trained and some of its layers had changed, so the tensor shapes were no longer compatible.

mrdbourke commented 11 months ago

@talha-0 Great catch! Thank you for the update!

ezawadzki commented 10 months ago

I got the same issue when trying to customize my model for image classification. It worked the first time, but after that I got this error. If I delete the exported model folder each time before training, it works, even with TensorFlow 2.11.0.
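If stale files from a previous run are the culprit, clearing the output folder before each training run can be automated. A sketch only: fresh_output_dir is a hypothetical helper, and the folder name you pass should be whatever directory your export or checkpoint step writes to.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def fresh_output_dir(path: str | Path) -> Path:
    """Delete `path` and everything in it (if it exists), then recreate it empty.

    Careful: this permanently removes the directory's contents.
    """
    out = Path(path)
    if out.exists():
        shutil.rmtree(out)
    out.mkdir(parents=True)
    return out
```

Calling it once at the start of each training run guarantees nothing from an earlier run is picked up when weights are loaded later.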

SGhuman123 commented 10 months ago

I tried the solution from that write-up, but it doesn't seem to work for me.

mrdbourke commented 9 months ago

Oh damn!

What error are you getting now?

Did you try to reference the updated Notebook 05? See: https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/05_transfer_learning_in_tensorflow_part_2_fine_tuning.ipynb

nika-va commented 9 months ago

I recompiled the model:

model_2.compile(loss='categorical_crossentropy',
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

I also removed .ckpt from the checkpoint path: checkpoint_path = 'ten_percent_model_checkpoints_weights/checkpoint'. Now it works perfectly fine.

AgusZanini commented 9 months ago

Using tf.keras.applications.efficientnet_v2.EfficientNetV2B0 didn't work for me, and neither did using other versions of TensorFlow. It only works if I compile the model again before loading the weights. Whether or not I keep the .ckpt extension in the checkpoint path doesn't seem to affect the result.

MiaZhengLS commented 8 months ago

I got a similar error when I tried to load the best model from Keras Tuner. I'm using a custom transformer model, and the tuning itself works fine.

MiaZhengLS commented 8 months ago

I also tested reusing the tuner instance created for fine-tuning instead of creating a new tuner instance with the same parameters (except overwrite=False). With that, I no longer get the error, but this time I'm required to provide input_shape for model.build.

shounak03 commented 3 months ago

Getting the same error in 2024 as well: "ValueError: Received incompatible tensor with shape (1280, 10) when attempting to restore variable with shape (1, 1, 1152, 48) and name Adam/m/block6h_se_reduce/kernel:0." I tried downloading version 2.9, but it doesn't work. Any help, @mrdbourke?

evgen1100 commented 2 months ago

Actually, this issue is caused by the model being recompiled between when the weights are saved and when they are loaded. In other words, we are trying to load weights into a slightly different model (one with the base model's layers unfrozen). A fairly obvious solution is to recreate the model from scratch, load the weights again, and then unfreeze the layers again if needed.
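A minimal sketch of that approach, assuming notebook 05's setup (frozen EfficientNetB0 base, global average pooling, softmax head, Adam). create_model, restore_then_unfreeze and num_unfrozen are hypothetical names, and unfreezing the last 10 layers with a 1e-4 learning rate are assumptions. The point is the order: rebuild and compile the model in the same state it was in when the weights were saved, load the weights, and only then unfreeze layers and recompile.

```python
import tensorflow as tf


def create_model(num_classes: int = 10, img_size: int = 224, weights="imagenet"):
    """Rebuild the feature-extraction model from scratch, with the base model frozen."""
    base_model = tf.keras.applications.EfficientNetB0(include_top=False, weights=weights)
    base_model.trainable = False
    inputs = tf.keras.layers.Input(shape=(img_size, img_size, 3))
    x = base_model(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs), base_model


def restore_then_unfreeze(checkpoint_path: str, num_unfrozen: int = 10, **model_kwargs):
    # 1. Fresh model, compiled the same way as when the weights were saved
    model, base_model = create_model(**model_kwargs)
    model.compile(loss="categorical_crossentropy",
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=["accuracy"])
    # 2. Load the weights while the architecture and trainable layers still match
    model.load_weights(checkpoint_path)
    # 3. Only now unfreeze the last `num_unfrozen` layers of the base model...
    base_model.trainable = True
    for layer in base_model.layers[:-num_unfrozen]:
        layer.trainable = False
    # 4. ...and recompile so the new trainable state takes effect (lower LR for fine-tuning)
    model.compile(loss="categorical_crossentropy",
                  optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  metrics=["accuracy"])
    return model, base_model
```

With the notebook you would pass its checkpoint_path; for a quick offline check, weights=None plus a .weights.h5 file avoids downloading ImageNet weights.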