huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

PaliGemma (and probably SigLIP) inference broken in latest transformers version #33929

Closed mranzinger closed 1 month ago

mranzinger commented 1 month ago

This seems like a breaking change. When I run this model with transformers==4.44.2, things work fine. However, when running with transformers==4.45.1, it fails with this error:

```
...
    num_positions = self.position_embeddings.shape[1]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1687, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'SiglipVisionEmbeddings' object has no attribute 'position_embeddings'. Did you mean: 'position_embedding'?
```

Originally posted by @mranzinger in c6d2848

ArthurZucker commented 1 month ago

Yep opening a PR asap cc @xenova !

mranzinger commented 1 month ago

I think this is still broken. `self.position_embedding` is of type `nn.Embedding`, which means it doesn't have a `.shape` attribute.

I think you want to bring back

```python
position_embeddings = self.position_embedding.weight.unsqueeze(0)
```

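For context, here is a minimal sketch (assuming PyTorch is installed; the tensor sizes are illustrative, not PaliGemma's actual dimensions) of why accessing `.shape` on the module fails while the suggested line works: `nn.Embedding` is an `nn.Module`, and the learned table lives in its `.weight` parameter.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not the real SigLIP config
num_patches, embed_dim = 16, 8
position_embedding = nn.Embedding(num_patches, embed_dim)

# The module itself has no .shape; nn.Module.__getattr__ raises AttributeError
assert not hasattr(position_embedding, "shape")

# The pattern from this thread: read .weight and add a batch dimension
position_embeddings = position_embedding.weight.unsqueeze(0)
num_positions = position_embeddings.shape[1]  # 16
```

Going through `.weight.unsqueeze(0)` yields a `(1, num_patches, embed_dim)` tensor, which is what the interpolation code downstream expects.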
ArthurZucker commented 1 month ago

Fixed properly this time sorry!

ArthurZucker commented 1 month ago

#32600 also introduced a regression by storing the position embeddings twice

mranzinger commented 1 month ago

#33965 does look better!