huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

`self.projection` is unnecessarily called twice in `VivitTubeletEmbeddings` #31619

Closed · v-iashin closed this issue 3 months ago

v-iashin commented 3 months ago

System Info

The problem is on `main` and has been there since ViViT was added.

Who can help?

@amyeroberts

Information

Tasks

Reproduction

https://github.com/huggingface/transformers/blob/0f67ba1d741d65b07d549daf4ee157609ce4f9c1/src/transformers/models/vivit/modeling_vivit.py#L75-L80
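
For context, the linked span (paraphrased here; see the permalink for the exact code) computes the projection once, discards the result, and then computes it again on the same input:

```python
# permute to (batch_size, num_channels, num_frames, height, width)
pixel_values = pixel_values.permute(0, 2, 1, 3, 4)

x = self.projection(pixel_values)  # first call: the result is never used
# out_batch_size, out_num_channels, out_num_frames, out_height, out_width = x.shape
x = self.projection(pixel_values).flatten(2).transpose(1, 2)  # second call on the same input
```

The extra call doubles the cost of the projection without changing the output.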

Expected behavior

```python
# permute to (batch_size, num_channels, num_frames, height, width)
pixel_values = pixel_values.permute(0, 2, 1, 3, 4)

# out_batch_size, out_num_channels, out_num_frames, out_height, out_width = x.shape
x = self.projection(pixel_values).flatten(2).transpose(1, 2)
```

This should be enough.

I can open a minor PR to fix it.
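
As a quick sanity check on the fix, here is a minimal standalone sketch (not the Transformers class itself; the sizes are illustrative ViViT-style defaults assumed for this example) showing that a single projection call already yields the expected `(batch, num_tubelets, hidden_size)` sequence:

```python
import torch
from torch import nn

# Illustrative ViViT-style sizes (assumed for this sketch, not read from a config)
batch_size, num_frames, num_channels, height, width = 2, 32, 3, 224, 224
hidden_size, tubelet_size = 768, (2, 16, 16)

# The tubelet projection is a 3D convolution whose kernel and stride both equal
# the tubelet size, so each non-overlapping tubelet maps to one embedding vector.
projection = nn.Conv3d(num_channels, hidden_size, kernel_size=tubelet_size, stride=tubelet_size)

pixel_values = torch.randn(batch_size, num_frames, num_channels, height, width)

# permute to (batch_size, num_channels, num_frames, height, width)
pixel_values = pixel_values.permute(0, 2, 1, 3, 4)

# a single projection call, then flatten the (frames, height, width) grid into a sequence
x = projection(pixel_values).flatten(2).transpose(1, 2)
print(x.shape)  # torch.Size([2, 3136, 768]) -> (batch, num_tubelets, hidden_size)
```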

amyeroberts commented 3 months ago

Eek - indeed - thanks for raising! If you open a PR to fix it, you can ping me for a quick review, and you'll get the GitHub contribution for the fix :)