huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

`self.projection` is unnecessarily called twice in `VivitTubeletEmbeddings` #31619

Closed · v-iashin closed this issue 3 months ago

v-iashin commented 3 months ago

System Info

The problem is on `main` and has been there since ViViT was added.

Who can help?

@amyeroberts

Information

Tasks

Reproduction

https://github.com/huggingface/transformers/blob/0f67ba1d741d65b07d549daf4ee157609ce4f9c1/src/transformers/models/vivit/modeling_vivit.py#L75-L80
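
For context, the linked span (paraphrased here; see the permalink for the exact code) computes the projection once, discards the result, and then computes it again on the same input:

```python
# permute to (batch_size, num_channels, num_frames, height, width)
pixel_values = pixel_values.permute(0, 2, 1, 3, 4)

x = self.projection(pixel_values)  # first call: the result is never used
# out_batch_size, out_num_channels, out_num_frames, out_height, out_width = x.shape
x = self.projection(pixel_values).flatten(2).transpose(1, 2)  # second call on the same input
```

The extra call doubles the cost of the projection without changing the output.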

Expected behavior

```python
# permute to (batch_size, num_channels, num_frames, height, width)
pixel_values = pixel_values.permute(0, 2, 1, 3, 4)

# out_batch_size, out_num_channels, out_num_frames, out_height, out_width = x.shape
x = self.projection(pixel_values).flatten(2).transpose(1, 2)
```

This should be enough.

I can open a minor PR to fix it.
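
As a quick sanity check on the fix, here is a minimal standalone sketch (not the Transformers class itself; the sizes are illustrative ViViT-style defaults assumed for this example) showing that a single projection call already yields the expected `(batch, num_tubelets, hidden_size)` sequence:

```python
import torch
from torch import nn

# Illustrative ViViT-style sizes (assumed for this sketch, not read from a config)
batch_size, num_frames, num_channels, height, width = 2, 32, 3, 224, 224
hidden_size, tubelet_size = 768, (2, 16, 16)

# The tubelet projection is a 3D convolution whose kernel and stride both equal
# the tubelet size, so each non-overlapping tubelet maps to one embedding vector.
projection = nn.Conv3d(num_channels, hidden_size, kernel_size=tubelet_size, stride=tubelet_size)

pixel_values = torch.randn(batch_size, num_frames, num_channels, height, width)

# permute to (batch_size, num_channels, num_frames, height, width)
pixel_values = pixel_values.permute(0, 2, 1, 3, 4)

# a single projection call, then flatten the (frames, height, width) grid into a sequence
x = projection(pixel_values).flatten(2).transpose(1, 2)
print(x.shape)  # torch.Size([2, 3136, 768]) -> (batch, num_tubelets, hidden_size)
```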

amyeroberts commented 3 months ago

Eek - indeed - thanks for raising! If you open a PR to fix it, you can ping me for a quick review, and you'll get the GitHub contribution for the fix :)