LlavaNextVideo always assumes left padding when batch size is 1 #32112

Closed: zjysteven closed this issue 3 months ago

zjysteven commented 3 months ago

System Info

Unrelated to this issue

Who can help?

@zucchini-nlp @Narsil

Reproduction

At the moment I don't have a concise reproduction script, but I think the problem should be clear from the description below.

Basically I'm trying to finetune the new LlavaNextVideo model. The current implementation (see code below) always assumes left padding when the batch size is 1. This causes an "index out of bounds" error when training with a batch size of 1, because the inputs are right-padded for training, not left-padded as assumed.

https://github.com/huggingface/transformers/blob/0fdea8607d7e01eb0e38a1ebeb7feee30a22f0cf/src/transformers/models/llava_next_video/modeling_llava_next_video.py#L565-L576
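For context, the logic at the permalink looks roughly like the sketch below (paraphrased, not verbatim source; the `_left_padding`/`_right_padding` names follow the linked code):

```python
import torch

def infer_left_padding(attention_mask: torch.Tensor, padding_side: str = "left") -> bool:
    """Paraphrased sketch of the linked padding-side detection (not verbatim source)."""
    batch_size = attention_mask.shape[0]
    _left_padding = torch.any(attention_mask[:, 0] == 0)    # pad slots at the start?
    _right_padding = torch.any(attention_mask[:, -1] == 0)  # pad slots at the end?

    left_padding = True  # default; a batch of size 1 never gets past this
    if batch_size > 1:
        if _left_padding and not _right_padding:
            left_padding = True
        elif not _left_padding and _right_padding:
            left_padding = False
        elif not _left_padding and not _right_padding:
            left_padding = padding_side == "left"  # no padding at all: trust the setting
        else:
            raise ValueError("attention_mask is padded on both sides")
    return left_padding

# With batch_size == 1 the branch above is skipped entirely, so even a
# right-padded training sample is treated as left-padded:
mask = torch.tensor([[1, 1, 1, 0, 0]])   # one sequence, padded on the right
print(infer_left_padding(mask))          # True, despite the right padding
```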

I can try to put up an example with concrete shapes/dims to better show the behavior, but at a high level I believe the desired behavior is: as long as we are in training mode, left_padding should be set to False, regardless of whether the batch size is 1 (see Expected behavior below).

Expected behavior

https://github.com/huggingface/transformers/blob/0fdea8607d7e01eb0e38a1ebeb7feee30a22f0cf/src/transformers/models/llava_next_video/modeling_llava_next_video.py#L565-L576

As long as we are in training mode, left_padding should be set to False regardless of whether the batch size is 1, instead of being inferred only from _left_padding and _right_padding.
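A minimal sketch of such a fix (hypothetical patch, assuming `self.training` is the right training-mode signal; the actual change may look different):

```python
# Hypothetical patch: training batches are conventionally right-padded,
# so only default to left padding outside of training.
left_padding = not self.training
if batch_size > 1:
    # Keep the existing mask-based inference for multi-sample batches.
    if _left_padding and not _right_padding:
        left_padding = True
    elif not _left_padding and _right_padding:
        left_padding = False
```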

zucchini-nlp commented 3 months ago

@zjysteven hey! Yes, there's an edge case in LLavaNext models where we can't infer from the inputs how the data was padded; it happens when the batch size is more than 1 and all inputs within the batch are the same length. Usually the padding side can be set with model.padding_side = "right".
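For example (the checkpoint name here is only for illustration):

```python
from transformers import LlavaNextVideoForConditionalGeneration

# Workaround from the comment above: load the model and explicitly tell
# it to expect right-padded inputs (checkpoint name is illustrative).
model = LlavaNextVideoForConditionalGeneration.from_pretrained(
    "llava-hf/LLaVA-NeXT-Video-7B-hf"
)
model.padding_side = "right"
```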

Not sure how this affects training when bs=1 though, because in that case no padding is added.

But I think we can also automatically change the padding side in the model when it's in train mode, for the cases when bs > 1. I'll check that it works as expected and open a PR.