Closed: zjysteven closed this issue 3 months ago
@zjysteven hey! Yes, there's an edge case in LLavaNext models where we can't tell from the inputs how the data was padded; this happens when the batch size is more than 1 and all inputs within the batch are the same length. Usually the padding side can be set with `model.padding_side = "right"`.
I'm not sure how this affected training when bs=1, because in that case no padding is added.
But I think we can also automatically change the padding side in the model when it is in train mode, for cases where bs > 1. I'll check that it works as expected and open a PR.
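A minimal sketch (plain Python, no transformers required; not the actual library code) of why the padding side cannot be inferred in that edge case: when every sequence in the batch has the same length, the attention mask has no zeros at either edge, so left- and right-padded batches look identical.

```python
def padding_edges(attention_mask):
    """Return (has_left_padding, has_right_padding) inferred from a 0/1
    attention mask, mirroring the kind of edge check the model performs."""
    has_left = any(row[0] == 0 for row in attention_mask)
    has_right = any(row[-1] == 0 for row in attention_mask)
    return has_left, has_right

# Batch of 2 where one sequence is shorter: the padding side is detectable.
left_padded = [[0, 1, 1, 1], [1, 1, 1, 1]]
right_padded = [[1, 1, 1, 0], [1, 1, 1, 1]]
print(padding_edges(left_padded))   # (True, False)
print(padding_edges(right_padded))  # (False, True)

# Batch where all sequences have the same length: no zeros anywhere,
# so the two cases are indistinguishable from the mask alone.
same_length = [[1, 1, 1, 1], [1, 1, 1, 1]]
print(padding_edges(same_length))   # (False, False)
```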
System Info
Unrelated to this issue
Who can help?
@zucchini-nlp @Narsil
Reproduction
At the moment I don't have a concise script to reproduce this, but I think the problem should be clear from the description below.
Basically I'm trying to finetune the new LlavaNextVideo model. The current implementation (see the code linked below) will always assume left padding when the batch size is 1. This causes an "index out of bounds" error when training with batch size 1, due to the unexpected left padding (instead of right padding) during training.
https://github.com/huggingface/transformers/blob/0fdea8607d7e01eb0e38a1ebeb7feee30a22f0cf/src/transformers/models/llava_next_video/modeling_llava_next_video.py#L565-L576
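For reference, the linked logic boils down to something like the following simplified sketch (plain Python over 0/1 masks, not the actual transformers code, which operates on tensors):

```python
def infer_left_padding(attention_mask):
    """Simplified sketch of the padding-side inference in the linked code.
    attention_mask is a batch of 0/1 rows."""
    batch_size = len(attention_mask)
    _left_padding = any(row[0] == 0 for row in attention_mask)
    _right_padding = any(row[-1] == 0 for row in attention_mask)
    left_padding = True  # default assumption: the source of the bs=1 bug
    if batch_size > 1:
        if _left_padding and not _right_padding:
            left_padding = True
        elif _right_padding and not _left_padding:
            left_padding = False
    return left_padding

# With batch size 1, right padding in the mask is ignored and left
# padding is assumed anyway, which is what breaks training.
print(infer_left_padding([[1, 1, 1, 0]]))                # True (bug)
print(infer_left_padding([[1, 1, 1, 0], [1, 1, 1, 1]]))  # False
```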
I can try to put up an example with concrete shapes/dims to better show the behavior, but on a high level I believe the desired behavior is that as long as we are in training mode, `left_padding` (currently inferred from the `_left_padding` and `_right_padding` checks) should be set to False, regardless of whether the batch size is 1.
Expected behavior
https://github.com/huggingface/transformers/blob/0fdea8607d7e01eb0e38a1ebeb7feee30a22f0cf/src/transformers/models/llava_next_video/modeling_llava_next_video.py#L565-L576
As long as we are in training mode, `left_padding` (currently inferred from the `_left_padding` and `_right_padding` checks) should be set to False, regardless of whether the batch size is 1.
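A sketch of that expected behavior (simplified plain-Python version, not the actual transformers code; the hypothetical `training` flag stands in for the model's train mode):

```python
def infer_left_padding(attention_mask, training=False):
    """Padding-side inference with the proposed fix: during training,
    inputs are right padded, so never fall back to left padding."""
    if training:
        return False  # proposed fix: training always uses right padding
    _left_padding = any(row[0] == 0 for row in attention_mask)
    _right_padding = any(row[-1] == 0 for row in attention_mask)
    if _left_padding and not _right_padding:
        return True
    if _right_padding and not _left_padding:
        return False
    # Ambiguous (e.g. batch size 1 with no padding): keep the current
    # inference-time default of left padding.
    return True

print(infer_left_padding([[1, 1, 1, 1]], training=True))   # False
print(infer_left_padding([[1, 1, 1, 1]], training=False))  # True
```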