System Info
transformers version: 4.45.0.dev0
Who can help?
@zu
Information
Tasks
[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)
Reproduction
There is a typo in the following lines in LlavaNextProcessor: current_width and current_height are inverted, which can cause errors due to a mismatch between the image feature size computed by the processor and the one computed by the vision branch in LlavaNextForConditionalGeneration. I encountered this issue while running the following example script.
Here is a code snippet to reproduce the issue:
Expected behavior
No assertion error.
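To illustrate why inverting the two values matters, here is a minimal sketch. It is not the actual transformers code: the function name unpadded_feature_count and its arguments are hypothetical, and the unpadding logic is a simplified assumption about how a padded feature grid is cropped back to the original aspect ratio. It only shows that swapping the grid height and width changes the resulting feature count when the grid is not square.

```python
def unpadded_feature_count(orig_height, orig_width, grid_height, grid_width):
    """Hypothetical helper: count the features left after cropping the
    padding from a grid_height x grid_width feature grid so that it
    matches the aspect ratio of the original image."""
    current_height, current_width = grid_height, grid_width
    if orig_width / orig_height > current_width / current_height:
        # The image is wider than the grid: padding sits on top and bottom.
        new_height = (orig_height * current_width) // orig_width
        padding = (current_height - new_height) // 2
        current_height -= 2 * padding
    else:
        # The image is taller than (or matches) the grid: padding sits left and right.
        new_width = (orig_width * current_height) // orig_height
        padding = (current_width - new_width) // 2
        current_width -= 2 * padding
    return current_height * current_width


# A wide 400 x 1200 (height x width) image on a 24 x 72 feature grid.
correct = unpadded_feature_count(400, 1200, grid_height=24, grid_width=72)
# The same call with height and width inverted, mimicking the typo.
inverted = unpadded_feature_count(400, 1200, grid_height=72, grid_width=24)
print(correct, inverted)
```

In this sketch the swap makes no difference when the grid is square, which would be consistent with the error appearing only for some image sizes.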