zjysteven / lmms-finetune

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, qwen-vl, qwen2-vl, phi3-v etc.
Apache License 2.0
163 stars 21 forks source link

error occurred: The input provided to the model are wrong. The number of image tokens is 4 while the number of image given to the model is 2. This prevents correct indexing and breaks batch generation. #21

Closed ReadyTeresa closed 2 months ago

ReadyTeresa commented 2 months ago

My text prompt is: According to the text and image identify the same attribute of the two product. [{'text':xxx,'image':<>},{'text':xxx,'image':<>] Then, the error occurred: The input provided to the model are wrong. The number of image tokens is 4 while the number of image given to the model is 2. This prevents correct indexing and breaks batch generation.

zjysteven commented 2 months ago

Were you able to identify the specific dataset entry in your data json file that caused the error? If so could you paste it here? The current description does not provide enough information for me to exactly know what's going on.

Also is this about training or inference? @ReadyTeresa