haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Question] Question on `image_newline` for single image #1565

Open SCZwangxiao opened 1 week ago

SCZwangxiao commented 1 week ago

I think the `image_newline` here is the implementation of the row-ended tokens described in the paper: https://github.com/haotian-liu/LLaVA/blob/c121f0432da27facab705978f83c4ada465e46fd/llava/model/llava_arch.py#L82-L86

However, for single-image input, the token is not appended to each row as the paper describes. Instead, only one token is appended to the flattened patch tokens of the image: https://github.com/haotian-liu/LLaVA/blob/c121f0432da27facab705978f83c4ada465e46fd/llava/model/llava_arch.py#L191-L196
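
To make the difference concrete, here is a toy sketch of the two behaviours I am comparing. It is not the LLaVA code: strings stand in for embedding vectors, the 2x3 grid size is arbitrary, and the function names `append_newline_per_row` and `append_newline_once` are made up for illustration.

```python
def append_newline_per_row(grid, newline):
    """Append `newline` to the end of every row, then flatten row-major."""
    return [tok for row in grid for tok in (*row, newline)]


def append_newline_once(grid, newline):
    """Flatten row-major, then append a single `newline` at the very end."""
    return [tok for row in grid for tok in row] + [newline]


# Hypothetical 2x3 grid of patch tokens (placeholders for feature vectors).
height, width = 2, 3
grid = [[f"p{r}{c}" for c in range(width)] for r in range(height)]

per_row = append_newline_per_row(grid, "<nl>")  # what I expected from the paper
once = append_newline_once(grid, "<nl>")        # what the single-image path appears to do

print(per_row)  # height * (width + 1) tokens
print(once)     # height * width + 1 tokens
```

In the per-row variant the sequence length is `height * (width + 1)` and each newline marks a row boundary. In the single-token variant the length is `height * width + 1`, and the one trailing token only marks the end of the image.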