Open SCZwangxiao opened 1 week ago
I think the `image_newline` here is the implementation of the row-ended tokens described in the paper: https://github.com/haotian-liu/LLaVA/blob/c121f0432da27facab705978f83c4ada465e46fd/llava/model/llava_arch.py#L82-L86
However, for a single-image input, the token is not appended to each row as the paper describes. Instead, only one token is appended to the flattened patch tokens of the image: https://github.com/haotian-liu/LLaVA/blob/c121f0432da27facab705978f83c4ada465e46fd/llava/model/llava_arch.py#L191-L196
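
To make the difference concrete, here is a minimal sketch of the two layouts. It is not the LLaVA code: the function names, the 2x3 grid, and the `<nl>` placeholder are all hypothetical, and plain strings stand in for the embedding vectors so that the token order is easy to read.

```python
def append_newline_per_row(grid, newline):
    """Row-ended layout: one newline token after each row of patch tokens."""
    tokens = []
    for row in grid:
        tokens.extend(row)
        tokens.append(newline)
    return tokens


def append_single_newline(grid, newline):
    """Layout described in this issue: flatten all patches, then add one newline token."""
    tokens = [tok for row in grid for tok in row]
    tokens.append(newline)
    return tokens


# Hypothetical 2x3 grid of patch tokens; strings stand in for embedding vectors.
grid = [[f"p{r}{c}" for c in range(3)] for r in range(2)]
NEWLINE = "<nl>"  # hypothetical placeholder for the learned image_newline embedding

per_row = append_newline_per_row(grid, NEWLINE)
single = append_single_newline(grid, NEWLINE)

print(per_row)  # ['p00', 'p01', 'p02', '<nl>', 'p10', 'p11', 'p12', '<nl>']
print(single)   # ['p00', 'p01', 'p02', 'p10', 'p11', 'p12', '<nl>']
```

With the per-row layout the sequence carries one marker per grid row, so the row boundaries are recoverable from the tokens alone. With the single-token layout only the end of the image is marked.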