Open ashwinpra opened 4 days ago
What issues were faced when truncation was set to True?
We rely on return_assistant_tokens_mask of apply_chat_template to automatically slice assistant tokens, which is useful for constructing labels. As indicated in the comment, when truncation=True was passed to apply_chat_template, the returned assistant tokens mask was wrong when I tried it earlier. You can try it yourself to confirm.
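To illustrate the idea of building labels from an assistant mask and truncating afterwards, here is a minimal sketch in plain Python. It is not the collator's actual code: the helper name build_labels and the toy token ids are hypothetical, and the only convention assumed is that -100 marks positions ignored by the loss.

```python
IGNORE_INDEX = -100  # label value conventionally ignored by PyTorch's cross-entropy loss


def build_labels(input_ids, assistant_mask, max_len=None):
    """Keep assistant tokens as labels, mask everything else, then truncate both together.

    Hypothetical helper for illustration; input_ids and assistant_mask are plain lists.
    """
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    labels = [tok if m == 1 else IGNORE_INDEX
              for tok, m in zip(input_ids, assistant_mask)]
    if max_len is not None:
        # Truncating after the mask is applied keeps ids and labels aligned.
        input_ids, labels = input_ids[:max_len], labels[:max_len]
    return input_ids, labels


# Toy example: four prompt tokens followed by three assistant tokens.
ids = [101, 7, 8, 9, 201, 202, 203]
mask = [0, 0, 0, 0, 1, 1, 1]
trunc_ids, trunc_labels = build_labels(ids, mask, max_len=6)
print(trunc_ids)     # [101, 7, 8, 9, 201, 202]
print(trunc_labels)  # [-100, -100, -100, -100, 201, 202]
```

Because the mask is computed on the full sequence and the cut happens afterwards, the labels cannot drift out of step with the token ids, whatever the tokenizer does internally when truncating.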
an image present on the right side, it could get truncated unevenly?
This is indeed possible, but 1) it wouldn't cause a technical error that would fail the training, and 2) this behavior (some image tokens getting truncated) simply cannot be avoided if it really happens.
Let me know if this makes sense.
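To make the uneven-truncation scenario concrete, here is a small sketch that counts how many image placeholder tokens survive a right truncation. The placeholder id 9999 and the helper name are hypothetical, and the sequence is a toy list rather than real tokenizer output.

```python
IMAGE_TOKEN_ID = 9999  # hypothetical placeholder id, not the real model's value


def surviving_image_tokens(input_ids, max_len, image_token_id=IMAGE_TOKEN_ID):
    """Return (placeholders before truncation, placeholders after right truncation)."""
    before = input_ids.count(image_token_id)
    after = input_ids[:max_len].count(image_token_id)
    return before, after


# An image occupying indices 1000..1999, with text before and after it.
ids = [1] * 1000 + [IMAGE_TOKEN_ID] * 1000 + [2] * 100
before, after = surviving_image_tokens(ids, max_len=1500)
print(before, after)  # 1000 500
```

A check like this only detects that an image span was cut. Whether a partially truncated span is tolerated depends on how the model's forward pass consumes the placeholders, which this sketch does not model.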
Hi, that clears some of my doubts, but I have a related query.
Let's say my prompt is something like this: "<image><image>\nWhat is the difference between the two images?", and the two images are a.jpg and b.jpg.
Now say I'm doing left truncation, and the image tokens corresponding to the first <image> tag get truncated. Could the second <image> tag accidentally get replaced by the tensor of a.jpg instead of b.jpg?
https://github.com/zjysteven/lmms-finetune/blob/86895101a7f794c47cb3acc1061d0a148bc0b1df/collators/llava_onevision.py#L225-L230
I can see that the truncated prompt (input_ids) and the image tensors (vision_inputs) are returned separately. How are they processed while training?
This is a very specific corner case, and I don't know the answer for sure either. You can look at the transformers source code to see how vision embeddings are processed during training: https://github.com/huggingface/transformers/blob/f73f5e62e2383c1cb6975fca70082d6dc51ec6f2/src/transformers/models/llava_onevision/modeling_llava_onevision.py#L667-L693
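One way to reason about this corner case is a toy simulation. The sketch below does not reproduce the transformers implementation. It assumes, purely for illustration, that each image expands to three placeholders and that placeholders are filled front to back from one concatenated list of per-image features. All names here are hypothetical.

```python
IMAGE_TOKEN = "<image>"  # stands in for the image placeholder token


def left_truncate(tokens, max_len):
    """Keep only the last max_len tokens."""
    return tokens[max(0, len(tokens) - max_len):]


def fill_in_order(tokens, features):
    """Replace each placeholder with the next unused feature, front to back (assumed behavior)."""
    feats = iter(features)
    return [next(feats) if t == IMAGE_TOKEN else t for t in tokens]


def counts_match(tokens, features):
    """Guard: the number of placeholders should equal the number of feature vectors."""
    return tokens.count(IMAGE_TOKEN) == len(features)


# Two images (a, then b), three placeholders each, followed by the question.
tokens = [IMAGE_TOKEN] * 3 + [IMAGE_TOKEN] * 3 + ["What", "differs", "?"]
features = ["a0", "a1", "a2", "b0", "b1", "b2"]

truncated = left_truncate(tokens, max_len=6)  # drops the first image's placeholders
filled = fill_in_order(truncated, features)
print(filled)                             # ['a0', 'a1', 'a2', 'What', 'differs', '?']
print(counts_match(truncated, features))  # False
```

Under this assumed fill order, the surviving placeholders (which belonged to b.jpg) would receive a.jpg's features. Whether the real model behaves this way, or rejects the mismatch in counts, has to be confirmed against the linked source. A count check like counts_match in the collator would at least flag such samples.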
Hi! I had a few doubts regarding the truncation being done in the data collator.
collators/llava_onevision.py: https://github.com/zjysteven/lmms-finetune/blob/b3a68751d4631e7de5441f1c81cde982119991a4/collators/llava_onevision.py#L133-L141
What issues were faced when truncation was set to True?
When you directly truncate the tensor, isn't it possible that if there's an image present on the right side, it could get truncated unevenly? For instance, let's say there's an image that extends from index 1000 to 2000, and you truncate at index 1500. Wouldn't such an instance result in an error?
Thanks in advance!