Thanks for the awesome paper and repo!

I was trying out LLaVA and noticed that the model predictions were always empty strings. I narrowed it down to this line: https://github.com/ys-zong/VL-ICL/blob/6ad043d625f8205b75f57e720441cfc98d0fd5a1/utils/model_inference.py#L76

It seems LLaVA already returns only the tokens it generated, not the context tokens plus the generated tokens. The fix is quite easy: just decode the whole generated output without first truncating it by `input_token_len`.

Is my thinking correct, or am I missing something?
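For concreteness, here is a minimal sketch of a version-robust decode. The helper name `decode_generated` is hypothetical, and `tokenizer` / `input_ids` / `output_ids` just follow the usual LLaVA generation loop, so they may not match the repo exactly:

```python
import torch


def decode_generated(tokenizer, output_ids: torch.Tensor, input_ids: torch.Tensor) -> str:
    # Older LLaVA versions return context + generated tokens, so the repo
    # slices off the first input_ids.shape[1] tokens. Newer versions return
    # only the generated tokens, so that slice strips the whole answer and
    # leaves an empty string.
    n_input = input_ids.shape[1]
    if output_ids.shape[1] > n_input and torch.equal(
        output_ids[0, :n_input].cpu(), input_ids[0].cpu()
    ):
        # The output still contains the prompt: keep the old truncation.
        output_ids = output_ids[:, n_input:]
    # Otherwise decode everything, which is the fix for newer versions.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```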
Hi, thanks for your interest. I think that's due to different versions of LLaVA: the older version I used when developing this repo seems to need this slicing (I'll need to double-check it). But if decoding the whole generated sentence works for you, it's totally fine to skip the truncation.

Ah, I see, thanks!