Questions about the IM_START and IM_END tokens

shikiw / OPERA

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

MIT License


Thanks for your appreciation and sorry for the misunderstanding!

These two tokens do not refer to the special tokens `<image_start>` and `<image_end>`. Instead, they denote the location indexes of the first image token and the last image token. Here "image tokens" means all tokens related to the input image (if the model uses `<image_start>` and `<image_end>`, include them as well):

- `START_INDEX_of_IMAGE_TOKENS` = the location index of the first image token
- `END_INDEX_of_IMAGE_TOKENS` = the location index of the last image token

For example, suppose the prompt is tokenized as one non-image token at index 0 (for example, a BOS token), three image tokens at indexes 1 to 3, and then `Is` `_there` `_a` `_cat` `?` at indexes 4 to 8. The location indexes can then be specified as:

`key_position = {"image_start": 1, "image_end": 3, "response_start": 9}`

`response_start` is 9 because the prompt occupies indexes 0 to 8, so the response begins at the next position.

I hope this is helpful :)
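As a rough illustration of the indexing rule above, here is a minimal Python sketch that derives `key_position` from an already-tokenized prompt. The helper `compute_key_position`, the placeholder token strings `<bos>` and `<img>`, and the assumption that the image tokens form one contiguous run are all hypothetical and not part of OPERA's code. In a real model the image usually expands into many embedding positions, so the list passed in should reflect that expanded sequence.

```python
from typing import Callable, Dict, Sequence


def compute_key_position(
    tokens: Sequence[str],
    is_image_token: Callable[[str], bool],
) -> Dict[str, int]:
    """Hypothetical helper: derive key_position indexes from a token list.

    Assumes the image tokens (including any <image_start>/<image_end>
    markers, if the model has them) form one contiguous run in the prompt.
    """
    image_idx = [i for i, tok in enumerate(tokens) if is_image_token(tok)]
    if not image_idx:
        raise ValueError("prompt contains no image tokens")
    start, end = image_idx[0], image_idx[-1]
    # Every position between the first and last image token must be an image token.
    if len(image_idx) != end - start + 1:
        raise ValueError("image tokens are not contiguous")
    return {
        "image_start": start,           # index of the first image token
        "image_end": end,               # index of the last image token (inclusive)
        "response_start": len(tokens),  # the response begins right after the prompt
    }


if __name__ == "__main__":
    # Placeholder token names ("<bos>", "<img>") are assumptions for illustration.
    prompt = ["<bos>", "<img>", "<img>", "<img>", "Is", "_there", "_a", "_cat", "?"]
    print(compute_key_position(prompt, lambda t: t == "<img>"))

    # Variant where the model wraps the image in marker tokens: the markers
    # count as image tokens, so they are included in the span.
    wrapped = ["<bos>", "<image_start>", "<img>", "<img>", "<image_end>", "Is", "?"]
    markers = {"<image_start>", "<img>", "<image_end>"}
    print(compute_key_position(wrapped, lambda t: t in markers))
```

Running the script prints `{'image_start': 1, 'image_end': 3, 'response_start': 9}` for the first prompt, matching the example values, and `{'image_start': 1, 'image_end': 4, 'response_start': 7}` for the wrapped variant, where the marker tokens are counted as image tokens.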


Questions about the IM_START and IM_END tokens #2