shikiw / OPERA

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
MIT License

Attention map plotting #10

Closed: franciscoliu closed this issue 5 months ago

franciscoliu commented 5 months ago

Dear authors,

Thank you for this wonderful paper! I reproduced your Figure 2 (the attention map from InstructBLIP) and got the result below. I do not see the prominent pattern highlighted in the paper (the red box). Compared with the word "Additionally", the word "that" seems to have a much larger impact. Do you have any idea what might have gone wrong? Thank you for your help.

rejected_attention_map
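
For reference, this is roughly the plotting logic I used (a minimal sketch with hypothetical names, not this repo's script). I take the per-layer decoder self-attentions from a HuggingFace-style forward pass with `output_attentions=True`, average over heads, and plot the causal part:

```python
import torch
import matplotlib.pyplot as plt

def plot_attention_map(attentions, tokens, layer=-1):
    # attentions: tuple of (batch, heads, seq, seq) tensors, one per layer;
    # tokens: decoded token strings for axis labels.
    attn = attentions[layer][0].mean(dim=0)   # average over heads -> (seq, seq)
    attn = torch.tril(attn)                   # keep the causal (lower-triangular) part
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.imshow(attn.float().cpu().numpy(), cmap="viridis")
    ax.set_xticks(range(len(tokens)))
    ax.set_xticklabels(tokens, rotation=90, fontsize=6)
    ax.set_yticks(range(len(tokens)))
    ax.set_yticklabels(tokens, fontsize=6)
    plt.tight_layout()
    plt.show()
```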

shikiw commented 5 months ago

Hi,

Thanks for your appreciation! Sorry that the unclear writing in our paper led to this misunderstanding!

This is a good question! In fact, what we want to claim is that "the appearance of aggregation patterns in the context makes it easier to induce hallucinated content in the subsequent tokens", not that "all tokens following an aggregation pattern must be hallucinations"; these are different claims. As we state in the paper, aggregation patterns are an inherent property of LLMs, and this property becomes a cause of hallucination in current MLLMs, because the vision tokens are gradually attenuated in the information flow, especially as the context grows longer (see the sketch below for one way to observe this).

So the key point is the existence of the aggregation pattern, not which specific token it is located on. The differences between your visualization and our Figure 2 can have many causes: a different machine, a different environment, or a different sequence (I notice that you copied InstructBLIP's answer directly from Figure 2, but that answer is not complete, since we omitted part of the sentence with an ellipsis). In general, we do not care which token the pattern appears on; it might appear at "_him" or "_from" next time, which would not be surprising (by the way, in our observation the pattern is more likely to appear at ".", "'", and "\n"). What we care about is whether hallucinations become more likely as more and more such patterns appear in the context.
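If you want to check for the pattern quantitatively rather than by eye, a minimal sketch (a rough illustration, not OPERA's actual over-trust penalty) is to score each context token by the mean attention it receives from all later tokens; columnar "aggregation" tokens such as ".", "'" or "\n" should stand out with unusually high scores:

```python
import torch

def aggregation_scores(attn: torch.Tensor) -> torch.Tensor:
    # attn: (seq, seq) causal attention map averaged over heads.
    seq = attn.size(0)
    later = torch.tril(torch.ones(seq, seq, device=attn.device), diagonal=-1)  # 1 where row > col
    received = (attn * later).sum(dim=0)    # total attention each column receives from later rows
    counts = later.sum(dim=0).clamp(min=1)  # number of later tokens per column
    return received / counts                # mean attention received per later token
```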

I hope this helps you well :)