shikiw / OPERA

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
MIT License

Does the method to find `knowledge aggregation pattern` have any relevant papers to reference in NLP domain? #26

Closed shanpoyang654 closed 1 month ago

shanpoyang654 commented 2 months ago

Hello, thank you for your great work!

I'm confused about the knowledge aggregation pattern and anchor tokens. Does it refer to the number of paragraphs with different meanings in the LLM's response?

For the example below, would the "knowledge aggregation pattern" be 2?

> The image features a blue bowl filled with a delicious mixture of bananas, nuts, and oatmeal. The bowl is placed on a dining table, and a spoon is resting inside the bowl, ready to be used for enjoying the meal. In addition to the bowl of food, there are a few other items on the table. A bottle can be seen on the left side of the table, while a cup is positioned towards the top right corner. A book is also present on the right side of the table, adding to the cozy atmosphere of the scene.

Does the method to find "knowledge aggregation pattern" have any relevant papers to reference in NLP domain?

I'm also wondering whether this method can be transferred to detecting semantic transitions in an LLM's response.

Thank you for your time! I look forward to your reply.

(My email is shanpoyang@mail.ustc.edu.cn)

shikiw commented 2 months ago

Thanks for your interest! Your questions are quite valuable.

Q1: Does it refer to the number of paragraphs with different meanings in the LLM's response? A1: No. Anchor tokens are an intrinsic property of LLMs; you can find more explanation in [1].

Q2: For the example below, would the "knowledge aggregation pattern" be 2? A2: No. Sorry, I don't quite understand what "2" refers to here. Could you clarify?

Q3: Does the method to find "knowledge aggregation pattern" have any relevant papers to reference in NLP domain? A3: Yes. Please refer to [1, 2, 3].

Q4: Can this method be transferred to detecting semantic transitions in an LLM's response? A4: Good point! I think it is a promising direction to figure out how anchor tokens help the model preserve logical flow across a sequence.
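To make the idea concrete, here is a minimal toy sketch of how a column-wise "knowledge aggregation" score could be computed from a causal self-attention map: a token whose column keeps receiving high attention from the tokens that follow it in a local window is a candidate anchor token. The function name, the window size, and the use of a plain product over the window are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def anchor_token_scores(attn, window=5):
    """Score each token as a candidate anchor token.

    attn: (seq_len, seq_len) lower-triangular attention matrix for one
    layer/head, where attn[r, c] is the attention row token r pays to
    column token c. A column with consistently high values below the
    diagonal forms the columnar "knowledge aggregation" pattern.
    """
    seq_len = attn.shape[0]
    scores = np.zeros(seq_len)
    for col in range(seq_len):
        # look at the next `window` tokens that attend back to `col`
        rows = range(col + 1, min(col + 1 + window, seq_len))
        if len(rows) == 0:
            continue
        # product of attention values: high only if ALL later tokens
        # in the window attend strongly to this column
        scores[col] = float(np.prod([attn[r, col] for r in rows]))
    return scores
```

In a real setting one would run this over the attention maps returned by the model (e.g. with `output_attentions=True` in Hugging Face Transformers) and inspect the highest-scoring columns.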

[1] Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
[2] Vision Transformers Need Registers
[3] An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models