ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models

This repository collects all relevant resources about interpretability in LLMs

Add "Internal Consistency and Self-Feedback in Large Language Models: A Survey" #9

Closed fan2goa1 closed 1 month ago

fan2goa1 commented 1 month ago

Hi, I would like to introduce our recent survey paper, "Internal Consistency and Self-Feedback in Large Language Models: A Survey." I hope you can merge this pull request if you find it useful!

Improving the reasoning ability of large language models (LLMs) and mitigating hallucinations are crucial research topics. After extensive thought, we realized that these two issues, "enhancing reasoning" and "alleviating hallucinations," share the same underlying nature, and we approached both from the perspective of internal consistency. This perspective allowed us to unify many seemingly unrelated works into a single framework. To improve internal consistency (which in turn enhances reasoning ability and mitigates hallucinations), we identified the elements common to these works and summarized them into a Self-Feedback framework.

This framework consists of three components: Self-Evaluation, Internal Consistency Signal, and Self-Update.
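For readers who prefer pseudocode, here is a minimal, self-contained sketch of how one instance of such a loop could look. It is not the paper's implementation: `query_model`, the majority-vote consistency signal, and the retry heuristic are illustrative assumptions standing in for the many concrete choices the survey covers.

```python
# Minimal sketch of a Self-Feedback loop: Self-Evaluation produces an
# Internal Consistency Signal, and Self-Update acts on it.
# All concrete functions here are hypothetical placeholders.

from collections import Counter


def query_model(prompt: str, n_samples: int = 5) -> list[str]:
    """Hypothetical stand-in for sampling n responses from an LLM."""
    # In practice this would call an LLM with temperature > 0.
    return ["408", "408", "407", "408", "408"]


def self_evaluate(responses: list[str]) -> tuple[str, float]:
    """Self-Evaluation: derive an Internal Consistency Signal.

    Here the signal is the agreement rate of the majority answer
    (a self-consistency-style proxy); other signals such as verbalised
    confidence or logit entropy would fill the same slot.
    """
    counts = Counter(responses)
    best, freq = counts.most_common(1)[0]
    return best, freq / len(responses)


def self_update(prompt: str, answer: str, signal: float,
                threshold: float = 0.8) -> str:
    """Self-Update: accept the answer, or revise the prompt and retry."""
    if signal >= threshold:
        return answer
    # Low consistency: ask the model to reconsider (one illustrative choice).
    revised = f"{prompt}\nYour previous answers disagreed; reason step by step."
    best, _ = self_evaluate(query_model(revised))
    return best


if __name__ == "__main__":
    prompt = "What is 17 * 24?"
    answer, signal = self_evaluate(query_model(prompt))
    print(self_update(prompt, answer, signal))
```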

The framework is general enough to encompass many existing works, as illustrated in the diagram below.

[Figure: ICSF-fig, overview of the Self-Feedback framework]

Additionally, we have derived several important insights through experiments and analysis, such as the "hourglass internal consistency evolution rule," "consistent (almost) equals correct," and the "implicit vs. explicit reasoning paradox."

In summary, we have unified many works under the perspectives of internal consistency and self-feedback, which we hope will inspire future researchers and help standardize work in this field.

Relevant links: