ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models

This repository collects all relevant resources about interpretability in LLMs

Add "Internal Consistency and Self-Feedback in Large Language Models: A Survey" #9

Closed fan2goa1 closed 1 month ago

fan2goa1 commented 1 month ago

Hi, I would like to introduce our recent survey paper, "Internal Consistency and Self-Feedback in Large Language Models: A Survey." I hope you can merge this pull request if you find it useful!

Improving the reasoning ability of large language models (LLMs) and mitigating hallucinations are crucial research topics. After extensive thought, we realized that these two issues, "enhancing reasoning" and "alleviating hallucinations," share the same underlying nature, and we approached both from the perspective of internal consistency. This perspective allowed us to unify many seemingly unrelated works into a single framework. To improve internal consistency (which in turn enhances reasoning ability and mitigates hallucinations), we identified the elements common to these works and summarized them into a Self-Feedback framework.

This framework consists of three components: Self-Evaluation, Internal Consistency Signal, and Self-Update.
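For readers who prefer pseudocode, here is a minimal, self-contained sketch of how one instance of such a loop could look. It is not the paper's implementation: `query_model`, the majority-vote consistency signal, and the retry heuristic are illustrative assumptions standing in for the many concrete choices the survey covers.

```python
# Minimal sketch of a Self-Feedback loop: Self-Evaluation produces an
# Internal Consistency Signal, and Self-Update acts on it.
# All concrete functions here are hypothetical placeholders.

from collections import Counter


def query_model(prompt: str, n_samples: int = 5) -> list[str]:
    """Hypothetical stand-in for sampling n responses from an LLM."""
    # In practice this would call an LLM with temperature > 0.
    return ["408", "408", "407", "408", "408"]


def self_evaluate(responses: list[str]) -> tuple[str, float]:
    """Self-Evaluation: derive an Internal Consistency Signal.

    Here the signal is the agreement rate of the majority answer
    (a self-consistency-style proxy); other signals such as verbalised
    confidence or logit entropy would fill the same slot.
    """
    counts = Counter(responses)
    best, freq = counts.most_common(1)[0]
    return best, freq / len(responses)


def self_update(prompt: str, answer: str, signal: float,
                threshold: float = 0.8) -> str:
    """Self-Update: accept the answer, or revise the prompt and retry."""
    if signal >= threshold:
        return answer
    # Low consistency: ask the model to reconsider (one illustrative choice).
    revised = f"{prompt}\nYour previous answers disagreed; reason step by step."
    best, _ = self_evaluate(query_model(revised))
    return best


if __name__ == "__main__":
    prompt = "What is 17 * 24?"
    answer, signal = self_evaluate(query_model(prompt))
    print(self_update(prompt, answer, signal))
```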

The framework is general enough to encompass many existing works, as illustrated in the diagram below.

[Figure: ICSF-fig, overview of the Self-Feedback framework]

Additionally, we have derived several important insights through experiments and analysis, such as the "hourglass internal consistency evolution rule," "consistent (almost) equals correct," and the "implicit vs. explicit reasoning paradox."

In summary, we have unified many works under the perspectives of internal consistency and self-feedback, which we hope will inspire future researchers and help standardize work in this field.

Relevant links: