Short of fine-tuning/pre-training, the best current option seems to be asking a model specifically trained to identify confabulations (e.g. Lynx; a rough sketch follows).
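A minimal sketch of that detector-model approach, assuming the Patronus Lynx checkpoint from the blog post linked below. The HF model id and the prompt wording are assumptions from memory of the Lynx model card (the real template asks for JSON reasoning plus a PASS/FAIL score), so verify both before relying on this:

```python
# Sketch: ask a hallucination-detection model (Lynx) whether an answer
# is faithful to a source document. Model id and prompt are assumptions.
from transformers import pipeline

MODEL_ID = "PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct"  # assumed HF id

# Lynx-style faithfulness prompt: PASS = supported by the document,
# FAIL = confabulated. Simplified from the model card's template.
PROMPT = """Given the following QUESTION, DOCUMENT and ANSWER, determine \
whether the ANSWER is faithful to the contents of the DOCUMENT. \
Reply with PASS if it is faithful and FAIL if it is not.

QUESTION: {question}
DOCUMENT: {document}
ANSWER: {answer}
"""

def check_faithfulness(question: str, document: str, answer: str) -> str:
    judge = pipeline("text-generation", model=MODEL_ID)
    out = judge(
        PROMPT.format(question=question, document=document, answer=answer),
        max_new_tokens=256,
        return_full_text=False,
    )
    return out[0]["generated_text"]

if __name__ == "__main__":
    print(check_faithfulness(
        question="When was the company founded?",
        document="Acme Corp was founded in 1947 in Ohio.",
        answer="Acme Corp was founded in 1952.",  # unsupported: expect FAIL
    ))
```

The same shape works for generic LLM-as-judge grading; Lynx is just a judge fine-tuned specifically for faithfulness.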
Topics to cover:
- Finetuning
- Evals
- LLM As Judge
- Detecting Hallucinations using Semantic Entropy (see the sketch after this list)
- Lynx/Patronus
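A simplified sketch of the semantic-entropy idea from the Nature paper linked below: sample several answers from the target model, cluster them by bidirectional entailment using an off-the-shelf NLI model, then take the entropy over cluster frequencies. The NLI model id is an assumption (any MNLI-style classifier works), and the paper's full method also weights clusters by sequence likelihood, which this discrete variant skips:

```python
# Sketch: discrete semantic entropy over sampled answers.
# High entropy across semantic clusters suggests confabulation.
import math
from transformers import pipeline

nli = pipeline("text-classification", model="microsoft/deberta-large-mnli")

def entails(a: str, b: str) -> bool:
    # Bidirectional entailment: a and b must each entail the other.
    def one_way(premise: str, hypothesis: str) -> bool:
        res = nli({"text": premise, "text_pair": hypothesis})
        return res[0]["label"] == "ENTAILMENT"
    return one_way(a, b) and one_way(b, a)

def semantic_entropy(samples: list[str]) -> float:
    # Greedy clustering: join the first cluster whose representative
    # the sample mutually entails, else start a new cluster.
    clusters: list[list[str]] = []
    for s in samples:
        for c in clusters:
            if entails(s, c[0]):
                c.append(s)
                break
        else:
            clusters.append([s])
    n = len(samples)
    probs = [len(c) / n for c in clusters]
    return -sum(p * math.log(p) for p in probs)

samples = [
    "Paris is the capital of France.",
    "The capital of France is Paris.",
    "Lyon is the capital of France.",
]
print(semantic_entropy(samples))  # two semantic clusters -> nonzero entropy
```

In practice you would generate the samples from your target model at temperature > 0 and flag answers whose entropy exceeds a threshold tuned on held-out data.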
Reasoning:
- https://arxiv.org/abs/2407.19813
- https://arxiv.org/abs/2408.06195
- https://arxiv.org/abs/2407.13481
- Output length impacts on reasoning: https://arxiv.org/abs/2407.19825
Links:
- https://blog.streamlit.io/ai21_grounded_multi_doc_q-a/
- https://arxiv.org/html/2404.09129v1
- https://arxiv.org/pdf/2406.02543
- https://huggingface.co/papers/2406.02543
- https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard
- https://github.com/hamelsmu/ft-drift
- https://hamel.dev/blog/posts/evals/
- https://arxiv.org/abs/2009.01325
- https://thetechoasis.beehiiv.com/p/eliminating-hallucinations-robots-imitate-us
- https://arxiv.org/abs/2407.16604
- https://eugeneyan.com/writing/finetuning/
- https://eugeneyan.com/writing/evals/
- https://arxiv.org/pdf/2404.12272
- https://arize.com/blog/breaking-down-evalgen-who-validates-the-validators/
- https://www.nature.com/articles/s41586-024-07421-0
- https://arxiv.org/abs/2406.15927
- https://research.google/blog/halva-hallucination-attenuated-language-and-vision-assistant/
- https://osu-nlp-group.github.io/AttributionBench/
- https://arxiv.org/abs/2407.08488
- https://arxiv.org/pdf/2407.03651
- https://www.patronus.ai/blog/lynx-state-of-the-art-open-source-hallucination-detection-model
- https://arxiv.org/abs/2408.07852
- https://www.turingpost.com/p/10-ways-to-process-long-context