rmusser01 / tldw

tl/dw (Too Long, Didn't Watch): Your Personal Research Multi-Tool - a naive attempt at 'A Young Lady's Illustrated Primer'
Apache License 2.0

Ongoing Improvement: Improve Accuracy of Results / Reduce Confabulation Rate #103

Open rmusser01 opened 3 months ago

rmusser01 commented 3 months ago

Title.

https://blog.streamlit.io/ai21_grounded_multi_doc_q-a/
https://arxiv.org/html/2404.09129v1
https://arxiv.org/pdf/2406.02543
https://huggingface.co/papers/2406.02543
https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard
https://github.com/hamelsmu/ft-drift
https://hamel.dev/blog/posts/evals/
https://arxiv.org/abs/2009.01325
https://thetechoasis.beehiiv.com/p/eliminating-hallucinations-robots-imitate-us
https://arxiv.org/abs/2407.16604
https://eugeneyan.com/writing/finetuning/
https://eugeneyan.com/writing/evals/
https://arxiv.org/pdf/2404.12272
https://arize.com/blog/breaking-down-evalgen-who-validates-the-validators/
https://www.nature.com/articles/s41586-024-07421-0
https://arxiv.org/abs/2406.15927
https://research.google/blog/halva-hallucination-attenuated-language-and-vision-assistant/
https://arxiv.org/abs/2407.08488v1
https://osu-nlp-group.github.io/AttributionBench/
https://arxiv.org/abs/2407.08488
https://arxiv.org/pdf/2407.03651
https://www.patronus.ai/blog/lynx-state-of-the-art-open-source-hallucination-detection-model
https://arxiv.org/abs/2408.07852
https://www.turingpost.com/p/10-ways-to-process-long-context

rmusser01 commented 1 month ago

Seems like the best option right now, short of fine-tuning/pre-training, is to ask a model that was specifically trained to identify confabulations.
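A rough sketch of what "asking a detector model" could look like, assuming the detector is any callable that takes a prompt string and returns text. The prompt wording, the PASS/FAIL convention, and the names `build_prompt`, `is_confabulated`, and `toy_detector` are all hypothetical and not taken from tldw or from any specific detection model; `toy_detector` is a naive substring check that only stands in for a real model call.

```python
from typing import Callable

# Hypothetical prompt format; a real detector model will have its own template.
PROMPT_TEMPLATE = (
    "Decide whether the ANSWER is supported by the DOCUMENT.\n"
    "Reply with exactly one word: PASS (supported) or FAIL (not supported).\n\n"
    "QUESTION:\n{question}\n\n"
    "DOCUMENT:\n{document}\n\n"
    "ANSWER:\n{answer}\n"
)


def build_prompt(question: str, document: str, answer: str) -> str:
    """Fill the (assumed) detector prompt with one question/document/answer triple."""
    return PROMPT_TEMPLATE.format(question=question, document=document, answer=answer)


def is_confabulated(
    question: str,
    document: str,
    answer: str,
    detector: Callable[[str], str],
) -> bool:
    """Return True if the detector judges the answer unsupported by the document."""
    verdict = detector(build_prompt(question, document, answer)).strip().upper()
    if verdict.startswith("PASS"):
        return False
    if verdict.startswith("FAIL"):
        return True
    # Refuse to guess when the detector does not follow the output convention.
    raise ValueError(f"unexpected detector output: {verdict!r}")


def toy_detector(prompt: str) -> str:
    """Stand-in for a real model: PASS only if the answer text appears in the document."""
    document = prompt.split("DOCUMENT:\n", 1)[1].split("\n\nANSWER:\n", 1)[0]
    answer = prompt.split("\n\nANSWER:\n", 1)[1].strip()
    return "PASS" if answer.lower() in document.lower() else "FAIL"


if __name__ == "__main__":
    doc = "The video was uploaded in 2021 by the channel owner."
    print(is_confabulated("When was it uploaded?", doc, "uploaded in 2021", toy_detector))
    print(is_confabulated("When was it uploaded?", doc, "uploaded in 2019", toy_detector))
```

Swapping `toy_detector` for a function that calls an actual detection model keeps the rest unchanged, which makes it easy to compare detectors on the same set of summaries.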

rmusser01 commented 1 month ago

Finetuning:

Evals:

LLM As Judge:

Detecting Hallucinations using Semantic Entropy:

Lynx/patronus
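For the semantic entropy item above, a minimal sketch of the general idea as I understand it: sample several answers to the same question, group the ones that mean the same thing, and compute the entropy over those groups (high entropy suggests the model is guessing). This is a simplified, frequency-based version. `toy_same_meaning` is a hypothetical stand-in that only normalizes strings; a real setup would need something like an entailment model to decide whether two answers are equivalent.

```python
import math
from typing import Callable, List, Sequence


def cluster_by_meaning(
    answers: Sequence[str],
    same_meaning: Callable[[str, str], bool],
) -> List[List[str]]:
    """Greedily group answers; each answer joins the first cluster it matches."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            # Compare against the cluster's first member as its representative.
            if same_meaning(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters


def semantic_entropy(
    answers: Sequence[str],
    same_meaning: Callable[[str, str], bool],
) -> float:
    """Entropy (in nats) over meaning clusters, using cluster frequencies as probabilities."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    clusters = cluster_by_meaning(answers, same_meaning)
    total = len(answers)
    return sum(
        (len(c) / total) * math.log(total / len(c)) for c in clusters
    )


def toy_same_meaning(a: str, b: str) -> bool:
    """Assumption: case/punctuation-insensitive equality stands in for semantic equivalence."""
    def norm(s: str) -> str:
        return " ".join(s.lower().strip(" .!?").split())
    return norm(a) == norm(b)


if __name__ == "__main__":
    consistent = ["Paris", "paris.", "Paris"]
    scattered = ["Paris", "Lyon", "Rome", "Berlin"]
    print(semantic_entropy(consistent, toy_same_meaning))
    print(semantic_entropy(scattered, toy_same_meaning))
```

Answers that all land in one cluster give an entropy of zero, while answers that all disagree give the maximum, `log(n)` for `n` samples. Where to set the threshold for flagging a response would have to come from evals on real data.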

rmusser01 commented 1 month ago

https://github.com/EdinburghNLP/awesome-hallucination-detection

rmusser01 commented 1 month ago

https://arxiv.org/abs/2407.16557

rmusser01 commented 1 month ago

Reasoning:
https://arxiv.org/abs/2407.19813
https://arxiv.org/abs/2408.06195
https://arxiv.org/abs/2407.13481

Output length impacts on reasoning:
https://arxiv.org/abs/2407.19825

rmusser01 commented 1 month ago

https://arxiv.org/abs/2406.10279
https://huggingface.co/spaces/vectara/Hallucination-evaluation-leaderboard

rmusser01 commented 1 month ago

https://arxiv.org/pdf/2409.18475
https://cleanlab.ai/blog/trustworthy-language-model/

rmusser01 commented 1 month ago

https://arxiv.org/abs/2410.02707

rmusser01 commented 2 weeks ago

https://www.lycee.ai/blog/rag-ragallucinations-and-how-to-fight-them

rmusser01 commented 1 week ago

https://github.com/lechmazur/confabulations/

rmusser01 commented 1 week ago

https://llm-editing.github.io/

rmusser01 commented 2 days ago

https://arxiv.org/abs/2410.22071