rmusser01 / tldw

Too Long, Didn't Watch(TL/DW): Your Personal Research Multi-Tool - Open Source NotebookLM
Apache License 2.0

Improvement: Add functionality + UI toggle to 'confabulation-check' the summarizations produced #30

Open rmusser01 opened 1 month ago

rmusser01 commented 1 month ago

As a user, I understand that LLMs are not oracles of truth and are prone to confabulations when allowed.

As such, I would like to have a form of 'fact-checking' in place for generated summaries, so that I may have additional confidence in the returned result being accurate.

As a user, when interacting with the application, I would like the option (both a CLI arg and a UI element/toggle switch) to have any generated summaries analyzed (by default by the same endpoint, with an option to specify a different endpoint) and validated to be free of confabulations or falsehoods (as much as is reasonably possible).
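
A rough sketch of what the CLI side of this option could look like, using `argparse`. The flag names (`--api-endpoint`, `--confabulation-check`, `--check-endpoint`) and the helper `resolve_check_endpoint` are made up for illustration and are not existing arguments of this project. The only behavior taken from the request is that the check is off unless toggled, and that it defaults to the same endpoint as the summarization unless a different one is given.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical flags for the confabulation check."""
    parser = argparse.ArgumentParser(description="Summarize with optional confabulation check")
    # Hypothetical: endpoint used to produce the summary.
    parser.add_argument("--api-endpoint", default="openai",
                        help="LLM endpoint used for summarization")
    # Hypothetical: the toggle itself, disabled by default.
    parser.add_argument("--confabulation-check", action="store_true",
                        help="Validate the generated summary against the transcript")
    # Hypothetical: optional override for the endpoint doing the check.
    parser.add_argument("--check-endpoint", default=None,
                        help="Endpoint for the check (defaults to --api-endpoint)")
    return parser


def resolve_check_endpoint(args: argparse.Namespace) -> Optional[str]:
    """Return the endpoint to use for the check, or None if the check is disabled."""
    if not args.confabulation_check:
        return None
    # Fall back to the summarization endpoint when no override is given.
    return args.check_endpoint or args.api_endpoint
```

A UI toggle could feed the same two values (enabled or not, optional endpoint override) into the same resolution logic, so the CLI and UI behave identically.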

So that when I press 'summarize' with the 'Confabulation-Check' toggle enabled, the processing pipeline compares the generated summary to the original transcript by asking an LLM endpoint, 'Does the content of this match the content of this?', and the resulting answer is displayed to the user. Next to this would ideally be a numerical measure of accuracy, also displayed to the user, indicating to what general degree the summary is free of confabulation.

Like: LLM is '20/40/60/80/100%' certain it is free of confabulation, and anything below 80% is considered unreliable/false.
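
To make the idea concrete, here is a minimal sketch of the check step. Everything named here is an assumption for illustration: the prompt wording, the `CONFIDENCE: <number>` reply format, the `CheckResult` type, and the `ask_llm` callable (a stand-in for whatever endpoint is configured). The number is only the model's self-reported certainty, snapped to the 20/40/60/80/100 buckets, with anything below 80 flagged as unreliable.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

BUCKETS = (20, 40, 60, 80, 100)   # certainty levels shown to the user
RELIABLE_THRESHOLD = 80           # below this, treat the summary as unreliable

# Hypothetical prompt; the reply format is an assumption of this sketch.
PROMPT_TEMPLATE = (
    "Does the content of this summary match the content of this transcript?\n"
    "Answer briefly, then end with a line 'CONFIDENCE: <0-100>'.\n\n"
    "TRANSCRIPT:\n{transcript}\n\nSUMMARY:\n{summary}\n"
)


@dataclass
class CheckResult:
    answer: str                 # raw answer from the checking endpoint
    confidence: Optional[int]   # bucketed certainty, None if not parseable
    reliable: bool              # True only when confidence >= threshold


def snap_to_bucket(value: int) -> int:
    """Snap a raw 0-100 value to the nearest bucket; ties go to the lower one."""
    return min(BUCKETS, key=lambda b: (abs(b - value), b))


def confabulation_check(transcript: str, summary: str,
                        ask_llm: Callable[[str], str]) -> CheckResult:
    """Ask the endpoint to compare summary and transcript, then score the reply."""
    answer = ask_llm(PROMPT_TEMPLATE.format(transcript=transcript, summary=summary))
    match = re.search(r"CONFIDENCE:\s*(\d{1,3})", answer, flags=re.IGNORECASE)
    if match is None:
        # No usable number in the reply: fail closed and mark as unreliable.
        return CheckResult(answer=answer, confidence=None, reliable=False)
    confidence = snap_to_bucket(int(match.group(1)))
    return CheckResult(answer=answer, confidence=confidence,
                       reliable=confidence >= RELIABLE_THRESHOLD)
```

Passing the endpoint in as a plain callable keeps the check independent of which API produced the summary, which matches the 'same endpoint by default, different one optionally' idea.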

Spitballing for now.

https://arxiv.org/abs/2404.12065

Edit: This is going to be a bit more in-depth than originally planned.

rmusser01 commented 1 month ago

https://blog.streamlit.io/ai21_grounded_multi_doc_q-a/

rmusser01 commented 3 weeks ago

https://arxiv.org/html/2404.09129v1
https://arxiv.org/pdf/2406.02543
https://huggingface.co/papers/2406.02543
https://github.com/hamelsmu/ft-drift
https://hamel.dev/blog/posts/evals/
https://arxiv.org/abs/2009.01325
https://eugeneyan.com/writing/finetuning/
https://eugeneyan.com/writing/evals/
https://arxiv.org/pdf/2404.12272
https://arize.com/blog/breaking-down-evalgen-who-validates-the-validators/
https://www.nature.com/articles/s41586-024-07421-0
https://arxiv.org/abs/2406.15927