Improvement: Add Evaluation tests for LLMs

rmusser01 commented 6 months ago

This issue now tracks the implementation of various evaluation methods and workflows for LLMs.

Evaluations:

[x] G-Eval
[ ] PingPong
[ ] InfiniteBench
[ ] Ruler
[ ] MMLU
[ ] MMLU-Pro
[ ] ?
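For reference, G-Eval (arXiv 2303.16634) asks the judge model for a rating on a fixed scale (e.g. 1 to 5). It then weights each possible rating by the probability the model assigns to that rating token, instead of keeping only the single sampled rating. Below is a minimal sketch of that weighting step. It assumes the endpoint can return log-probabilities for the candidate score tokens at the rating position. The function name `g_eval_score`, its input shape, and the renormalisation over valid score tokens are my assumptions, not part of the paper's text.

```python
import math


def g_eval_score(score_logprobs: dict[str, float], valid_scores=range(1, 6)) -> float:
    """Probability-weighted rating: sum over s of p(s) * s.

    score_logprobs maps candidate tokens at the rating position (e.g. "1".."5")
    to their log-probabilities. Non-score tokens are ignored and the remaining
    probabilities are renormalised to sum to 1 (an assumption of this sketch).
    """
    valid = {str(s) for s in valid_scores}
    probs: dict[int, float] = {}
    for token, logprob in score_logprobs.items():
        key = token.strip()
        if key in valid:
            # Merge variants such as "3" and " 3" into one score bucket.
            probs[int(key)] = probs.get(int(key), 0.0) + math.exp(logprob)
    total = sum(probs.values())
    if total == 0:
        raise ValueError("no valid score tokens in score_logprobs")
    return sum(score * p / total for score, p in probs.items())
```

With probabilities 0.6, 0.3 and 0.1 on "4", "5" and "3", this gives 4.2 rather than a flat 4. That finer-grained score is the point of the weighting.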

As a user, I would like to be informed about the summarization effectiveness of my chosen LLM endpoint.

I would like to be able to evaluate an endpoint against a known, tested framework that measures the accuracy of generated summaries, so that I can have more confidence in the returned results.

There are two different approaches to text summarization: extractive and abstractive.
- Extractive: extract text from the source and use it as the summary.
- Abstractive: create a novel description of the text.
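To make the extractive side concrete, here is a naive frequency-based extractive baseline. It scores each sentence by the average frequency of its content words across the whole text, then returns the top sentences verbatim in their original order. This is a textbook-style illustration, not a method from the linked papers. The stopword list, the tokenizer, and the scoring rule are all my assumptions. A baseline like this could serve as a floor when comparing LLM summaries.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption); a real baseline would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "and", "to", "in", "on", "it"}


def content_words(text: str) -> list[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, num_sentences: int = 2) -> str:
    """Pick the highest-scoring sentences verbatim from the text."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average frequency, so longer sentences are not automatically favoured.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Keep the top sentences but restore their original document order.
    return " ".join(sentences[i] for i in sorted(ranked[:num_sentences]))
```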

LLMs use abstractive summarization, and as far as I can tell from the public research, they blow everything else out of the water. :/

This issue specifically tracks the creation and implementation of a generalized benchmark test process for LLM summarization using QAG (question-answer generation).
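As I understand the Confident AI summarization metric linked in this issue, QAG-based scoring has two parts. Coverage uses yes/no questions generated from the source and asks whether the summary yields the same answers as the source. Alignment uses claims extracted from the summary and asks whether the source supports them. The final score is the minimum of the two. Below is a minimal sketch of the scoring arithmetic only. It assumes question generation, claim extraction, and answering are separate LLM calls. Here those calls are replaced by hypothetical `answer`/`judge` callables so the logic can run without an endpoint. All function names are my own.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for an LLM call: (question_or_claim, context) -> "yes" | "no" | "idk".
Answerer = Callable[[str, str], str]


def coverage_score(questions: Sequence[str], source: str, summary: str, answer: Answerer) -> float:
    """Share of source-derived questions answered the same way from the summary as from the source."""
    if not questions:
        return 0.0  # treat empty input as a failed run (assumption)
    matches = sum(answer(q, summary) == answer(q, source) for q in questions)
    return matches / len(questions)


def alignment_score(claims: Sequence[str], source: str, judge: Answerer) -> float:
    """Share of summary-derived claims that the source supports ("yes")."""
    if not claims:
        return 0.0  # treat empty input as a failed run (assumption)
    return sum(judge(c, source) == "yes" for c in claims) / len(claims)


def qag_summary_score(questions: Sequence[str], claims: Sequence[str], source: str,
                      summary: str, answer: Answerer, judge: Answerer) -> float:
    # Minimum of the two: a summary must be both complete and faithful to score well.
    return min(coverage_score(questions, source, summary, answer),
               alignment_score(claims, source, judge))
```

Because the score is a minimum, a faithful but very incomplete summary still scores low, and so does a complete summary that hallucinates.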

Why LLMs vs. other summarization approaches? (Will add papers as I come across them.)
- https://www.mdpi.com/2673-4591/59/1/194

- https://cookbook.openai.com/examples/evaluation/how_to_eval_abstractive_summarization
- https://arxiv.org/pdf/2303.16634
- Review: https://www.sciencedirect.com/science/article/pii/S2949719124000189
- Review: https://arxiv.org/pdf/2403.02901
- Review: https://www.researchgate.net/publication/374492453_Exploring_the_Landscape_of_Automatic_Text_Summarization_A_Comprehensive_Survey

Links:
- https://docs.confident-ai.com/docs/metrics-summarization
- https://stackoverflow.com/questions/9879276/how-do-i-evaluate-a-text-summarization-tool
- https://www.confident-ai.com/blog/a-step-by-step-guide-to-evaluating-an-llm-text-summarization-task
- https://www.confident-ai.com/blog/a-gentle-introduction-to-llm-evaluation
- https://www.confident-ai.com/blog/llm-evaluation-metrics-everything-you-need-for-llm-evaluation
- https://docs.useanything.com/guides-and-faq/llm-not-using-my-docs
- https://aws.amazon.com/blogs/machine-learning/techniques-for-automatic-summarization-of-documents-using-language-models/
- https://mattyyeung.github.io/deterministic-quoting
- https://prollm.toqan.ai/leaderboard
- https://hamel.dev/blog/posts/evals/
- https://twitter.com/langchainai/status/1775569294241472810
- https://news.ycombinator.com/item?id=38353285
- https://scholar.google.com/scholar?q=related:Y-Hx-kplbEUJ:scholar.google.com/&scioq=&hl=en&as_sdt=0,43
- https://news.ycombinator.com/item?id=39982362
- https://arxiv.org/abs/2404.01261
- https://openreview.net/forum?id=7Ttk3RzDeu
- https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00632/119276/Benchmarking-Large-Language-Models-for-News

LLMs to track:

- https://huggingface.co/jondurbin
- https://huggingface.co/Tostino/Inkbot-13B-8k-0.2
- https://huggingface.co/migtissera/Tess-M-v1.3
- https://huggingface.co/TheBloke/Tess-M-v1.3-GGUF
- https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties
- WizardLM-1.0-Uncensored-Llama2-13B-GPTQ
- openhermes-2.5-mistral-7b
- openhermes-2.5-mistral-13b
- chronos007-70b
- guanaco-65B
- Airoboros-l2-70b
- zephyr-beta-7b and zephyr-7b
- https://huggingface.co/refuelai/Llama-3-Refueled

rmusser01 commented 6 months ago

Other summarization prompt samples, for improving the current prompt:

https://gist.github.com/Tostino/4ba4e7e7988348134a7256fd1cbbf4ff

<#system#>
Your main objective is to condense the content of the document into a concise summary, capturing the main points and themes.
<#chat#>
<#user#>
Please read the provided Original section to understand the context and content. Use this understanding to generate a summary of the Original section. Separate the article into chunks, and sequentially create a summary for each chunk. Focus on summarizing the Original section, ignoring any details about sponsorships/advertisements in the text.

Summarized Sections:
1. For each chunk, provide a concise summary. Start each summary with "Chunk (X of Y):" where X is the current chunk number and Y is the total number of chunks.

To craft a Final Summary:
1. Read the Summarized Sections: Carefully review all the summarized sections you have generated. Ensure that you understand the main points, key details, and essential information from each section.
2. Identify Main Themes: Identify the main themes and topics that are prevalent throughout the summarized sections. These themes will form the backbone of your final summary.
3. Consolidate Information: Merge the information from the different summarized sections, focusing on the main themes you have identified. Avoid redundancy and ensure the consolidated information flows logically.
4. Preserve Essential Details: Preserve the essential details and nuances that are crucial for understanding the document. Consider the type of document and the level of detail required to capture its essence.
5. Draft the Final Summary: After considering all the above points, draft a final summary that represents the main ideas, themes, and essential details of the document. Start this section with "Final Summary:"

Ensure that your final output is thorough, and accurately reflects the document’s content and purpose.
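This prompt requests a fixed output layout: `Chunk (X of Y):` headers followed by a `Final Summary:` section. An evaluation run could therefore check adherence to that layout cheaply, before any quality scoring. Below is a sketch of such a check. The function name and the strictness rules are my assumptions: headers must start a line, and chunk numbers must run exactly from 1 to Y.

```python
import re

# A header must start a line, e.g. "Chunk (2 of 5):".
CHUNK_HEADER = re.compile(r"^Chunk \((\d+) of (\d+)\):", re.MULTILINE)


def check_chunked_output(response: str) -> list[str]:
    """Return format problems; an empty list means the response follows the template."""
    problems: list[str] = []
    headers = [(int(x), int(y)) for x, y in CHUNK_HEADER.findall(response)]
    if not headers:
        problems.append("no 'Chunk (X of Y):' headers found")
    else:
        totals = {y for _, y in headers}
        numbers = [x for x, _ in headers]
        if len(totals) != 1:
            problems.append(f"inconsistent chunk totals: {sorted(totals)}")
        elif numbers != list(range(1, max(totals) + 1)):
            problems.append(f"expected chunks 1..{max(totals)}, got {numbers}")
    if "Final Summary:" not in response:
        problems.append("missing 'Final Summary:' section")
    return problems
```

A model that wraps headers in markdown (e.g. `**Chunk (1 of 2):**`) would fail this strict check. Whether that should count as a failure is a judgment call.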

https://gist.github.com/Tostino/cacb1cecdf2eb7386baf565d157f56a0

<#system#>
Your main objective is to condense the content of the document into a concise summary, capturing the main points and themes.
<#chat#>
<#user#>
Please read the provided Original section to understand the context and content. Use this understanding to generate a summary of the Original section, incorporating relevant details and maintaining coherence with the Prior Summary.

Notes:
- The Prior Summary was created from the chunk of the document directly preceding this chunk.
- Ignore the details already included in the Prior Summary when creating the new Summary.
- Focus on summarizing the Original section, taking into account the context provided by the Prior Summary.
- Ignore any details about sponsorships/advertisements in the text.
<#user_context#>
Prior Summary:
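This template implies a sequential loop: each chunk is summarized with the previous chunk's summary passed in as context. Below is a sketch of that loop. It assumes a hypothetical `summarize(system, instructions, context)` wrapper around the chosen endpoint. It also assumes the chunk text is appended after the Prior Summary under an `Original section:` label, since the gist only shows the `Prior Summary:` label.

```python
from typing import Callable

# Hypothetical wrapper around the chosen endpoint: (system, instructions, user_context) -> summary text.
Summarizer = Callable[[str, str, str], str]


def rolling_summaries(chunks: list[str], system: str, instructions: str,
                      summarize: Summarizer) -> list[str]:
    """Summarize chunks in order, passing each call the previous chunk's summary."""
    summaries: list[str] = []
    prior = ""  # the first chunk has no Prior Summary
    for chunk in chunks:
        # Assumed user-context layout: the Prior Summary, then the current chunk.
        context = f"Prior Summary:\n{prior}\n\nOriginal section:\n{chunk}"
        prior = summarize(system, instructions, context)
        summaries.append(prior)
    return summaries
```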

https://gist.github.com/Tostino/81eeee9781e519044950332b4e64bef1

<#system#>
Your main objective is to condense the content of the document into a concise summary, capturing the main points and themes.
<#chat#>
<#user#>
To craft a Final Summary:

1. Read Summarized Sections: Carefully review all the summarized sections of the document. Ensure that you have a clear understanding of the main points, key details, and essential information presented in each section.
2. Identify Main Themes: As you go through the summarized sections, identify the main themes and topics that are prevalent throughout the document. Make a list of these themes as they will form the backbone of your final summary.
3. Consolidate Information: Merge the information from the different summarized sections, focusing on the main themes you have identified. Avoid redundancy and ensure that the consolidated information flows logically.
4. Preserve Essential Details: While consolidating, ensure that you preserve the essential details and nuances that are crucial for understanding the document. Consider the type of document and the level of detail required to accurately capture its essence.
5. Check for Completeness: After drafting the final summary, review it to ensure that it accurately represents the main ideas, themes, and essential details of the document.

Please remember to be thorough, and ensure that the final summary is a true reflection of the document’s content and purpose.
<#user_context#>
Summarized Sections:
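This final-summary template takes the per-chunk summaries as its user context. Below is a sketch that renders it using the marker layout shown in the gist. `render_final_summary_prompt` is a hypothetical helper, and joining sections with blank lines is my assumption. Any assistant-turn marker the target model expects after the context block is left to the caller.

```python
def render_final_summary_prompt(system: str, instructions: str,
                                section_summaries: list[str]) -> str:
    """Fill the gist's <#...#> marker layout, using the chunk summaries as user context."""
    context = "Summarized Sections:\n" + "\n\n".join(section_summaries)
    return "\n".join([
        "<#system#>", system,
        "<#chat#>",
        "<#user#>", instructions,
        "<#user_context#>", context,
    ])
```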

rmusser01 commented 3 months ago

This deserves its own Milestone.

rmusser01 commented 3 weeks ago

https://github.com/openai/simple-evals/blob/main/simpleqa_eval.py
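My reading of simpleqa_eval.py is as follows. A grader model labels each answer as CORRECT, INCORRECT, or NOT_ATTEMPTED (letters A, B, C). The run then reports overall accuracy, accuracy given attempted, and an F-score that is the harmonic mean of those two. Below is a sketch of that aggregation step, written independently rather than copied from simple-evals, so the field names are my own.

```python
from collections import Counter


def simpleqa_metrics(grades: list[str]) -> dict[str, float]:
    """Aggregate grader letters: "A" = CORRECT, "B" = INCORRECT, "C" = NOT_ATTEMPTED."""
    if not grades:
        raise ValueError("no grades to aggregate")
    counts = Counter(grades)
    n = len(grades)
    correct = counts["A"] / n
    incorrect = counts["B"] / n
    not_attempted = counts["C"] / n
    attempted = correct + incorrect
    # Accuracy on the questions the model actually tried to answer.
    acc_given_attempted = correct / attempted if attempted else 0.0
    # Harmonic mean of overall accuracy and accuracy-given-attempted.
    denom = acc_given_attempted + correct
    f_score = 2 * acc_given_attempted * correct / denom if denom else 0.0
    return {
        "correct": correct,
        "incorrect": incorrect,
        "not_attempted": not_attempted,
        "accuracy_given_attempted": acc_given_attempted,
        "f_score": f_score,
    }
```

Reporting accuracy given attempted separately rewards a model that abstains instead of guessing. This seems relevant to summary faithfulness as well.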

rmusser01 / tldw

Improvement: Add Evaluation tests for LLMs #29