Opened by rmusser01, 2 months ago
Using an LLM as a Response Judge
Some metrics cannot be defined objectively; using an LLM as a judge is particularly useful for these more subjective or complex criteria. We care about correctness, faithfulness, and relevance.
Answer Correctness - Is the generated answer correct when compared to the reference, and does it thoroughly answer the user's query?
Answer Relevancy - Is the generated answer relevant and comprehensive?
Answer Faithfulness - Is the generated answer factually consistent with the context document?
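The three criteria above can be sketched as a judge prompt plus a score parser. This is a minimal, offline illustration, not a specific library's API: the prompt template, the `Score: N` reply convention, and the function names are assumptions made for the example.

```python
import re

# Hypothetical judge prompt template; the criterion text would be swapped in
# for correctness, relevancy, or faithfulness as needed.
JUDGE_TEMPLATE = (
    "You are an impartial judge.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {candidate}\n"
    "Rate the candidate's {criterion} from 1 to 5.\n"
    "Reply exactly in the form 'Score: N'."
)

def build_judge_prompt(question: str, reference: str,
                       candidate: str, criterion: str = "correctness") -> str:
    """Fill the template; the result would be sent to whichever LLM acts as judge."""
    return JUDGE_TEMPLATE.format(
        question=question, reference=reference,
        candidate=candidate, criterion=criterion,
    )

def parse_score(judge_reply: str) -> int:
    """Extract the 1-5 score from the judge's reply, or raise if it is malformed."""
    match = re.search(r"Score:\s*([1-5])", judge_reply)
    if match is None:
        raise ValueError(f"unparseable judge reply: {judge_reply!r}")
    return int(match.group(1))
```

In practice `build_judge_prompt(...)` would be passed to a chat-completion call and `parse_score` applied to the reply; scores are then averaged per criterion across the evaluation set.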
Benchmarks: Evaluation Methodologies
Coding Ability
Confabulation Rate
Context Length
Creative Writing
Pop Culture
Reasoning
Role Playing
Summarization
Tool Calling
Toxicity Testing
Vibes