rlancemartin / auto-evaluator

Evaluation tool for LLM QA chains
https://autoevaluator.langchain.com/

Auto-evaluator :brain: :memo:

Note: See the Hugging Face space for this app: https://huggingface.co/spaces/rlancemartin/auto-evaluator

Note: See the hosted app: https://autoevaluator.langchain.com/

Note: Code for the hosted app is also open source: https://github.com/langchain-ai/auto-evaluator

This is a lightweight evaluation tool for question answering, built with LangChain.

Run as Streamlit app

pip install -r requirements.txt

streamlit run auto-evaluator.py

Inputs

num_eval_questions - Number of questions to auto-generate (if the user does not supply an eval set)

split_method - Method for text splitting

chunk_chars - Chunk size for text splitting

overlap - Chunk overlap for text splitting

embeddings - Embedding method for chunks

retriever_type - Chunk retrieval method

num_neighbors - Neighbors for retrieval

model - LLM for summarization of retrieved chunks

grade_prompt - Prompt choice for model self-grading
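To make the splitting and retrieval inputs above more concrete, here is a minimal sketch in plain Python. It is not the app's implementation: `split_text` and `retrieve` are hypothetical helpers, the splitter is a simple fixed-width character splitter, and the word-overlap score is a toy stand-in for whatever embedding similarity the chosen `embeddings` and `retriever_type` would provide. Only the parameter names `chunk_chars`, `overlap`, and `num_neighbors` come from the list above.

```python
from typing import List


def split_text(text: str, chunk_chars: int, overlap: int) -> List[str]:
    """Split text into chunks of at most `chunk_chars` characters,
    where consecutive chunks share `overlap` characters."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    if not 0 <= overlap < chunk_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_chars")
    step = chunk_chars - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def retrieve(question: str, chunks: List[str], num_neighbors: int) -> List[str]:
    """Return the `num_neighbors` chunks sharing the most words with the question.

    Word overlap is an illustrative stand-in (an assumption) for embedding
    similarity; ties keep the original chunk order.
    """
    q_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(q_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:num_neighbors]


if __name__ == "__main__":
    print(split_text("abcdefghij", chunk_chars=4, overlap=1))
    print(retrieve("cat dog", ["the cat sat", "dogs bark loudly", "a cat and a dog"], 2))
```

A larger `overlap` reduces the chance that an answer is cut in half at a chunk boundary, at the cost of more chunks to embed and search; a larger `num_neighbors` gives the model more context per question but also more irrelevant text.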

Blog

https://blog.langchain.dev/auto-eval-of-question-answering-tasks/

UI

(Image: screenshot of the app UI.)

Disclaimer

You will need an OpenAI API key with access to `GPT-4` and an Anthropic API key to take advantage of all of the default dashboard model settings. However, additional models (e.g., from Hugging Face) can be easily added to the app.
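As a rough pre-flight check before launching the app, the sketch below reports which API keys are missing. It assumes the keys are supplied through the `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` environment variables, which are the names the official OpenAI and Anthropic Python SDKs read by default; whether this app picks them up from the environment or asks for them in the UI is not stated here. `missing_api_keys` is a hypothetical helper, not part of the repository.

```python
import os
from typing import List, Mapping, Optional

# Assumed variable names: the defaults read by the OpenAI and Anthropic SDKs.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All default model keys are set.")
```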