Robustness evaluation - Githubissues

hegelai / prompttools

Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).

Apache License 2.0

2.56k stars 216 forks source link

🚀 The feature

Request from potential user: "There are two main aspects, 1) adjusting prompts that changing semantic words does not trigger hallucination, 2) the prompt itself is such that LLM doesnt slip away from instruction"

Idea: for (1) use prompt templates to substitute words, run evals to check semantic similarity of all results. For (2) use auto-evaluation given instruction, prompt, and response to determine if the LLM followed instructions.

Motivation, pitch

We got this request from a potential user, and also robustness is a common concern in LLM evaluation

Alternatives

No response

Additional context