hegelai / prompttools

Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).
http://prompttools.readthedocs.io
Apache License 2.0
2.56k stars 216 forks

Robustness evaluation #69

Open steventkrawczyk opened 11 months ago

steventkrawczyk commented 11 months ago

🚀 The feature

Request from potential user: "There are two main aspects, 1) adjusting prompts that changing semantic words does not trigger hallucination, 2) the prompt itself is such that LLM doesnt slip away from instruction"

Idea: for (1), use prompt templates to substitute words, then run evals that check the semantic similarity of all results. For (2), use auto-evaluation: given the instruction, prompt, and response, determine whether the LLM followed the instruction.
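
A rough sketch of both ideas, to make the proposal concrete. This is not prompttools API: `fake_llm`, `similarity`, `substitution_robustness`, and `followed_instruction` are hypothetical names, the model and judge are plain callables, and `difflib` string similarity is only a crude stand-in for a real semantic similarity metric (e.g. embedding cosine similarity).

```python
from difflib import SequenceMatcher
from itertools import combinations
from string import Template


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; always returns a canned answer."""
    return "Paris is the capital of France."


def similarity(a: str, b: str) -> float:
    """Character-level ratio in [0, 1]; an assumed proxy for semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def substitution_robustness(template, variable, substitutes, llm):
    """(1) Fill the template with each substitute word, query the model, and
    return the lowest pairwise similarity between the responses."""
    responses = [
        llm(Template(template).substitute({variable: word}))
        for word in substitutes
    ]
    scores = [similarity(a, b) for a, b in combinations(responses, 2)]
    # With fewer than two responses there is nothing to compare.
    return min(scores) if scores else 1.0


def followed_instruction(instruction, prompt, response, judge):
    """(2) Ask a judge model whether the response obeyed the instruction."""
    judge_prompt = (
        "Instruction: " + instruction + "\n"
        "Prompt: " + prompt + "\n"
        "Response: " + response + "\n"
        "Did the response follow the instruction? Answer YES or NO."
    )
    return judge(judge_prompt).strip().upper().startswith("YES")


if __name__ == "__main__":
    score = substitution_robustness(
        "What is the $adj city of France?", "adj", ["capital", "main"], fake_llm
    )
    print("lowest pairwise similarity:", score)
```

A low score from `substitution_robustness` would flag a prompt whose answers drift when semantically close words are swapped in. In practice the judge in `followed_instruction` would be a second LLM call, and its YES/NO parsing would likely need to be stricter.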

Motivation, pitch

We got this request from a potential user, and robustness is a common concern in LLM evaluation.

Alternatives

No response

Additional context

No response

RigvedRocks commented 6 months ago

Hey, I am working on this issue and have written two sample scripts: one for prompt substitution and one for auto-evaluation. Both scripts use PromptBench. Could you guide me on how to integrate them into your experiments? I am joining your Discord group, where we can discuss this issue in detail.