Closed novoselrok closed 9 months ago
Phase 1 done in https://github.com/sourcegraph/cody/pull/590
We need a way to evaluate the effect of our changes to Cody. This project aims to collect the various questions and queries we are using to test Cody's responses manually and consolidate them in a test suite.
Here is how we would define a test case:
A test case consists of the following:
Facts, hallucination detection, and an LLM judge alone are not enough to verify the quality of Cody's answers. Hopefully, by combining them, we will get a clearer picture.
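As a rough illustration of combining the three signals, here is a minimal sketch. Everything in it is an assumption made for illustration: the `TestCase` and `Report` shapes, the `evaluateAnswer` helper, the substring-based fact check, and the file-path heuristic for hallucination detection are hypothetical and are not taken from the Cody codebase. The LLM judge is represented only by a boolean verdict supplied by the caller.

```typescript
// Hypothetical test case shape; field names are assumptions, not Cody's schema.
interface TestCase {
    question: string
    // Phrases the answer is expected to mention.
    facts: string[]
    // File paths that actually exist in the repository under test.
    knownFiles: string[]
}

// Hypothetical report combining the three signals.
interface Report {
    missingFacts: string[]
    hallucinatedFiles: string[]
    judgeApproved: boolean
    passed: boolean
}

// Combines a fact check, a simple hallucination check, and an externally
// supplied LLM judge verdict into one report.
function evaluateAnswer(testCase: TestCase, answer: string, judgeApproved: boolean): Report {
    const lower = answer.toLowerCase()

    // Fact check: case-insensitive substring match (a deliberate simplification).
    const missingFacts = testCase.facts.filter(fact => !lower.includes(fact.toLowerCase()))

    // Hallucination check: file-like tokens in the answer that are not known files.
    const mentionedFiles: string[] = answer.match(/[\w./-]+\.(?:ts|js|go|py)\b/g) || []
    const hallucinatedFiles = mentionedFiles.filter(file => !testCase.knownFiles.includes(file))

    // The answer passes only if all three signals agree.
    const passed = missingFacts.length === 0 && hallucinatedFiles.length === 0 && judgeApproved
    return { missingFacts, hallucinatedFiles, judgeApproved, passed }
}

// Usage with made-up data.
const exampleCase: TestCase = {
    question: 'Where is the client created?',
    facts: ['createClient'],
    knownFiles: ['src/client.ts'],
}
const exampleReport = evaluateAnswer(exampleCase, 'createClient is defined in src/client.ts.', true)
console.log(exampleReport)
```

Requiring all three checks to agree is only one possible policy; a weighted score or per-signal reporting would work equally well in this sketch.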
Tasks
Phase 1 (Foundations, CLI)
createClient function.
Phase 2 (Scale)
Phase 3 (Nice-to-have)