sourcegraph / cody

Type less, code more: Cody is an AI code assistant that uses advanced search and codebase context to help you write and fix code.
https://cody.dev
Apache License 2.0

End-to-end Cody quality evaluation #537

Closed novoselrok closed 9 months ago

novoselrok commented 1 year ago

We need a way to evaluate the effect of our changes to Cody. This project aims to collect the various questions and queries we are using to test Cody's responses manually and consolidate them in a test suite.

Here is how we would define a test case:

addTestCase('Sourcegraph frontend feature flags', {
    codebase: 'github.com/sourcegraph/sourcegraph',
    context: 'embeddings',
    transcript: [
        {
            question: 'How are feature flags used in sourcegraph frontend?',
            facts: ['useFeatureFlag', 'FeatureFlagName', 'featureFlags.ts', '/site-admin/feature-flags'],
            answerSummary: 'Feature flags allow developers to ship new features that are hidden behind a flag.',
        },
        {
            question: 'How can I add a new one?',
            facts: ['FeatureFlagName'],
            answerSummary: 'Add a new case to the `FeatureFlagName` union type.',
        },
    ],
})

A test case consists of the following (see the example above):

- `codebase`: the repository the questions are asked against.
- `context`: the context source used to answer the questions (e.g. embeddings).
- `transcript`: a sequence of questions, each with the `facts` (identifiers, file names, or paths) the answer should mention and an `answerSummary` describing the expected answer.

No single signal (fact checks, hallucination detection, or an LLM judge) is enough on its own to verify the quality of Cody's answers. Hopefully, by combining them, we will get a clearer picture.
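A minimal sketch of what the per-turn fact check could look like. This is illustrative only (the suite's actual API may differ): `checkFacts` and `TurnResult` are hypothetical names, and the substring match is the simplest possible implementation.

```typescript
// Hypothetical sketch: score one transcript turn by checking which of the
// expected facts appear verbatim in Cody's answer. A real implementation
// might normalize whitespace or match on tokens instead of substrings.

interface TurnResult {
    present: string[] // facts found in the answer
    missing: string[] // facts the answer failed to mention
    recall: number    // fraction of expected facts present
}

function checkFacts(answer: string, facts: string[]): TurnResult {
    const present = facts.filter(fact => answer.includes(fact))
    const missing = facts.filter(fact => !answer.includes(fact))
    return {
        present,
        missing,
        recall: facts.length === 0 ? 1 : present.length / facts.length,
    }
}

// Usage: evaluate a hypothetical answer against the facts from the first
// turn of the example test case above.
const result = checkFacts(
    'Feature flags are defined in featureFlags.ts via the FeatureFlagName union type.',
    ['useFeatureFlag', 'FeatureFlagName', 'featureFlags.ts', '/site-admin/feature-flags'],
)
// result.recall === 0.5 (2 of the 4 expected facts are present)
```

Fact recall like this catches missing information but not fabricated information, which is why the hallucination detection and LLM-judge signals are needed alongside it.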

Tasks

Phase 1 (Foundations, CLI)

Phase 2 (Scale)

Phase 3 (Nice-to-have)

novoselrok commented 1 year ago

Phase 1 done in https://github.com/sourcegraph/cody/pull/590

github-actions[bot] commented 10 months ago

This issue is marked as stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed automatically in 5 days.