vanna-ai / vanna

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.
https://vanna.ai/docs/
MIT License
11.04k stars 862 forks source link

Vanna trulens performance metrics #238

Open samoliverschumacher opened 8 months ago

samoliverschumacher commented 8 months ago

This PR adds a script to support improving the performance (accuracy, cost and latency) of a vanna app.

The Problem;

Context;

vn.ask() carries out RAG in multiple steps that can all be optimised;

  1. Retrieve examples of 3 different data types (SQL, DDL etc.)
    • parameters: embedding model chosen, retrieval system, retrieval parameters
  2. Connects to LLM model
    • parameters: model chosen, fine-tune vs not.
  3. Prompts the LLM about each of these in different ways.

Further improvements to vanna in the future could open up even more possibilities like;

The solution;

A script implements trulens-eval that allows configuration of what is to be evaluated, and how. It presents the results in a dashboard (see the doc for visuals)

Evaluation of the system using TruLens allows evaluation without changing vanna (just adding a log to the vanna model). Alternatives could be to include evaluation in the app's code itself, this might require major refactoring to decouple the vanna components.

Other evaluation frameworks exist, though not many as of yet.

Tests performed

Manual/hand testing only, and only used a few example prompts (shown in the code). No unit tests