vanna-ai / vanna

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.
https://vanna.ai/docs/
MIT License
12.02k stars 965 forks

Support for adding (sql, question, documentation) triples for training the Vanna model #705

Open Sghosh1999 opened 3 days ago

Sghosh1999 commented 3 days ago

Support for (SQL, Question, Documentation) Triple Training in Vanna AI

Is your feature request related to a problem? Please describe.
Currently, Vanna AI trains on question-SQL pairs and on documentation as separate items, with no way to attach documentation to a specific pair. Attaching documentation to each pair would supply context about the database schema and business logic whenever that pair is retrieved, which should help the model generate more accurate SQL. Without it, it is harder to give the model comprehensive domain knowledge.

Describe the solution you'd like
Extend the training functionality to accept three-part training data: a SQL query, a natural language question, and corresponding documentation. The new API could look like:


```python
vn.train(
    question="What is the average age of our customers?",
    sql="SELECT AVG(age) FROM customers",
    documentation="This query calculates the mean age across all customers in our system. The age field represents the customer's current age in years."
)
```

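As a rough sketch of how such a call could be routed, assuming a hypothetical `TrainingRouter` class with in-memory lists standing in for real storage (this is not Vanna's actual `train` implementation): a question without SQL is rejected, a question-SQL pair keeps its optional documentation attached, and documentation on its own is still stored as a standalone entry.

```python
from typing import Optional


class TrainingRouter:
    """Hypothetical stand-in for a train() entry point; storage is plain in-memory lists."""

    def __init__(self) -> None:
        self.pairs: list[dict] = []  # question-SQL pairs, optionally with documentation
        self.docs: list[str] = []  # standalone documentation entries

    def train(
        self,
        question: Optional[str] = None,
        sql: Optional[str] = None,
        documentation: Optional[str] = None,
    ) -> str:
        # This sketch requires question and sql together (the SQL-only path is omitted)
        if bool(question) != bool(sql):
            raise ValueError("question and sql must be provided together")
        if question and sql:
            record = {"question": question, "sql": sql}
            if documentation:
                # Keep the documentation attached to the pair it describes
                record["documentation"] = documentation
            self.pairs.append(record)
            return "triple" if documentation else "pair"
        if documentation:
            # Documentation without a pair is stored on its own
            self.docs.append(documentation)
            return "documentation"
        raise ValueError("Nothing to train on")
```

Returning the kind of record stored is only for illustration; a real implementation would more likely return the stored ID, as `add_question_sql` does.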
MichaelMMeskhi commented 1 day ago

One way would be to extend this method:

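```python
import json
import uuid

from langchain_core.documents import Document  # assumed import; not shown in the original snippet


# Excerpt: a method of a vector-store class whose self.sql_collection
# is assumed to be a LangChain vector store
def add_question_sql(self, question: str, sql: str, **kwargs) -> str:
    # Serialize the question-SQL pair as a single JSON string
    question_sql_json = json.dumps(
        {
            "question": question,
            "sql": sql,
        },
        ensure_ascii=False,
    )
    # Unique ID; the "-sql" suffix marks this as a question-SQL record
    id = str(uuid.uuid4()) + "-sql"
    createdat = kwargs.get("createdat")
    doc = Document(
        page_content=question_sql_json,
        metadata={"id": id, "createdat": createdat},
    )
    # Embed and store the pair in the SQL collection
    self.sql_collection.add_documents([doc], ids=[doc.metadata["id"]])

    return id
```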

so that it adds a `documentation` field to this JSON payload, storing the documentation alongside its question-SQL pair.
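A minimal sketch of that extension, assuming the documentation travels inside the same JSON payload as the question and SQL. To stay self-contained, a small `Document` dataclass and an in-memory `FakeCollection` (both hypothetical) stand in for LangChain's `Document` and the real vector-store collection, and `VectorStoreSketch` is a hypothetical host class; the `documentation` keyword argument is a proposal, not an existing Vanna parameter.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class FakeCollection:
    """Hypothetical in-memory stand-in for a vector-store collection."""

    def __init__(self) -> None:
        self.docs: dict[str, Document] = {}

    def add_documents(self, documents: list[Document], ids: list[str]) -> None:
        for doc_id, doc in zip(ids, documents):
            self.docs[doc_id] = doc


class VectorStoreSketch:
    """Hypothetical host class for the extended method."""

    def __init__(self) -> None:
        self.sql_collection = FakeCollection()

    def add_question_sql(
        self, question: str, sql: str, documentation: Optional[str] = None, **kwargs
    ) -> str:
        payload = {"question": question, "sql": sql}
        # Only add the key when documentation is supplied, so plain
        # question-SQL records keep their original shape
        if documentation:
            payload["documentation"] = documentation
        question_sql_json = json.dumps(payload, ensure_ascii=False)
        doc_id = str(uuid.uuid4()) + "-sql"
        doc = Document(
            page_content=question_sql_json,
            metadata={"id": doc_id, "createdat": kwargs.get("createdat")},
        )
        self.sql_collection.add_documents([doc], ids=[doc.metadata["id"]])
        return doc_id
```

Because the documentation would be part of `page_content`, a real vector store would embed it together with the question and SQL, and code that parses the retrieved JSON back into a dict would see the extra `documentation` key. One trade-off is that the embedding then also reflects the documentation text, which may change which examples are retrieved; code that builds the prompt from retrieved examples would likely also need updating to use the new key.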