vanna-ai / vanna

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.
https://vanna.ai/docs/
MIT License
9.97k stars 737 forks source link

Error in SQL Syntax using MSSQL server (previously reported) #334

Closed dhait closed 3 months ago

dhait commented 3 months ago

Copied from https://github.com/vanna-ai/vanna/issues/310 THIS WAS PREVIOUSLY REPORTED AND CLOSED WITHOUT A FIX

Describe the bug I'm using Vanna connected to a sample Microsoft SQL server and I'm having some trouble with the SQL generated by Vanna when asking for X top Customers or Y top songs or Z top sales. The generated SQL always uses function LIMIT which is correct when using SQLite or MySQL, but it should use SELECT TOP instead when using SQL server. This causes an error, of course. I tried setting SQL example queries using the correct format in the training data but even when I ask one of the questions that I've set as an example it always tries to use LIMIT instead of SELECT TOP. I believe it is incompatible with MSSQL. Any ideas?

To Reproduce Steps to reproduce the behavior: vn.train( question="Who are the top 10 customers?", sql="SELECT top 10 C.CUSTOMERID, C.FirstName, C.LastName, SUM(I.TOTAL) FROM CUSTOMER C JOIN INVOICE I ON C.CUSTOMERID = I.CUSTOMERID GROUP BY C.CUSTOMERID, C.FirstName, C.LastName ORDER BY SUM(I.TOTAL) DESC")

vn.generate_sql(quest="Who are the top 10 customers?")

answer

Expected behavior SELECT top 10 C.CUSTOMERID, C.FirstName, C.LastName, SUM(I.TOTAL) FROM CUSTOMER C JOIN INVOICE I ON C.CUSTOMERID = I.CUSTOMERID GROUP BY C.CUSTOMERID, C.FirstName, C.LastName ORDER BY SUM(I.TOTAL) DESC

Actual behaviour SELECT t.Name AS TrackName, SUM(il.Quantity) AS TotalQuantitySold FROM Track t INNER JOIN InvoiceLine il ON t.TrackId = il.TrackId GROUP BY t.Name ORDER BY TotalQuantitySold DESC LIMIT 10

Error / LOGS Couldn't run sql: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near 'LIMIT'. (102) (SQLExecDirectW)") [SQL: SELECT t.Name AS TrackName, SUM(il.Quantity) AS TotalQuantitySold FROM Track t INNER JOIN InvoiceLine il ON t.TrackId = il.TrackId GROUP BY t.Name ORDER BY TotalQuantitySold DESC LIMIT 10] (Background on this error at: https://sqlalche.me/e/20/f405)

Desktop (please complete the following information where):

OS: Windows Version: 11 Python: [3.11.8] Vanna: [2.8.0] Additional context Add any other context about the problem here.

zainhoda commented 3 months ago

As far as we've been able to determine, this appears to happen when you're using Chinook as a sample dataset. We believe that this is happening because the foundational models have a bunch of pre-training relating to Chinook that are tuned to other SQL flavors so the pre-training is effectively overriding the in-context learning.

If you're doing a demo for SQL Server, the recommendation is to use AdventureWorks: https://learn.microsoft.com/en-us/sql/samples/adventureworks-install-configure?view=sql-server-ver16&tabs=ssms