vanna-ai / vanna

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.
https://vanna.ai/docs/
MIT License
11.04k stars 862 forks

Sql generated with additional Chinese #351

Closed njalan closed 5 months ago

njalan commented 5 months ago

Sometimes there is Chinese text at the end of the generated SQL: (image)

zainhoda commented 5 months ago

@njalan could you provide some additional context here on the setup (LLM, training data, etc.)?

njalan commented 5 months ago

@zainhoda I am using Baichuan-7B as the LLM. The training data consists of question-SQL pairs and documentation of business knowledge (a mix of Chinese and English). All questions are asked in Chinese.

njalan commented 5 months ago

Sometimes it gives me the same query twice, separated by "-- 或者" (或者 means "or" in English). Is there a prompt that avoids this?

zainhoda commented 5 months ago

I think the easiest path might be to override the extract_sql method:

https://github.com/vanna-ai/vanna/blob/main/src/vanna/base/base.py#L126-L150

You can provide your own extract_sql method that removes the unnecessary characters.
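For anyone landing here later, a rough sketch of what such an override could look like. This is not the library's implementation: `clean_sql` and `CleanSqlMixin` are hypothetical names, and the sketch assumes `extract_sql` takes the raw LLM response as a string and returns a string. It cuts the response at the "-- 或者" separator and drops trailing lines that contain Chinese characters outside of quoted string literals. Chinese text appended on the same line as the SQL is not handled.

```python
import re

# CJK Unified Ideographs (covers common Chinese characters).
_CJK = re.compile(r"[\u4e00-\u9fff]")
# Single-quoted SQL string literals, so Chinese inside values is preserved.
_LITERAL = re.compile(r"'[^']*'")
# The "-- 或者" ("or") separator the model puts between duplicate queries.
_SEPARATOR = re.compile(r"--\s*或者")


def _has_cjk_outside_literals(line: str) -> bool:
    """True if the line has Chinese characters that are not inside quotes."""
    return bool(_CJK.search(_LITERAL.sub("", line)))


def clean_sql(sql: str) -> str:
    """Keep the first query and strip trailing Chinese commentary lines."""
    # Everything after the separator is assumed to be a repeated query.
    sql = _SEPARATOR.split(sql, maxsplit=1)[0]
    lines = sql.splitlines()
    # Drop blank or Chinese-only commentary lines from the end.
    while lines and (not lines[-1].strip() or _has_cjk_outside_literals(lines[-1])):
        lines.pop()
    return "\n".join(lines).strip()


class CleanSqlMixin:
    """Hypothetical mixin; list it before your Vanna class in the bases.

    Assumes the next class in the MRO defines extract_sql(llm_response) -> str.
    """

    def extract_sql(self, llm_response: str) -> str:
        return clean_sql(super().extract_sql(llm_response))
```

Usage would be something like `class MyVanna(CleanSqlMixin, YourVannaClass)`, where `YourVannaClass` stands for whatever Vanna class you already instantiate. Doing the cleanup after the stock extraction keeps the existing parsing logic and only trims what it leaves behind.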