vanna-ai / vanna

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.
https://vanna.ai/docs/
MIT License

Set up a token (or words) limit to be sent to the LLM #577

Open Gaket opened 4 months ago

Gaket commented 4 months ago

Is your feature request related to a problem? Please describe.
I just ran into a problem where Vanna.ai tried to make a very large request to answer a single question:

Extracted SQL: SELECT BARBER_ID, COUNT(*) AS schedule_count
FROM SCHEDULES
GROUP BY BARBER_ID
Using model gpt-4o for **1,349,982.75** tokens (approx)

That's more than a million tokens!

Luckily, OpenAI sent me a 429 error: openai.RateLimitError: Error code: 429 - {'error': {'message': 'Request too large for gpt-4o in organization org-2.... on tokens per min (TPM): Limit 30000, Requested 1349985.

Describe the solution you'd like
I'd like to be able to pass a parameter specifying the maximum request size, in tokens, that I'm comfortable with.
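A minimal sketch of what such a parameter could look like: a pre-flight guard that estimates the prompt's token count and refuses to send it once a user-set budget is exceeded. The names (`check_token_budget`, `TokenBudgetExceeded`) are hypothetical, not part of Vanna's API, and the ~4 characters-per-token ratio is only a rough heuristic for English text; a tokenizer such as tiktoken would give an exact count for OpenAI models.

```python
# Hypothetical pre-flight token guard -- not part of Vanna's API.

class TokenBudgetExceeded(Exception):
    """Raised when a prompt would exceed the user-configured token budget."""


def estimate_tokens(text: str) -> int:
    # Rough approximation: OpenAI models average ~4 characters per token.
    # Swap in tiktoken's encoding for an exact per-model count.
    return len(text) // 4


def check_token_budget(prompt: str, max_tokens: int = 30_000) -> str:
    # Fail fast locally instead of letting the API return a 429.
    used = estimate_tokens(prompt)
    if used > max_tokens:
        raise TokenBudgetExceeded(
            f"Prompt is ~{used:,} tokens, over the {max_tokens:,}-token budget"
        )
    return prompt
```

With a guard like this, the 1.3M-token request above would be rejected client-side before any call to OpenAI is made.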

Describe alternatives you've considered
Setting up a limit on the OpenAI side, but I'm not an admin there.

Additional context
I tried to connect Vanna AI to our production database with 100+ tables and a lot of data. I believe the error could have been caused by the "let Vanna.ai send your data to the LLM" flag.

zainhoda commented 3 months ago

This is likely due to the vn.generate_summary call. That's the only method that will send the entire dataframe to the LLM.

If you'd like to disable this functionality you can set summarization=False for the web app: https://vanna.ai/docs/web-app/#vanna.flask.VannaFlaskApp
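For reference, a usage fragment showing where that flag goes (assuming a `vn` object already configured per the linked docs; this snippet is illustrative, not runnable on its own):

```python
from vanna.flask import VannaFlaskApp

# summarization=False disables vn.generate_summary, so the full
# result dataframe is never sent to the LLM.
app = VannaFlaskApp(vn, summarization=False)
app.run()
```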

There probably does need to be some kind of option to limit the number of rows sent to the LLM for summarization.
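One possible shape for that option, as a sketch: cap the number of result rows before they reach the summarization step. The function name and `max_rows` parameter are hypothetical, not an existing Vanna setting; `rows` stands in for the dataframe records that `vn.generate_summary` would otherwise send in full.

```python
# Hypothetical row cap applied before summarization -- not part of Vanna's API.

def cap_rows_for_summary(rows: list, max_rows: int = 100) -> list:
    """Return at most max_rows rows to include in the summarization prompt."""
    if len(rows) <= max_rows:
        return rows
    # Keep only the head of the result set; the LLM summarizes a sample
    # instead of the entire table, bounding the prompt size.
    return rows[:max_rows]
```

A cap like this, combined with a token-based budget, would keep summarization usable on large tables without risking million-token prompts.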