microsoft / lida

Automatic Generation of Visualizations and Infographics using Large Language Models
https://microsoft.github.io/lida/
MIT License

[Idea] Vector DBs to extend dataset sizes #29

Open ethanabowen opened 10 months ago

ethanabowen commented 10 months ago

Love this tool! One limitation I've run into is the LLM token limit when working with large real-world datasets.

I'd love to be able to point this tool at a vector DB to extend the amount of data it can work with.

What do we think? Would this really solve the problem I'm facing with token limits, or is there a general limitation in LIDA because of LLM token limits?

ethanabowen commented 10 months ago

After digging further into the code, it looks like the limitation I was facing comes from the summarization step when a dataset has a very large number of columns. It's still a limitation I'd be interested in overcoming.
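
To make the column-count problem concrete, here is a minimal sketch of one possible workaround: build a compact summary per column and group those summaries into batches that each stay under a size budget. This is not LIDA's implementation or API. The helper names `column_summary` and `batch_columns`, the `max_chars` parameter, and the use of JSON character count as a rough stand-in for token count are all assumptions made for illustration.

```python
import json
from typing import Any, Dict, List


def column_summary(name: str, values: List[Any], max_samples: int = 3) -> Dict[str, Any]:
    """Hypothetical helper: describe one column by name, type, and a few samples."""
    non_null = [v for v in values if v is not None]
    dtype = type(non_null[0]).__name__ if non_null else "unknown"
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    samples = list(dict.fromkeys(non_null))[:max_samples]
    return {
        "column": name,
        "dtype": dtype,
        "unique": len(set(non_null)),
        "samples": samples,
    }


def batch_columns(table: Dict[str, List[Any]], max_chars: int = 2000) -> List[List[Dict[str, Any]]]:
    """Hypothetical helper: group column summaries into batches under max_chars.

    Character count of the JSON form is only a crude proxy for tokens (assumption).
    A single summary larger than max_chars still gets a batch of its own.
    """
    batches: List[List[Dict[str, Any]]] = []
    current: List[Dict[str, Any]] = []
    size = 0
    for name, values in table.items():
        summary = column_summary(name, values)
        cost = len(json.dumps(summary))
        # Start a new batch when adding this column would exceed the budget.
        if current and size + cost > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(summary)
        size += cost
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # A wide toy table: 50 columns, 3 rows each.
    wide = {f"col_{i}": [i, i + 1, i + 2] for i in range(50)}
    for n, batch in enumerate(batch_columns(wide, max_chars=300)):
        print(n, [s["column"] for s in batch])
```

Each batch could then be summarized or prompted on separately, which bounds prompt size no matter how many columns the dataset has. The trade-off is that the model never sees all columns at once, so relationships between columns in different batches may be missed.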

lawyinking commented 6 months ago

Do you have any solutions or suggestions now?