Open pseudotensor opened 1 year ago
https://h2oai.slack.com/archives/C050PKJ6GAX/p1689022427466269
For SQL, we can try flan-t5-xxl or Flan-UL2, fine-tuned on the BIRD dataset. Those models are known to do surprisingly well compared to much larger models, and even t5-3B does reasonably well: https://bird-bench.github.io/#:~:text=BIRD%20(BIg%20Bench%20for%20LaRge,total%20size%20of%2033.4%20GB. https://declare-lab.net/instruct-eval/ https://medium.com/@bnjmn_marie/behind-the-hype-models-based-on-t5-2019-still-better-than-vicuna-alpaca-mpt-and-dolly-6c4f1139f39e
Note that Flan models have a 2048-token input context and a 512-token output context for sequence-to-sequence use. That should be enough for many cases, although fine-tuning with a different output context length is possible.
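As a rough illustration of working within those limits, here is a minimal sketch that adds table schemas to a text-to-SQL prompt until the input budget would be exceeded. The function names (`approx_token_count`, `build_prompt`) and the prompt layout are hypothetical, and the whitespace-based counter is only a stand-in; a real setup would count with the model's own tokenizer.

```python
from typing import Callable, List

MAX_INPUT_TOKENS = 2048   # input context quoted above
MAX_OUTPUT_TOKENS = 512   # output context quoted above (not enforced here)


def approx_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def build_prompt(
    question: str,
    table_schemas: List[str],
    count_tokens: Callable[[str], int] = approx_token_count,
    budget: int = MAX_INPUT_TOKENS,
) -> str:
    """Append schemas in order, stopping before the first one that overflows the budget."""
    prompt = f"Question: {question}\nSchema:"
    for schema in table_schemas:
        candidate = prompt + "\n" + schema
        if count_tokens(candidate) > budget:
            break  # this schema (and all later ones) are left out
        prompt = candidate
    return prompt


# Tiny budget so the cut-off is visible: only the first schema fits.
demo = build_prompt(
    "How many users?",
    ["CREATE TABLE users (id INT)", "CREATE TABLE orders (id INT)"],
    budget=10,
)
```

Passing a different `count_tokens` callable is the intended way to swap in a real tokenizer; dropping schemas in order is just one simple policy, and ranking tables by relevance to the question first would likely work better.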
https://huggingface.co/datasets/wikisql https://paperswithcode.com/dataset/kaggledbqa
Uhh, I realise this is a dump of links for enabling this functionality, but seeing as h2ogpt integrates with LangChain, I thought I would post this here.
https://python.langchain.com/docs/use_cases/qa_structured/sql
Are there any plans to implement functionality to support SQL databases? From my understanding, LangChain integrates with SQLAlchemy, so you could support various databases and let the user supply the connection string, or the credentials and host.
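To make the idea concrete, here is a minimal sketch of the flow being asked about: read the schema from a database, put it in a prompt, and run the SQL that comes back. It uses Python's built-in `sqlite3` rather than SQLAlchemy or LangChain so it runs with no dependencies. The helper names (`schema_prompt`, `run_readonly`) are hypothetical, and the "generated" SQL is a hard-coded placeholder standing in for a model call.

```python
import sqlite3


def schema_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Build a text-to-SQL prompt from the CREATE TABLE statements in the database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    schema = "\n".join(row[0] for row in rows)
    return f"{schema}\n\nQuestion: {question}\nSQL:"


def run_readonly(conn: sqlite3.Connection, sql: str) -> list:
    """Execute a query only if it starts with SELECT (a simple guard, not a full safety check)."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed in this sketch")
    return conn.execute(sql).fetchall()


# Demo database; a real deployment would connect using user-supplied details.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("bob",)])

prompt = schema_prompt(conn, "How many users are there?")
generated_sql = "SELECT COUNT(*) FROM users"  # placeholder for the model's output
result = run_readonly(conn, generated_sql)
```

The prefix check on `SELECT` is deliberately naive; connecting with a read-only database user would be a sturdier guard against destructive queries.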
There are no immediate plans to enable this, but a PR is welcome :)
There is also a PR, still WIP, for Elasticsearch that seems to function; it just needs to be exposed in the UI, etc. https://github.com/h2oai/h2ogpt/pull/656
https://arxiv.org/abs/2306.03341 https://arxiv.org/abs/2212.14024
https://bird-bench.github.io/
https://dev.to/ngonidzashe/speak-your-queries-how-langchain-lets-you-chat-with-your-database-p62 https://github.com/imartinez/privateGPT/issues/616
https://musings.yasyf.com/compressgpt-decrease-token-usage-by-70/
https://github.com/vnk8071/E2E-AI-Chatbot
https://github.com/dorianbrown/rank_bm25
https://github.com/ocrmypdf/OCRmyPDF
https://github.com/h2oai/helium/issues/8
https://cloud.google.com/blog/products/ai-machine-learning/how-to-use-grounding-for-your-llms-with-text-embeddings
https://github.com/styczynski/chatdb https://github.com/chat2db/Chat2DB
https://arxiv.org/abs/2306.03901
https://github.com/questdb/questdb