vanna-ai / vanna

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.
https://vanna.ai/docs/
MIT License
9.12k stars 673 forks

Save the training model to local disk, then reload it from the local file to avoid re-training every time. #477

Closed yiouyou closed 4 weeks ago

yiouyou commented 1 month ago

I can't find instructions in the documentation on how to:

  1. Save the training model to local disk
  2. Then reload it from the local file
  3. To avoid starting the training process every time

I'm using ChromaDB with the config:

_chroma_config = {
    'path': '.',
    'client': 'persistent', # persistent, in-memory
    'n_results': 10,
}

Even with the 'persistent' option, the re-training process still runs every time I start the app. Could anyone help me with this issue?

Thanks a lot~

tomercagan commented 1 month ago

I do this with the built-in LocalContext_OpenAI, which saves and reloads the Chroma DB:

import os

from vanna.local import LocalContext_OpenAI

# Directory where the Chroma DB is persisted; use any path you want.
chroma_path = os.environ.get("VANNA_CHROMA_PATH", "./vanna-db")
vn = LocalContext_OpenAI(
    config={
        # Read the key from the environment; never paste a real key into a post.
        "api_key": os.environ["OPENAI_API_KEY"],
        "model": "gpt-4o",
        "path": chroma_path,
    }
)

LocalContext_OpenAI simply passes the config to both Vanna wrappers: the LLM one (OpenAI_Chat) and the ChromaDB one (ChromaDB_VectorStore). The latter expects a path config entry, which it passes to the Chroma client.

You can see this in ./src/vanna/chromadb/chromadb_vector.py. I think that by default it creates the DB in the current working directory (CWD) and reloads it from there, even if you don't specify a path.
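
Because a relative path (or the CWD default) depends on where the app is launched from, starting it from a different directory can make the store look empty. Below is a minimal sketch of one way to pin the location, not something taken from the Vanna code base. The helper name `resolve_chroma_path` and its parameters are hypothetical; `VANNA_CHROMA_PATH` is just the example variable name used above.

```python
import os
from pathlib import Path


def resolve_chroma_path(base_dir, env_var="VANNA_CHROMA_PATH", default_name="vanna-db"):
    """Return an absolute, existing directory to use as the Chroma 'path'.

    Hypothetical helper: the environment variable wins if it is set,
    otherwise the store lives in <base_dir>/<default_name>.
    """
    raw = os.environ.get(env_var)
    path = Path(raw).expanduser() if raw else Path(default_name)
    if not path.is_absolute():
        # Anchor relative paths to base_dir instead of the current working directory.
        path = Path(base_dir) / path
    path = path.resolve()
    path.mkdir(parents=True, exist_ok=True)  # create the directory on first run
    return str(path)
```

In an app you would typically call it as `resolve_chroma_path(Path(__file__).parent)` and put the result in the `"path"` config entry, so the same directory is used no matter where the process is started.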

If you are using your own implementation of Vanna, you will have to do something similar and pass a path to ChromaDB_VectorStore (assuming you are using it).
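
One more thing to check: a persistent store only keeps the data around. If your startup code calls train unconditionally, that will presumably still run on every launch. Here is a rough sketch of a guard, assuming your Vanna object exposes `get_training_data()` (which, as far as I know, returns a DataFrame) and `train(ddl=...)`. The function name `train_if_empty` is made up for illustration.

```python
def train_if_empty(vn, ddl_statements):
    """Train only when the vector store holds no training data yet.

    `vn` is assumed to be a Vanna-like object with `get_training_data()`
    returning something that supports len() (e.g. a DataFrame) and
    `train(ddl=...)`. Returns True if training ran, False if it was skipped.
    """
    existing = vn.get_training_data()
    if len(existing) > 0:
        # Data was reloaded from the persisted store; nothing to do.
        return False
    for ddl in ddl_statements:
        vn.train(ddl=ddl)
    return True
```

Call it once at startup, e.g. `train_if_empty(vn, my_ddl_list)`, where `my_ddl_list` is your own list of DDL strings. On the first run it trains; on later runs it should find the reloaded data and skip.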