vanna-ai / vanna

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.
https://vanna.ai/docs/
MIT License

PG Vector not working #659

Open edlouth opened 3 weeks ago

edlouth commented 3 weeks ago

Describe the bug: I have set up pgvector like so:

from vanna.pgvector import PG_VectorStore
from vanna.openai import OpenAI_Chat

class CustomVanna(PG_VectorStore, OpenAI_Chat):
    def __init__(self, config=None):
        PG_VectorStore.__init__(self, config=config)
        OpenAI_Chat.__init__(self, config=config)

vn = CustomVanna(config={
    "api_key": openai_api_key,
    "connection_string": connection_string
})

# The information schema query may need some tweaking depending on your database. This is a good starting point.
df_information_schema = vn.run_sql("SELECT * FROM `...demo.INFORMATION_SCHEMA.COLUMNS`;")

# This will break up the information schema into bite-sized chunks that can be referenced by the LLM
plan = vn.get_training_plan_generic(df_information_schema)

vn.train(plan=plan)

I get the following error:

AttributeError: 'CustomVanna' object has no attribute 'documentation_collection'

If I run:

vn.ask(question="How many users are there?")

I get the error: object of type 'coroutine' has no len()

edlouth commented 3 weeks ago

@andreped I thought I'd mention you as it looks like you are the author of these changes.

andreped commented 3 weeks ago

Hello, @edlouth :] Thank you for reporting the bug!

I was surprised the PR I made was merged so quickly. I don't think we did thorough testing on it, especially not integration tests. We should aim to fix this before the new release is out.

But "object of type 'coroutine' has no len()" is most likely caused by a missing await, or by accidentally making something async that shouldn't be.
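The failure mode described above can be reproduced in isolation. This is a minimal sketch, not the vanna code: `fetch_related_docs`, `broken_lookup`, and `sync_lookup` are hypothetical names used only to show how an un-awaited coroutine ends up being passed to `len()`.

```python
import asyncio


async def fetch_related_docs(question: str) -> list[str]:
    # Hypothetical async helper standing in for a vector-store lookup.
    return [f"doc about {question}"]


def broken_lookup(question: str) -> str:
    # Calling an async function without awaiting it returns a coroutine object.
    docs = fetch_related_docs(question)
    try:
        return str(len(docs))
    except TypeError as exc:
        docs.close()  # avoid the "coroutine was never awaited" warning
        return str(exc)


def sync_lookup(question: str) -> int:
    # Sync callers must drive the coroutine to completion themselves.
    docs = asyncio.run(fetch_related_docs(question))
    return len(docs)


print(broken_lookup("users"))
print(sync_lookup("users"))
```

If the calling code has to stay synchronous, the simpler fix is usually to make the helper a plain `def` rather than wrapping every call in `asyncio.run`.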

I can draft a PR on this today :]

edlouth commented 3 weeks ago

@andreped thanks for getting back so soon.

I have opened a PR https://github.com/vanna-ai/vanna/pull/660 which I think has the changes in question.

andreped commented 3 weeks ago

> I have opened a PR #660 which I think has the changes in question.

OK, great! Then I can review and test it on my local setup :]


EDIT: Yeah, your proposed changes make sense. I have an async implementation of this for another project, and I think I just mixed the two up, as this implementation currently has to remain sync.

edlouth commented 3 weeks ago

While we are here I have also had some issues with

https://github.com/vanna-ai/vanna/blob/c21d8bf8c086a9c52ab6178f03cf4eee41051b08/src/vanna/pgvector/pgvector.py#L32

I think this works

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Define a custom embedding class with the methods the vector store calls
class CustomEmbeddingFunction:
    def __init__(self, model):
        self.model = model

    def embed_documents(self, texts):
        # Return one embedding per document
        return self.model.encode(texts, convert_to_tensor=False)

    def embed_query(self, text):
        # Return the embedding for a single query string
        return self.model.encode([text], convert_to_tensor=False)[0]

# Inside PG_VectorStore.__init__, in place of the line linked above:
#     self.embedding_function = CustomEmbeddingFunction(model)

andreped commented 3 weeks ago

> I think this works

Again, we do exactly this for another project :P But if I recall correctly, you should be able to provide your own custom embedding function through the config, so we shouldn't need any code changes for that, right?
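The idea of supplying the embedding function through the config can be sketched as follows. This is an illustration under stated assumptions, not vanna's code: `SketchVectorStore` and `ToyEmbedding` are hypothetical, and the config key `embedding_function` is an assumed name that should be checked against the actual pgvector.py source.

```python
class ToyEmbedding:
    """Toy stand-in for a real model; exposes the two methods used in the thread."""

    def embed_query(self, text):
        # Deterministic fake embedding: text length and a small checksum.
        return [float(len(text)), float(sum(map(ord, text)) % 97)]

    def embed_documents(self, texts):
        return [self.embed_query(t) for t in texts]


class SketchVectorStore:
    """Hypothetical store showing the config pattern (NOT vanna's PG_VectorStore)."""

    def __init__(self, config=None):
        config = config or {}
        # Assumed key name; fall back to a default when the caller supplies none.
        self.embedding_function = config.get("embedding_function") or ToyEmbedding()


custom = ToyEmbedding()
store = SketchVectorStore(config={"embedding_function": custom})
print(store.embedding_function is custom)
print(store.embedding_function.embed_query("users"))
```

With this pattern, a wrapper such as the `CustomEmbeddingFunction` shown earlier in the thread could be passed in by the caller, with no edits to the store class.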

VirendraSttl commented 2 weeks ago

Hi @edlouth, could you please guide me on how to utilize vanna.pgvector? I attempted to install Vanna with pgvector using pip install 'vanna[pgvector]' but encountered an issue where Vanna 0.7.3 does not offer the extra 'pgvector'. I intended to utilize my local database (pgvector) for vector storage.

andreped commented 2 weeks ago

> Hi @edlouth, could you please guide me on how to utilize vanna.pgvector? I attempted to install Vanna with pgvector using pip install 'vanna[pgvector]' but encountered an issue where Vanna 0.7.3 does not offer the extra 'pgvector'. I intended to utilize my local database (pgvector) for vector storage.

Hello, @VirendraSttl :]

No release that includes the new pgvector support has been made yet, so you can't install it like this.

One way to install the latest code is something like this:

pip install "vanna[pgvector] @ git+https://github.com/vanna-ai/vanna.git"

Then again, the pgvector code on main seems broken right now. The fix is in the PR by @edlouth, which has yet to be merged, so to get working pgvector support I would install Vanna from that branch instead:

pip install "vanna[pgvector] @ git+https://github.com/edlouth/vanna.git@pgvector_fixes"

At least something like that should work.

isaacdalmarco commented 1 week ago

Hi @andreped, I am trying to use pgvector as a custom vector database, and I have copied the pgvector file into my project. But when it starts I get this error:

../.pyenv/versions/3.12.6/lib/python3.12/site-packages/langchain_postgres/vectorstores.py", line 106, in _get_embedding_collection_store
    from pgvector.sqlalchemy import Vector  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'pgvector.sqlalchemy'; 'pgvector' is not a package

I tried Python 3.12 and 3.11, and different versions of pgvector and langchain_postgres, but none of them worked.

Which version of python are you using? Do you have any other suggestion?
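One thing worth checking, although it is not confirmed in this thread: Python reports "'pgvector' is not a package" when the name `pgvector` resolves to a single .py file rather than the installed package, which can happen if a copied pgvector.py sits on the import path. The sketch below uses only the standard library; `describe_module` is a hypothetical helper name.

```python
import importlib.util


def describe_module(name: str) -> str:
    """Report whether a top-level name resolves to a package or a single-file module."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found"
    # Packages have submodule search locations; plain .py modules do not.
    kind = "package" if spec.submodule_search_locations is not None else "plain module"
    return f"{name}: {kind} at {spec.origin}"


# If this reports a "plain module" inside your project directory, a local
# pgvector.py is shadowing the installed pgvector package.
print(describe_module("pgvector"))
```

If shadowing turns out to be the cause, renaming the copied file so it no longer collides with the installed package name should let `pgvector.sqlalchemy` import again.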

andreped commented 1 week ago

@isaacdalmarco I think we should wait till PR https://github.com/vanna-ai/vanna/pull/660 is merged, as the code in the main branch for the pgvector implementation is broken.

Then we can try to see how to fix this issue of yours, if it is still an issue after merge.

andreped commented 19 hours ago

@edlouth, @isaacdalmarco vanna==0.7.4 was just released, which should have resolved the original issue and maybe the one @isaacdalmarco is having.

Could you upgrade to the latest version and test whether it works for you? :]
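Before retesting, it can help to confirm which version is actually installed in the active environment. A minimal sketch using the standard library; `version_tuple` and `is_at_least` are hypothetical helpers, and the simple numeric comparison assumes plain release versions such as 0.7.4.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v: str) -> tuple:
    # Keep the first three numeric components, e.g. "0.7.4" -> (0, 7, 4).
    return tuple(int(n) for n in re.findall(r"\d+", v)[:3])


def is_at_least(installed: str, minimum: str = "0.7.4") -> bool:
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = version("vanna")
    print(f"vanna {installed} installed; at least 0.7.4: {is_at_least(installed)}")
except PackageNotFoundError:
    print("vanna is not installed in this environment")
```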