run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Connection with chromadb (ChromaVectorStore) returns a type error #13885

Closed aganiezgoda closed 4 months ago

aganiezgoda commented 4 months ago

Bug Description

I've been trying to persist the vector part of my property graph.

from llama_index.core import PropertyGraphIndex
from llama_index.core.graph_stores import SimplePropertyGraphStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

client = chromadb.PersistentClient("./chroma_db")
collection = client.get_or_create_collection("my_graph_vector_db")

index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
    graph_store=SimplePropertyGraphStore(),
    vector_store=ChromaVectorStore(collection=collection),
    show_progress=True,
)

index.storage_context.persist(persist_dir="./storage")

The error reads:

    client = chromadb.HttpClient(host=host, port=port, ssl=ssl, headers=headers)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxx\.venv\Lib\site-packages\chromadb\__init__.py", line 178, in HttpClient
    port = int(port)
           ^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
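For context, the failure boils down to chromadb coercing a `None` port with `int()`. A minimal stand-in reproduces it without chromadb installed (`http_client` below is a hypothetical stand-in, not the real `chromadb.HttpClient`):

```python
# Minimal illustration of the underlying failure: chromadb.HttpClient
# calls int(port), and int(None) raises TypeError. This function is a
# hypothetical stand-in, not the real chromadb API.
def http_client(host=None, port=None):
    return int(port)  # raises TypeError when port is None

try:
    http_client(host="localhost", port=None)
except TypeError as exc:
    print(type(exc).__name__)  # prints "TypeError"
```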

Version

0.10.40

Steps to Reproduce

See above.

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 4 months ago

To resolve the TypeError when trying to persist the vector part of your property graph using ChromaVectorStore, you need to ensure that the persist_dir parameter is correctly specified. Here is an example of how to do it:

from llama_index.core.graph_stores import SimplePropertyGraphStore
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

# Initialize the Chroma client with persistence
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("my_graph_vector_db")

# Set up the ChromaVectorStore
vector_store = ChromaVectorStore(chroma_collection=collection)

# Create the PropertyGraphIndex
index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
    graph_store=SimplePropertyGraphStore(),
    vector_store=vector_store,
    show_progress=True,
)

# Persist the storage context
index.storage_context.persist(persist_dir="./storage")

Ensure that the path parameter is provided when initializing the PersistentClient. This should help avoid the TypeError related to the port parameter being None [1][2][3].

To continue talking to Dosu, mention @dosu.

logan-markewich commented 4 months ago

@aganiezgoda you have a typo

vector_store = ChromaVectorStore(chroma_collection=collection)

Although, I think ChromaVectorStore needs to be updated to handle duplicate IDs 🤔 (I got some error about that)
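A rough sketch of why the mistyped keyword slips through silently (assumed behavior, not the actual ChromaVectorStore source): if the constructor accepts extra keyword arguments, `collection=...` is simply swallowed, so the store sees no collection and falls back to building a remote client with `port=None`.

```python
# Sketch (assumed behavior, hypothetical class): an unknown keyword
# like `collection` is swallowed by **kwargs, so the store thinks no
# collection was given and tries to build a remote client with port=None.
class VectorStoreSketch:
    def __init__(self, chroma_collection=None, host=None, port=None, **kwargs):
        if chroma_collection is not None:
            self.collection = chroma_collection
        else:
            self.port = int(port)  # TypeError: int(None)

store = VectorStoreSketch(chroma_collection="my_collection")  # works
# VectorStoreSketch(collection="my_collection") would raise the TypeError
```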

aganiezgoda commented 4 months ago

(chroma_collection=collection)

Do you mean (collection=collection) should be (chroma_collection=collection)?

I've tried both and neither works.

logan-markewich commented 4 months ago

@aganiezgoda worked fine for me here https://colab.research.google.com/drive/1tLBvXNYbX_yK_6pJNwwPb5xpFi9Z3Nsk?usp=sharing

aganiezgoda commented 4 months ago

> @aganiezgoda worked fine for me here https://colab.research.google.com/drive/1tLBvXNYbX_yK_6pJNwwPb5xpFi9Z3Nsk?usp=sharing

I'm receiving:

storage_context.vector_store = vector_store
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: property 'vector_store' of 'StorageContext' object has no setter

The only difference from your setup is that I'm using the Azure versions of the embedding and LLM models. So it's actually:

llm = AzureOpenAI(
    engine="xxx",
    model="gpt-35-turbo-16k",
    temperature=0.0,
    azure_endpoint="https://xx.openai.azure.com/",
    api_key="xxxx",
    api_version="2023-07-01-preview",
)

embeddings = AzureOpenAIEmbedding(
    engine="xxx",
    model="text-embedding-ada-002",
    azure_endpoint="https://xxx.openai.azure.com/",
    api_key="xxx",
    api_version="2023-12-01-preview",
)

index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=embeddings,
    llm=llm,
    graph_store=SimplePropertyGraphStore(),
    vector_store=ChromaVectorStore(chroma_collection=collection),
    show_progress=True,
)

It works when I don't persist the graph and vector store, so that should make no difference.
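For what it's worth, that AttributeError just means `vector_store` is exposed as a read-only property on StorageContext, i.e. a getter with no setter. A stripped-down sketch of the pattern (hypothetical class, not the real llama_index code):

```python
class StorageContextSketch:
    """Hypothetical sketch of a class exposing a read-only property."""

    def __init__(self, vector_store):
        self._vector_store = vector_store

    @property
    def vector_store(self):  # getter only; no setter defined
        return self._vector_store

ctx = StorageContextSketch(vector_store="stub")
try:
    ctx.vector_store = "other"  # AttributeError: no setter
except AttributeError:
    print("read-only property")
```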

logan-markewich commented 4 months ago

Update your library; this error was fixed:

pip install -U llama-index-core
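To confirm which version actually ended up installed after the upgrade, `importlib.metadata` from the standard library works without importing the package (the distribution name here is taken from the pip command above):

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Prints the installed version string, or None if the package is missing.
print(installed_version("llama-index-core"))
```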

aganiezgoda commented 4 months ago

The update solves the issue.