Closed samiit closed 3 months ago
You seem to be passing the wrong value to the retriever: give it the collection name, not the collection object itself. Also, if you want to persist the documents locally, you should probably use the PersistentClient from the chromadb module. Here is a minimal working example to build on:
import chromadb
from chromadb.utils import embedding_functions
from dspy.retrieve.chromadb_rm import ChromadbRM
chroma_client = chromadb.PersistentClient(path="./furniture_example")
default_ef = embedding_functions.DefaultEmbeddingFunction()
collection = chroma_client.get_or_create_collection(name="furniture", embedding_function=default_ef)
collection.add(
    documents=[
        "couch, bed, table, chair",
        "computer, server, table, chair",
    ],
    metadatas=[
        {"source": "Bedroom"},
        {"source": "Office"},
    ],
    ids=["id1", "id2"],
)
rm = ChromadbRM(collection_name='furniture', persist_directory="./furniture_example", embedding_function=default_ef)
print(rm('comfy'))
I was trying to follow up on this minimal example without going through OpenAI, using a different embedding function, but OpenAI still seems to be chosen by default, as it requires authentication. Is there a different way to declare the RM from ChromadbRM?
rm = ChromadbRM('furniture', "./furniture_example", embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2"), k=3)
@csaiedu Does this perhaps work for you? As per the ChromaDB documentation, "by default, Chroma uses the Sentence Transformers all-MiniLM-L6-v2 model to create embeddings", which seems to be the embedding function you were using in the example.
from chromadb.utils import embedding_functions
embedding_function = embedding_functions.DefaultEmbeddingFunction()
retrieval_model = ChromadbRM(
    collection_name=database_name,
    persist_directory=CHROMA_DB_PATH,
    embedding_function=embedding_function,
)
When I ran it, I didn't need authentication by OpenAI. I am also not running into authentication issues with embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2") for one of my Chroma DBs. Could you rule out that the embedding function is the issue?
Hi Magdalena,
Thank you for your prompt response.
When running your code sample on Windows 10 with Python 3.9 or 3.12, I get: TypeError: __init__() got an unexpected keyword argument 'embedding_function'
DSPy version: dspy-ai 2.4.0
ChromaDB version: chromadb 0.4.24
Regards
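A TypeError like the one above means the installed constructor does not accept that keyword, so it can help to inspect the signature before calling it. The following is only a minimal sketch of that check: `accepts_keyword` is a helper name invented here, and the two retriever classes are hypothetical stand-ins, not the real ChromadbRM signatures.

```python
import inspect


def accepts_keyword(callable_obj, name):
    """Return True if callable_obj can be called with keyword argument `name`."""
    params = inspect.signature(callable_obj).parameters
    if name in params and params[name].kind is not inspect.Parameter.POSITIONAL_ONLY:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for illustration only (NOT the real ChromadbRM).
class OldStyleRetriever:
    def __init__(self, collection_name, persist_directory, k=7):
        pass


class NewStyleRetriever:
    def __init__(self, collection_name, persist_directory, embedding_function=None, k=7):
        pass


print(accepts_keyword(OldStyleRetriever, "embedding_function"))  # False
print(accepts_keyword(NewStyleRetriever, "embedding_function"))  # True
```

With DSPy installed, calling `accepts_keyword(ChromadbRM, "embedding_function")` in each environment should show whether that particular install takes the keyword at all.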
Never mind,
I checked your code on Linux and it works fine there. I imagine it's Windows that is still a problem for that library.
Thanks for your help.
Kind regards
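When the same code behaves differently on two machines, it may be worth comparing the full environments, not just the operating system, since the installed package versions can differ too. A minimal sketch, assuming the packages are installed under the distribution names reported earlier in this thread (`dspy-ai`, `chromadb`); `environment_report` is a helper name invented here.

```python
import platform
from importlib import metadata


def environment_report(packages):
    """Collect OS, Python, and package versions for side-by-side comparison."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "os": platform.system(),
        "python": platform.python_version(),
        "packages": versions,
    }


print(environment_report(["dspy-ai", "chromadb"]))
```

Running this on both the Windows and the Linux machine and comparing the two dictionaries would show whether anything besides the OS differs.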
No problem. Yes, I ran the code on macOS. Great that it works now. Let us know in case something else comes up!
thanks
Hi everyone
I am trying to create a minimal running example of integrating ChromaDB with DSPy.
The last line goes wrong with the following message:
Any suggestions, or hints at correctly using ChromaDB with DSPy?