Open XiaoConstantine opened 2 months ago
Might be a stupid question, but when a client creates a Chroma store like this:
store, errNs := chroma.New(
	chroma.WithChromaURL(localURL),
	chroma.WithEmbedder(embeder),
	chroma.WithDistanceFunction("cosine"),
	chroma.WithNameSpace(sessionString),
)
and eventually calls:
store.AddDocuments(ctx, docs)
should I expect the rows in the created collection to contain embeddings? Currently it seems to be None, and I don't see embeder being used in the AddDocuments function either.
Here's the output I got when querying the chromadb collection:
{'ids': ['026fc10f-ee40-4247-97eb-18801ded699c' ...., 'embeddings': None, 'metadatas': [{'source': foo....},], 'documents': ["blah.....]
You can see the example: chroma-vectorstore-example
@devalexandre Here's the collection I get after running the example you linked:
Out[6]:
{'ids': ['020afcd9-f07a-4e37-b742-013a58ddf722',
'06868a94-e427-44a9-adff-cec70b00b035',
'1f35898a-6660-4c21-b7f4-0929e115bb80',
'2c2c73e9-0f66-4ac8-b8fb-56edda10e05f',
'5ee1653b-89eb-4bfb-87a0-0d7a18708181',
'61836373-f98b-426a-b9fb-49ebcce8587d',
'790ee176-b7e3-4b51-b46f-6594afa1a364',
'7abda7d1-c9ee-4f56-b56d-f34adc20530f',
'b019a576-b104-4bc9-864f-47907b3fa0cb',
'b17b8e05-0662-49dd-8bd4-587ee7c2206d',
'c4279033-b568-456c-8ca3-d1e7eb7cffbc',
'd0f0415f-fb22-4f6d-b848-47e6a92eab65',
'd7884ce3-95a0-4389-8289-5cac5fd4f2d3'],
'embeddings': None,
'metadatas': [{'area': 1523,
'nameSpace': 'ce41f18c-accd-4b98-8165-362f9406a2e0',
'population': 22.6},
{'area': 707,
'nameSpace': 'ce41f18c-accd-4b98-8165-362f9406a2e0',
'population': 0.04},
{'area': 105,
'nameSpace': 'ce41f18c-accd-4b98-8165-362f9406a2e0',
'population': 11},
{'area': 341,
'nameSpace': 'ce41f18c-accd-4b98-8165-362f9406a2e0',
'population': 1.59},
{'area': 1572,
'nameSpace': 'ce41f18c-accd-4b98-8165-362f9406a2e0',
'population': 9.5},
{'area': 622,
'nameSpace': 'ce41f18c-accd-4b98-8165-362f9406a2e0',
'population': 9.7},
{'area': 918,
'nameSpace': 'ce41f18c-accd-4b98-8165-362f9406a2e0',
'population': 0.42},
{'area': 905,
'nameSpace': 'ce41f18c-accd-4b98-8165-362f9406a2e0',
'population': 1.2},
{'area': 326,
'nameSpace': 'ce41f18c-accd-4b98-8165-362f9406a2e0',
'population': 2.3},
{'area': 203,
'nameSpace': 'ce41f18c-accd-4b98-8165-362f9406a2e0',
'population': 15.5},
{'area': 641,
'nameSpace': 'ce41f18c-accd-4b98-8165-362f9406a2e0',
'population': 6.9},
{'area': 1200,
'nameSpace': 'ce41f18c-accd-4b98-8165-362f9406a2e0',
'population': 13.7},
{'area': 828,
'nameSpace': 'ce41f18c-accd-4b98-8165-362f9406a2e0',
'population': 1.46}],
'documents': ['Sao Paulo',
'Kazuno',
'Paris',
'Fukuoka',
'London',
'Tokyo',
'Toyota',
'Hiroshima',
'Nagoya',
'Buenos Aires',
'Santiago',
'Rio de Janeiro',
'Kyoto'],
'data': None,
'uris': None}
The embeddings field is still empty, though my understanding was that, with an OpenAI API key provided, it would create an embedding function and generate embeddings from the documents. Is my understanding wrong here?
@XiaoConstantine you should read the Chroma docs: https://docs.trychroma.com/usage-guide#adding-data-to-a-collection https://docs.trychroma.com/troubleshooting#using-get-or-query-embeddings-say-none
If Chroma is passed a list of documents, it will automatically tokenize and embed them with the collection's embedding function (the default will be used if none was supplied at collection creation). Chroma will also store the documents themselves. If the documents are too large to embed using the chosen embedding function, an exception will be raised.
Using .get or .query, embeddings say None
This is actually not an error. Embeddings are quite large and heavy to send back. Most applications don't use the underlying embeddings and so, by default, chroma does not send them back. To send them back: add include=["embeddings", "documents", "metadatas", "distances"] to your query to return all information.
So you should query Chroma collections with include, and the ChromaVector.SimilaritySearch function also supports the WithIncludes option.
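To make the include behaviour concrete, here is a minimal Go sketch that builds the JSON body for a collection get request with embeddings explicitly requested. It uses only the standard library. The getRequest type and buildGetRequest helper are hypothetical names, and the field names (limit, include) are an assumption based on Chroma's HTTP API, not part of langchaingo, so verify them against your Chroma version.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// getRequest is a hypothetical type mirroring the JSON body assumed for
// Chroma's collection "get" endpoint. Field names are an assumption.
type getRequest struct {
	IDs     []string `json:"ids,omitempty"`
	Limit   int      `json:"limit,omitempty"`
	Include []string `json:"include"`
}

// buildGetRequest returns the JSON body for a get request. Embeddings are
// only listed in "include" when withEmbeddings is true, which matches the
// documented default of not sending embeddings back.
func buildGetRequest(limit int, withEmbeddings bool) ([]byte, error) {
	include := []string{"documents", "metadatas"}
	if withEmbeddings {
		include = append(include, "embeddings")
	}
	return json.Marshal(getRequest{Limit: limit, Include: include})
}

func main() {
	body, err := buildGetRequest(5, true)
	if err != nil {
		panic(err)
	}
	// Print the payload that would be sent to the Chroma server.
	fmt.Println(string(body))
}
```

Without "embeddings" in the include list, the response carries 'embeddings': None even though the vectors are stored, which matches the output shown earlier in this thread.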