langchain-ai / langchain


Chroma similarity_search and similarity_search_with_score do not return any results #27273

Open guninder opened 6 days ago

guninder commented 6 days ago

Example Code

Hi, I am new to LangChain and Chroma. I am trying to insert data into ChromaDB and then search it. The data itself is fine: I tried the same search against a knowledge base I created in Bedrock. I don't get any error. The database is created (data_level0.bin is about 6.3 MB), but a search returns empty results. The following is the code I use to insert the data.

```
import os

from langchain.text_splitter import CharacterTextSplitter
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings

CHROMA_PATH = "data/chroma_wp"
os.environ["OPENAI_API_KEY"] = "sk-"

# Load the book and split it into overlapping chunks.
loader = TextLoader("books/war_and_peace.txt", encoding="utf-8")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200, separator="\n")
chunks = text_splitter.split_documents(documents)

# Embed the chunks and persist them under CHROMA_PATH.
embeddings = OpenAIEmbeddings()
vectorStore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=CHROMA_PATH)
```

The following is the code I am using to search.

```
import chromadb
import os
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "sk-"
CHROMA_PATH = "data/chroma_wp"

# Reopen the persisted store with the same kind of embedding function used for insertion.
embeddings = OpenAIEmbeddings()
vectorStore = Chroma(persist_directory=CHROMA_PATH, embedding_function=embeddings)
#vectorStore.delete()
print(vectorStore)

results = vectorStore.similarity_search("Who is Andrew?", k=3)
#vectorStore.similarity_search_with_score("Who is Andrew?", k=3)

print(results)
```

I get empty results.
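One thing that may be worth ruling out, offered as an assumption and not a confirmed diagnosis: `CHROMA_PATH` is a relative path, so the insert and search scripts only see the same database if both run from the same working directory. The sketch below uses only the standard library; `describe_persist_dir` is a hypothetical helper name, not part of LangChain or Chroma. It resolves the path and lists the files found under it with their sizes.

```python
from pathlib import Path

CHROMA_PATH = "data/chroma_wp"  # same relative path as in the scripts in this issue


def describe_persist_dir(path: str) -> dict:
    """Resolve a persist directory and list the files under it with sizes in bytes.

    Hypothetical helper for illustration; it does not touch Chroma itself.
    """
    # A relative path is resolved against the current working directory.
    root = Path(path).resolve()
    files = {}
    if root.is_dir():
        for entry in sorted(root.rglob("*")):
            if entry.is_file():
                files[entry.relative_to(root).as_posix()] = entry.stat().st_size
    return {"resolved": str(root), "exists": root.is_dir(), "files": files}


if __name__ == "__main__":
    info = describe_persist_dir(CHROMA_PATH)
    print(info["resolved"], "exists:", info["exists"])
    for name, size in info["files"].items():
        print(f"{size:>12}  {name}")
```

Running this from the directory where the search script is started shows which absolute directory it actually opens. If that directory is missing or does not contain the data_level0.bin file mentioned above, the search script is probably looking at a different, empty store.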

These are the packages I am using:

langchain 0.3.1

langchain-chroma 0.1.4

langchain-community 0.3.1

langchain-core 0.3.6

langchain-experimental 0.3.2

langchain-openai 0.2.1

langchain-text-splitters 0.3.0

chroma-hnswlib 0.7.6

chromadb 0.5.12

Error Message and Stack Trace (if applicable)

No exception. Just empty results.
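Because no exception is raised, a quick way to narrow this down might be to check whether the store opened by the search script contains any records at all. The sketch below is a minimal check, assuming `Chroma.get` accepts a `limit` argument and returns a dict with an `"ids"` list, as it does in langchain-chroma 0.1.x; `has_documents` is a hypothetical helper name introduced here for illustration.

```python
from typing import Any


def has_documents(store: Any) -> bool:
    """Return True if the vector store holds at least one record.

    `store` is assumed to behave like langchain_chroma.Chroma, whose get()
    returns a dict containing an "ids" list.
    """
    sample = store.get(limit=1)  # fetch at most one record
    return len(sample.get("ids", [])) > 0
```

In the search script this would be called as `print(has_documents(vectorStore))` just before `similarity_search`. A result of `False` would suggest the script opened an empty store, for example a different directory or collection than the one written by the insert script. A result of `True` would point to the query path instead.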

System Info

System Information

OS: Windows
OS Version: 10.0.22631
Python Version: 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)]

Package Information

langchain_core: 0.3.6
langchain: 0.3.1
langchain_community: 0.3.1
langsmith: 0.1.129
langchain_chroma: 0.1.4
langchain_experimental: 0.3.2
langchain_openai: 0.2.1
langchain_text_splitters: 0.3.0

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.10.6
async-timeout: 4.0.3
chromadb: 0.5.12
dataclasses-json: 0.6.7
fastapi: 0.115.0
httpx: 0.27.2
jsonpatch: 1.33
numpy: 1.26.4
openai: 1.50.1
orjson: 3.10.7
packaging: 24.1
pydantic: 2.9.2
pydantic-settings: 2.5.2
PyYAML: 6.0.2
requests: 2.32.3
SQLAlchemy: 2.0.35
tenacity: 8.5.0
tiktoken: 0.7.0
typing-extensions: 4.12.2

iharshlalakiya commented 3 days ago

Hi, I can help you with this problem.

guninder commented 3 days ago

iharshlalakiya, thank you. I would appreciate it. Please let me know if you need any other information.