Open: guninder opened this issue 1 month ago
Hi, I can help you with this problem.
iharshlalakiya, thank you, I would appreciate that. Please let me know if you need any other information.
@guninder, is this issue resolved? If not, try the code below:
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
The code below creates the embedding for your query:
query = embeddings.embed_query("Who is Andrew?")
It then performs a vector similarity search using the embedding generated for the query:
results = vectorStore.similarity_search_by_vector_with_relevance_scores(embedding=query, k=3)
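For reference, here is how those pieces might fit together end to end. This is a minimal sketch, not code from the thread: it assumes the store was persisted to the same CHROMA_PATH used at indexing time and that it was built with the same text-embedding-3-large model, since a store has to be queried with the same embedding model it was created with.

from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

CHROMA_PATH = "data/chroma_wp"  # assumption: the directory the documents were written to

# The embedding model here must match the one used when the store was built.
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store = Chroma(persist_directory=CHROMA_PATH, embedding_function=embeddings)

# Embed the query text, then search by vector; each result is a (Document, score) pair.
query_vector = embeddings.embed_query("Who is Andrew?")
results = vector_store.similarity_search_by_vector_with_relevance_scores(embedding=query_vector, k=3)
for doc, score in results:
    print(score, doc.page_content[:100])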
@gauravprasadgp, thanks, but that doesn't seem to be right. I had already tried externalizing the OpenAIEmbeddings function and importing the same instance for both writing and reading. I believe the default model is text-embedding-ada-002; if I use any other model, it throws an exception. Most likely this is a version problem.
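One way to rule out that kind of mismatch is to construct the embeddings object in a single shared helper and import it from both the indexing and the search scripts, so write and read can never drift apart. A minimal sketch (the module and function names here are made up for illustration):

# embeddings_config.py (hypothetical shared module)
from langchain_openai import OpenAIEmbeddings

def get_embeddings() -> OpenAIEmbeddings:
    # text-embedding-ada-002 is the default model; pinning it explicitly keeps
    # indexing and querying consistent. A different model (e.g. text-embedding-3-large,
    # 3072 dimensions vs. 1536) produces vectors that cannot match an existing
    # ada-002 collection and can raise a dimensionality error.
    return OpenAIEmbeddings(model="text-embedding-ada-002")

Both the write and the read code would then call get_embeddings() instead of constructing OpenAIEmbeddings() independently.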
@guninder, try the code below:

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
import os
import logging

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

CHROMA_PATH = "data/chroma_wp"
os.environ["OPENAI_API_KEY"] = "your-api-key-here"  # Replace with your API key

def create_vector_store():
    try:
        # 1. Load the document
        logger.info("Loading document...")
        loader = TextLoader("books/war_and_peace.txt", encoding="utf-8")
        documents = loader.load()
        logger.info(f"Loaded {len(documents)} documents")

        # 2. Split the documents
        logger.info("Splitting documents into chunks...")
        text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200, separator="\n")
        chunks = text_splitter.split_documents(documents)
        logger.info(f"Created {len(chunks)} chunks")

        # 3. Create embeddings and store in Chroma; with chromadb >= 0.4 the data is
        # persisted automatically when persist_directory is set, so no explicit
        # persist() call is needed (langchain_chroma's Chroma no longer exposes one).
        logger.info("Creating embeddings and storing in Chroma...")
        embeddings = OpenAIEmbeddings()
        vector_store = Chroma.from_documents(
            documents=chunks,
            embedding=embeddings,
            persist_directory=CHROMA_PATH,
        )
        logger.info(f"Database persisted to {CHROMA_PATH}")

        return vector_store
    except Exception as e:
        logger.error(f"Error in create_vector_store: {str(e)}")
        raise

def search_vector_store():
    try:
        # 1. Check that the database exists
        if not os.path.exists(CHROMA_PATH):
            logger.error(f"Database directory {CHROMA_PATH} does not exist!")
            return

        logger.info("Initializing embeddings...")
        embeddings = OpenAIEmbeddings()

        # 2. Load the persisted database
        logger.info("Loading the persisted database...")
        vector_store = Chroma(
            persist_directory=CHROMA_PATH,
            embedding_function=embeddings,
        )

        # 3. Get collection info
        collection = vector_store._collection
        logger.info(f"Collection count: {collection.count()}")

        # 4. Perform the search
        query = "Who is Andrew?"
        logger.info(f"Performing search with query: '{query}'")
        results = vector_store.similarity_search(query=query, k=3)

        # 5. Print results
        if results:
            logger.info(f"Found {len(results)} results")
            for i, doc in enumerate(results, 1):
                logger.info(f"Result {i}:")
                logger.info(f"Content: {doc.page_content[:200]}...")
                logger.info(f"Metadata: {doc.metadata}")
        else:
            logger.warning("No results found!")

        return results
    except Exception as e:
        logger.error(f"Error in search_vector_store: {str(e)}")
        raise

def main():
    # Create the database directory if it doesn't exist
    os.makedirs(CHROMA_PATH, exist_ok=True)

    # First run: create and populate the database
    if not os.listdir(CHROMA_PATH):
        logger.info("Creating new vector store...")
        create_vector_store()

    # Search the database
    logger.info("Searching the vector store...")
    results = search_vector_store()
    return results

if __name__ == "__main__":
    main()
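A quick way to check what actually ended up on disk, independent of the code above, is to open the persisted directory with the raw chromadb client and list the collections with their counts. This is only a debugging sketch and assumes the same CHROMA_PATH; empty results with no error are often just a search against a different collection name or a different persist directory than the one that was written.

import chromadb

CHROMA_PATH = "data/chroma_wp"  # the directory the documents were written to

client = chromadb.PersistentClient(path=CHROMA_PATH)
for collection in client.list_collections():
    # LangChain's Chroma wrapper uses the collection name "langchain" by default.
    print(collection.name, collection.count())
    print(collection.peek(limit=1))  # sample record, to confirm documents and embeddings exist

If the count is 0 or the only collection has an unexpected name, the problem is in how the store was written or opened rather than in the search itself.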
Checked other resources
Example Code
Error Message and Stack Trace (if applicable)
No exception. Just empty results.
Description
Hi, I am new to LangChain and Chroma. I am trying to insert data into ChromaDB and search it. There is no issue with the data; I tried the same search when creating a knowledge base in Bedrock. I don't get any error. The database is created (data_level0.bin is about 6.3 MB), but the search returns empty results. Following is the code to insert the data.
I get empty results.
Following are the packages I am using:
langchain 0.3.1
langchain-chroma 0.1.4
langchain-community 0.3.1
langchain-core 0.3.6
langchain-experimental 0.3.2
langchain-openai 0.2.1
langchain-text-splitters 0.3.0
chroma-hnswlib 0.7.6
chromadb 0.5.12
System Info
System Information
OS: Windows
OS Version: 10.0.22631
Python Version: 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)]
Package Information
langchain_core: 0.3.6
langchain: 0.3.1
langchain_community: 0.3.1
langsmith: 0.1.129
langchain_chroma: 0.1.4
langchain_experimental: 0.3.2
langchain_openai: 0.2.1
langchain_text_splitters: 0.3.0
Optional packages not installed
langgraph
langserve
Other Dependencies
aiohttp: 3.10.6
async-timeout: 4.0.3
chromadb: 0.5.12
dataclasses-json: 0.6.7
fastapi: 0.115.0
httpx: 0.27.2
jsonpatch: 1.33
numpy: 1.26.4
openai: 1.50.1
orjson: 3.10.7
packaging: 24.1
pydantic: 2.9.2
pydantic-settings: 2.5.2
PyYAML: 6.0.2
requests: 2.32.3
SQLAlchemy: 2.0.35
tenacity: 8.5.0
tiktoken: 0.7.0
typing-extensions: 4.12.2