Error while loading saved index in chroma db #2491

nishanthc-nd commented 1 year ago

persist_directory = 'chroma_db_store/index/'  # also tried 'chroma_db_store'
docsearch = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
query = "Hey"
docs = docsearch.similarity_search(query)

NoIndexException: Index not found, please create an instance before querying

Folder structure chroma_db_store:

sergerdn commented 1 year ago

Should work:

import logging
import os

import chromadb
from dotenv import load_dotenv
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma


ABS_PATH = os.path.dirname(os.path.abspath(__file__))
DB_DIR = os.path.join(ABS_PATH, "db")

def get_documents():
    return PyPDFLoader("fixtures/pdf/MorseVsFrederick.pdf").load()

def init_chromadb():
    client_settings = chromadb.config.Settings(
    embeddings = OpenAIEmbeddings()

    vectorstore = Chroma(

    vectorstore.add_documents(documents=get_documents(), embedding=embeddings)

def query_chromadb():
    client_settings = chromadb.config.Settings(

    embeddings = OpenAIEmbeddings()

    vectorstore = Chroma(
    vectorstore.similarity_search_with_score(query="FREDERICK", k=4)

def main():

if __name__ == '__main__':
kavlata commented 1 year ago

I tried the code given by @sergerdn, but it is still not working. I retrieve the persistent storage in the RetrievalQA method.

collection_name = "long-docs"
persist_directory = "/content/sample_data/chromadb/"
client_settings = Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory=persist_directory,  # Optional, defaults to .chromadb/ in the current directory
    anonymized_telemetry=False
)

vectorstore = Chroma(
    collection_name=collection_name,
    embedding_function=embeddings,
    client_settings=client_settings,
    persist_directory=persist_directory,
)

Then for QA:

qar = RetrievalQA.from_chain_type(llm=local_llm, chain_type="stuff", retriever=vectorstore.as_retriever(), chain_type_kwargs=chain_type_kwargs, return_source_documents=True)

The vectorstore here is not accessible. I have persisted the db using persist(). Yet I see this error. Error: NoIndexException: Index not found, please create an instance before querying

Index folder structure:

chromadb
    index
    chroma-collections.parquet
    chroma-embeddings.parquet

sergerdn commented 1 year ago


Can you confirm whether you tried to run my code with no modifications and whether it did not work as expected?

kavlata commented 1 year ago

@sergerdn The following code didn't work. I ran this on Google Colab.

Load DB

import chromadb
from chromadb.config import Settings

collection_name = "long-docs"
persist_directory = "/content/sample_data/chromadb/"
client_settings = Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory=persist_directory,  # Optional, defaults to .chromadb/ in the current directory
    anonymized_telemetry=False
)

vectorstore = Chroma(
    collection_name=collection_name,
    embedding_function=embeddings,
    client_settings=client_settings,
    persist_directory=persist_directory,
)
result = vectorstore.similarity_search_with_score(query="contract", k=4)
print(result)

Error: NoIndexException: Index not found, please create an instance before querying

sergerdn commented 1 year ago

Please provide me with your full code for reproducing errors, including the code for inserting data into ChromaDB. Additionally, please use backticks (`) when you post your code.

kavlata commented 1 year ago

Thanks @sergerdn for guiding. Python code is below.

from langchain.text_splitter import CharacterTextSplitter, TextSplitter #, NLTKTextSplitter
from llama_index import SimpleDirectoryReader
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
import chromadb
from chromadb.config import Settings

documents = SimpleDirectoryReader('/content/sample_data/',required_exts='.txt').load_langchain_documents() 
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=400,chunk_overlap=20,length_function=len,separators=["\n\n", "\n", " ", ""])
texts = text_splitter.split_documents(documents)
docsearch_db = Chroma.from_documents(texts, embeddings, collection_name=collection_name, persist_directory=persist_directory)

client_settings = Settings(
    persist_directory=persist_directory, # Optional, defaults to .chromadb/ in the current directory

vectorstore = Chroma(
result = vectorstore.similarity_search_with_score(query="contract", k=1)
sergerdn commented 1 year ago

Do not use the same directory for both the Chroma database and the documents under any circumstances. I believe they should be in different directories:

documents = SimpleDirectoryReader('/content/sample_data/source_docs/',required_exts='.txt').load_langchain_documents() 

I will look into your code ASAP.
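
The advice above can be checked before any indexing starts. The following is only a minimal sketch using the standard library: `dirs_overlap` is a hypothetical helper name, and the paths are the ones quoted in this thread. It flags the case where the database directory and the documents directory are the same or nested in one another.

```python
from pathlib import Path


def dirs_overlap(docs_dir: str, db_dir: str) -> bool:
    """Return True if the two directories are the same or one contains the other."""
    docs = Path(docs_dir).resolve()
    db = Path(db_dir).resolve()
    return docs == db or docs in db.parents or db in docs.parents


# Original layout from the thread: the DB directory sits inside the documents directory.
print(dirs_overlap("/content/sample_data/", "/content/sample_data/chromadb/"))
# Suggested layout: documents moved to their own subdirectory.
print(dirs_overlap("/content/sample_data/source_docs/", "/content/sample_data/chromadb/"))
```

With the original layout the check reports an overlap; with the suggested `source_docs` subdirectory it does not.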

sergerdn commented 1 year ago


Look at my code, I use:

# settings for ChromaDB
client_settings = chromadb.config.Settings(
# create instance
vectorstore = Chroma(

# add docs
vectorstore.add_documents(documents=get_documents(), embedding=embeddings)

You use another api:

texts = text_splitter.split_documents(documents)

# You don't have a database at the moment. The referee for you has encountered an error.

# We seem to have encountered a bug or an undocumented feature 
# as it does not match the expected behaviour of creating a DB, which should have been already created.
docsearch_db = Chroma.from_documents(
   texts, embeddings, 
   collection_name=collection_name, persist_directory=persist_directory

Could you please use my api and confirm whether it works on your end?

Also, please provide any links to the documentation that you are reading when you write your scripts. This is important because if we have some documented API in the documentation, but it does not work as expected, I believe it is a bug.

sergerdn commented 1 year ago

I have tested my code once again and can confirm that it is working correctly.

kavlata commented 1 year ago

Thanks so very much @sergerdn It works now.

add_documents and then persist() is working. Thanks !!!

The documentation I was referring to is below

Documentation link


sergerdn commented 1 year ago


It seems like we have a bug in the code.

chintan-donda commented 1 year ago

Facing the same issue even after following steps from here. Also installed libs with the exact version as specified.

chromadb.errors.NoIndexException: Index not found, please create an instance before querying

It works if I call like this:

def main():

==> This essentially loads the documents, persists them, and then queries the vectorstore.

But fails if I restart the Jupyter kernel and run with below code:

def main():

==> This tries to load the vectorstore and then queries it.

Any suggestion please, how can I fix?

sergerdn commented 1 year ago


Try creating the directory for the database first.

chintan-donda commented 1 year ago

@sergerdn When to create directory? At the time of init_chromadb() step or query_chromadb()? Can u pls share the sample code snippet?

sergerdn commented 1 year ago

@sergerdn When to create directory? At the time of init_chromadb() step or query_chromadb()? Can u pls share the sample code snippet?

Why not create the directory if it does not already exist before doing anything? What could be the problem? I'm sorry, but I don't understand you.

chintan-donda commented 1 year ago

@sergerdn I'm already creating the directory if not exist, before you suggested. Still the same issue.

sergerdn commented 1 year ago

@sergerdn I'm already creating the directory if not exist, before you suggested. Still the same issue.

Post your code to repeat your problem. Please do not use a Python notebook, give me a script.

sergerdn commented 1 year ago


Tested and it works. Make sure to uncomment init_chromadb() during the first run to create the database with documents. During the second run, only execute query_chromadb.

Please do not use Python notebooks, as they create isolated environments, and you may not have the same environment during a second run.

import json
import logging
import os
import re

import chromadb
from dotenv import load_dotenv
from fastapi.encoders import jsonable_encoder
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma


ABS_PATH = os.path.dirname(os.path.abspath(__file__))
DB_DIR = os.path.join(ABS_PATH, "db")

def replace_newlines_and_spaces(text):
    # Replace all newline characters with spaces
    text = text.replace("\n", " ")

    # Replace multiple spaces with a single space
    text = re.sub(r'\s+', ' ', text)

    return text

def get_documents():
    return PyPDFLoader("fixtures/pdf/MorseVsFrederick.pdf").load()

def init_chromadb():
    if not os.path.exists(DB_DIR):

    client_settings = chromadb.config.Settings(
    embeddings = OpenAIEmbeddings()

    vectorstore = Chroma(
    documents = []
    for num, doc in enumerate(get_documents()):
        doc.page_content = replace_newlines_and_spaces(doc.page_content)
        # Collect the cleaned document; without this the list stays empty
        documents.append(doc)

    vectorstore.add_documents(documents=documents, embedding=embeddings)

def query_chromadb():
    if not os.path.exists(DB_DIR):
        raise Exception(f"{DB_DIR} does not exist, nothing can be queried")

    client_settings = chromadb.config.Settings(

    embeddings = OpenAIEmbeddings()

    vectorstore = Chroma(
    result = vectorstore.similarity_search_with_score(query="who is FREDERICK?", k=4)
    jsonable_result = jsonable_encoder(result)
    print(json.dumps(jsonable_result, indent=2))

def main():

if __name__ == '__main__':
chintan-donda commented 1 year ago

@sergerdn Thanks for the code snippet. With exactly your code snippet, it still fails with the same error chromadb.errors.NoIndexException: Index not found, please create an instance before querying. Below is my configuration:

Python 3.9.6
MacBook Pro M1, OS - Ventura 13.3.1
chintan-donda commented 1 year ago

It's working fine for me with the below code and lib versions:

import json
import logging
import os
import re
import sys

from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.document_loaders import PyPDFLoader
from fastapi.encoders import jsonable_encoder
from dotenv import load_dotenv


ABS_PATH = os.path.dirname(os.path.abspath(__file__))
DB_DIR = os.path.join(ABS_PATH, "db")

def replace_newlines_and_spaces(text):
    # Replace all newline characters with spaces
    text = text.replace("\n", " ")
    # Replace multiple spaces with a single space
    text = re.sub(r'\s+', ' ', text)
    return text

def get_documents():
    return PyPDFLoader("fixtures/pdf/MorseVsFrederick.pdf").load()

def init_chromadb():
    # Delete existing index directory and recreate the directory
    if os.path.exists(DB_DIR):
        import shutil
        shutil.rmtree(DB_DIR, ignore_errors=True)

    documents = []
    for num, doc in enumerate(get_documents()):
        doc.page_content = replace_newlines_and_spaces(doc.page_content)
        # Collect the cleaned document; without this the list stays empty
        documents.append(doc)

    # Split the documents into chunks
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)

    # Select which embeddings we want to use
    embeddings = OpenAIEmbeddings()

    # Create the vectorestore to use as the index
    vectorstore = Chroma.from_documents(texts, embeddings, persist_directory=DB_DIR)
    vectorstore = None

def query_chromadb():
    if not os.path.exists(DB_DIR):
        raise Exception(f"{DB_DIR} does not exist, nothing can be queried")

    # Select which embeddings we want to use
    embeddings = OpenAIEmbeddings()
    # Load Vector store from local disk
    vectorstore = Chroma(persist_directory=DB_DIR, embedding_function=embeddings)

    result = vectorstore.similarity_search_with_score(query="who is FREDERICK?", k=4)
    jsonable_result = jsonable_encoder(result)
    print(json.dumps(jsonable_result, indent=2))

def main():

if __name__ == '__main__':

Note: init_chromadb() creates a subdirectory with name index under the DB_DIR. If it doesn't create it then you will get the error chromadb.errors.NoIndexException: Index not found, please create an instance before querying. But with the above code, it'd work without any issue.
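
One way to act on this note is to inspect the persist directory before constructing the vectorstore. The sketch below is an assumption-laden illustration, not part of chromadb or langchain: `missing_persist_entries` is a hypothetical helper, and the expected entry names are simply copied from the directory listings posted in this thread, so they may differ for other chromadb versions or backends.

```python
from pathlib import Path

# Assumed layout, taken from the directory listings in this thread.
EXPECTED_ENTRIES = ("index", "chroma-collections.parquet", "chroma-embeddings.parquet")


def missing_persist_entries(db_dir: str) -> list:
    """Return the expected entries that are absent from db_dir.

    If db_dir itself does not exist, every expected entry is reported as missing.
    """
    root = Path(db_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

Calling `missing_persist_entries(DB_DIR)` at the top of `query_chromadb()` and raising when the result is non-empty would turn a later NoIndexException into an earlier, more specific message about what is missing on disk.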

Libs installed and their version:

Python 3.9.6
MacBook Pro M1, OS - Ventura 13.3.1
The notebook does not work, you are right.

bakongi commented 1 year ago

Try to comment "collection_name="langchain_store"," in query_chromadb()

dsaks9 commented 1 year ago

I used the from_texts method to persist my embeddings, as shown below:

embedder = OpenAIEmbeddings()
db = Chroma.from_texts(texts = embedding_input['texts'],


The persist_directory is populated as shown below:

|--- {dir_name}_chroma
|---|--- index
|-------|--- id_to_uuid_9f0....
|-------|--- index_9f5...
|-------|--- index_metadata_9f0...
|-------|--- uuid_to_id_9f0...
|---|--- chroma-collections.parquet
|---|--- chroma-embeddings.parquet

If I comment the db.persist() line of code the directory remains the same.

I use the following code, based on the previous replies, to fetch the embeddings from the directory

import chromadb
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import Embeddings, OpenAIEmbeddings

collection_name = 'col_name'
dir_name = '/dir/dir1/dir2'

client_settings = chromadb.config.Settings(

embeddings = OpenAIEmbeddings()

db = Chroma(

result = db.similarity_search_with_score(query="profit", k=4)

I still get the following error:

NoIndexException: Index not found, please create an instance before querying

chintan-donda commented 1 year ago

@dsaks9 Create the dir_name directory first before the call to Chroma.

import os
import chromadb
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import Embeddings, OpenAIEmbeddings

collection_name = 'col_name'
dir_name = '/dir/dir1/dir2'

# Delete existing index directory and recreate the directory
if os.path.exists(dir_name):
    import shutil
    shutil.rmtree(dir_name, ignore_errors=True)

client_settings = chromadb.config.Settings(

embeddings = OpenAIEmbeddings()

db = Chroma(

result = db.similarity_search_with_score(query="profit", k=4)

Check if it works?

dsaks9 commented 1 year ago

@chintan-donda thanks for the suggestion, although still running into the same error.

dsaks9 commented 1 year ago

Realized I had just forgotten to include the collection name when first creating the embeddings; once I supply collection_name to the from_texts method the code works properly.
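
A mismatch like this is easier to avoid if the collection name and persist directory are defined once and reused for both writing and reading. The following is a rough sketch only: `StoreLocation` and `LOCATION` are hypothetical names, the values are the ones from the snippet above, and the commented Chroma calls assume the langchain keyword arguments that appear elsewhere in this thread.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StoreLocation:
    """Hypothetical helper: the two values that must match between write and read."""

    collection_name: str
    persist_directory: str

    def as_kwargs(self) -> dict:
        # Same keyword names that are passed to Chroma in this thread.
        return asdict(self)


LOCATION = StoreLocation(collection_name="col_name", persist_directory="/dir/dir1/dir2")

# Writing (first run), assuming the langchain API used in this thread:
#   db = Chroma.from_texts(texts=..., embedding=embedder, **LOCATION.as_kwargs())
#   db.persist()
# Reading (later run):
#   db = Chroma(embedding_function=embeddings, **LOCATION.as_kwargs())
print(LOCATION.as_kwargs())
```

Because both calls unpack the same object, the reader cannot silently fall back to a different collection than the one the writer populated.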

annjawn commented 1 year ago

I am facing a unique issue while using Chroma.from_documents. Up until yesterday the parquet files were being created, but all of a sudden the embeddings and collections parquet files aren't being created at all.

my code is as simple as -

text_splitter = TokenTextSplitter(chunk_size=500, chunk_overlap=0)
texts = text_splitter.split_documents(all_docs)        
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)                            
os.makedirs(self.persist_dir, exist_ok=True)

When i look into the file system i see this


I see that the chroma-embeddings.parquet and chroma-collections.parquet are missing altogether. I am a bit stumped as to why this is happening.

budhewarvijay0407 commented 1 year ago

I used a Hugging Face model to create the vector index in Google Colab and persisted the vector database to my Drive. Now I want to use that copy to get results when I run on my local machine.

I'm using the index folder that was created in my Drive when I ran the Colab notebook. However, when I use it locally, it throws the error "NoIndexException: Index not found, please create an instance before querying".

Any suggestion?

It seems to be working now. I had forgotten to call vectordb.persist() to make sure the vector database is saved.

dankolesnikov commented 1 year ago

If you are still facing Error: NoIndexException: Index not found, my solution was to bump the version of the chromadb package. I believe there was a bug there that is now resolved.

pip install --upgrade chromadb

@nishanthc-nd can you please try and see if we can close this issue?
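
Before and after upgrading, it can help to record which versions are actually installed in the environment that runs the script. This is a small standard-library sketch; `installed_version` is a hypothetical helper name, and it only reports what the current interpreter can see.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names as used in this thread.
print("chromadb:", installed_version("chromadb"))
print("langchain:", installed_version("langchain"))
```

Running this in both the notebook and the script environment is a quick way to spot the case where the two are using different chromadb versions.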

lexsf commented 1 year ago

I'm getting this as well. As soon as I end the Python process, I can no longer read the persisted index. If I create a new Chroma client while the process is still running, it seems to read from the persisted directory. But if I create a completely new process, it doesn't work.

alberduris commented 1 year ago

Same issue here

lastrei commented 1 year ago

Can someone help me? My code is below:

import re
import uuid

from fastapi import FastAPI, UploadFile
import uvicorn
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from uuid import UUID

app = FastAPI()

DB_DIR = "./db"
embeddings = HuggingFaceEmbeddings(model_name="nghuyong/ernie-3.0-medium-zh")

def replace_newlines_and_spaces(text):
    # Replace all newline characters with spaces
    text = text.replace("\n", " ")
    # Replace multiple spaces with a single space
    text = re.sub(r'\s+', ' ', text)
    return text

def get_documents(file_path):
    return TextLoader(file_path, encoding="utf-8").load()

def init_chromadb(file_path):
    if not os.path.exists(DB_DIR):

    documents = []
    for num, doc in enumerate(get_documents(file_path)):
        doc.page_content = replace_newlines_and_spaces(doc.page_content)
        # Collect the cleaned document; without this the list stays empty
        documents.append(doc)

    # Split the documents into chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=20, length_function=len,
                                                   separators=["\n\n", "\n", " ", "", "。"])
    texts = text_splitter.split_documents(documents)
    # Create the vectorestore to use as the index
    vectorstore = Chroma.from_documents(texts, embeddings, persist_directory=DB_DIR)
async def process_file(file: UploadFile):
        random_name = str(uuid.uuid1())
        index_path = "./upload_file/" + random_name + "/"
        file_path = index_path + file.filename
        if not os.path.exists("./upload_file/" + random_name):
            os.mkdir("./upload_file/" + random_name)
        with open(file_path, "wb") as buffer:


        return {"status": "success"}
    except Exception as e:
        return {"status": "failure", "error": str(e)}

if __name__ == "__main__":
    uvicorn.run(app, host="", port=7777)

I made an API using langchain. While the API is running, db.get() shows all the documents. But after I stop the API and run db.get() again, there is only the first document I uploaded. Why?

nishanthc-nd commented 1 year ago

Thanks so very much @sergerdn It works now.

add_documents and then persist() is working. Thanks !!!

The documentation I was referring to is below

Documentation link


This solution seemed to work for me, I am closing this issue.

ShubhamZoro commented 7 months ago

I tried the code given by @sergerdn still not working. The place where I am retrieving the persistent storage is in RetrievalQA method.

collection_name = "long-docs"
persist_directory = "/content/sample_data/chromadb/"
client_settings = Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory=persist_directory,  # Optional, defaults to .chromadb/ in the current directory
    anonymized_telemetry=False
)

vectorstore = Chroma(
    collection_name=collection_name,
    embedding_function=embeddings,
    client_settings=client_settings,
    persist_directory=persist_directory,
)

Then for QA:

qar = RetrievalQA.from_chain_type(llm=local_llm, chain_type="stuff", retriever=vectorstore.as_retriever(), chain_type_kwargs=chain_type_kwargs, return_source_documents=True)

The vectorstore here is not accessible. I have persisted the db using persist(). Yet I see this error. Error: NoIndexException: Index not found, please create an instance before querying

Index folder structure:

chromadb
    index
    chroma-collections.parquet
    chroma-embeddings.parquet

Did you solve this problem? Can you share your repo?

Mihika21 commented 4 months ago

# load some documents
documents = SimpleDirectoryReader(input_files=['uber_2021.pdf']).load_data()

# initialize client, setting path to save data
db = chromadb.PersistentClient(path="./chroma_db")

# create collection
chroma_collection = db.get_or_create_collection("quickstart")

# assign chroma as the vector_store to the context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# create your index
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

It's giving me this error:

RuntimeError: Chroma is running in http-only client mode, and can only be run with 'chromadb.api.fastapi.FastAPI' as the chroma_api_impl. see for more information.

Please help me with a solution.