Open gracewzhang opened 11 months ago
Hey there, @gracewzhang! 👋 I'm Dosu, a friendly bot designed to assist with bugs, answer questions, and help you get going with contributions to LangChain. While we wait for a human maintainer, feel free to ask me anything I might be able to assist with. Let's get this sorted out together!
The error you're encountering, sqlite3.OperationalError: attempt to write a readonly database, typically occurs when the application does not have write permissions to the database file or directory. In the context of LangChain and the Chroma vector store, this could happen if the persist_directory specified during the initialization of the Chroma instance is not writable by the application.
In your code, you're removing the persist_directory and then immediately trying to write to it. This could potentially lead to a race condition where the directory is not fully set up before you're trying to write to it.
```python
if os.path.exists(self.persist_directory):
    shutil.rmtree(self.persist_directory)
```
You might want to add a small delay or a check to ensure the directory is fully deleted before proceeding. Alternatively, you could change the permissions of the persist_directory to ensure it's writable by your application.
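Here is one way to write the suggestion above (confirm the directory is really gone, then recreate it and check that it is writable) using only the standard library. This is a minimal sketch. reset_directory and its timeout/poll parameters are hypothetical names. shutil.rmtree only returns after it has removed the tree, so on a local disk the polling loop will usually exit right away.

```python
import os
import shutil
import time


def reset_directory(path: str, timeout: float = 5.0, poll: float = 0.1) -> None:
    """Delete `path` if present, wait until it is gone, recreate it, and check it is writable."""
    if os.path.exists(path):
        shutil.rmtree(path)

    # Poll until the directory has really disappeared, or give up after `timeout` seconds.
    deadline = time.monotonic() + timeout
    while os.path.exists(path):
        if time.monotonic() > deadline:
            raise TimeoutError(f"{path!r} still exists after {timeout}s")
        time.sleep(poll)

    os.makedirs(path, exist_ok=True)

    # os.access checks permissions for the current process (write + enter the directory).
    if not os.access(path, os.W_OK | os.X_OK):
        raise PermissionError(f"{path!r} is not writable by this process")
```

You would call it as reset_directory(self.persist_directory) in place of the two lines above, before Chroma.from_documents() runs.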
If you're still encountering the issue after checking these potential causes, it would be helpful to know more about the environment in which you're running your application. Are there any other processes that might be accessing the same persist_directory? Are you able to consistently reproduce this error, or does it only occur intermittently?
As for the __get_new_client() function, based on the provided code, it seems to be a helper function used to create a new Chroma client. It first checks if the persist_directory exists and removes it if it does. Then it retrieves the documents, creates a new Chroma client with these documents and the specified embedding_function, and finally writes the current date to a file in the persist_directory. This function is used in the __setup_client() and reingest() methods to set up or refresh the Chroma client.
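The reporter's actual code is not reproduced in this thread, so the following is only a hypothetical reconstruction of the four steps described above. get_new_client, load_documents, build_store and the stamp filename last_ingested.txt are all assumed names. The vector-store construction is passed in as a callable, so the flow can run without LangChain.

```python
import datetime
import os
import shutil
from typing import Callable, List, TypeVar

Store = TypeVar("Store")


def get_new_client(
    persist_directory: str,
    load_documents: Callable[[], List[str]],
    build_store: Callable[[List[str], str], Store],
    stamp_file: str = "last_ingested.txt",
) -> Store:
    """Hypothetical sketch of the __get_new_client() flow described above."""
    # 1. Start from an empty persist directory.
    if os.path.exists(persist_directory):
        shutil.rmtree(persist_directory)

    # 2. Fetch the documents to index.
    documents = load_documents()

    # 3. Build the vector store. With LangChain this would be something like
    #    Chroma.from_documents(documents, embedding=..., persist_directory=persist_directory).
    store = build_store(documents, persist_directory)

    # 4. Record today's date so a later check can decide whether to re-ingest.
    os.makedirs(persist_directory, exist_ok=True)
    with open(os.path.join(persist_directory, stamp_file), "w") as f:
        f.write(datetime.date.today().isoformat())
    return store
```

Keeping build_store injectable also makes it easy to swap in a stub when you are trying to isolate whether the failure comes from the directory handling or from Chroma itself.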
Please let me know if you need further assistance.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Tried adding a check to see if self.persist_directory still exists before calling Chroma.from_documents() (it doesn't), and also created the directory and manually set its permissions, but neither fixed the issue.
@gracewzhang what's your ChromaDB version?
@umair313 0.4.18
Try with 0.4.17 or 0.4.15. Make sure your internet connection is good. Change the persistence directory name. Are you using a notebook?
Just tried with both 0.4.17 and 0.4.15, and also changed the persistence directory name, but I'm still running into the same issue. I'm not using a notebook either. It's weird, because calling __get_new_client() from __setup_client() works, but calling it from reingest() doesn't. I printed the directory at the start of each call, and it's the same in both.
This issue popped up out of nowhere, and to fix it I had to downgrade ChromaDB to 0.4.14:
```shell
venv/bin/python -m pip install --upgrade chromadb==0.4.14
```
This is weird, because I had definitely been using ChromaDB versions newer than the one from October 10th for some time, and the issue only occurred recently:
https://pypi.org/project/chromadb/#history
Maybe this is caused by a transitive dependency of ChromaDB being upgraded in the meantime?
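Several replies in this thread report different results for the same pin. It can help to confirm which chromadb version the interpreter running the app actually sees, for example by running the snippet with venv/bin/python. This is a small sketch built on the standard importlib.metadata module. installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "chromadb") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("chromadb:", installed_version() or "not installed")
```

If the printed version differs from the one you pinned, the pip install most likely went to a different environment from the one your application runs in.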
This did the trick, thanks!
Just ran into the same error and this solved the issue. Thanks!
Just ran into this issue and this fixed it
Just ran into this issue and installing the 0.4.14 version didn't fix it...
Ran into this issue and installing the 0.4.14 version didn't fix it. Any more ideas?
Seeing this too sometimes.
I found that when I create a Chroma database with a new name the first time, it works. But if I delete the database directory from my Google Drive filesystem and try to recreate a database with the same name, I get this error when I try to add documents to it. Did anyone else notice the same kind of behaviour? For context, I am using Google Colab notebook which writes to Chroma database saved on my Google Drive.
I encountered the same issue locally on version 0.4.24, but downgrading to 0.4.14 resolved it.
I traced this issue down to some funky stuff going on in the sqlite3 backend. It seems to happen whenever you use a persist directory to recreate a stored vector store and run the code multiple times.
tl;dr: restart your Jupyter notebook and do what you can to clear anything that might be causing an "active data connection".
Chroma version 0.4.24 works for me, so the version was not the problem (at least for me).
My code:
```python
import shutil
from pathlib import Path
from typing import List

from langchain.schema import Document
from langchain.vectorstores import Chroma


def create_index_from_documents(collection_name, embedding_model, persist_directory, all_docs: List[Document], clear_persist_folder: bool = True):
    if clear_persist_folder:
        pf = Path(persist_directory)
        if pf.exists() and pf.is_dir():
            print(f"Deleting the content of: {pf}")
            shutil.rmtree(pf)
        pf.mkdir(parents=True, exist_ok=True)
        print(f"Recreated the directory at: {pf}")
    print("Generating and persisting the embeddings..")
    print(persist_directory)
    vectordb = Chroma.from_documents(
        collection_name=collection_name,
        documents=all_docs,
        embedding=embedding_model,
        persist_directory=persist_directory  # type: ignore
    )
    vectordb.persist()
    return vectordb
```
Some simpler code which should also work:
```python
import time

from langchain.vectorstores import Chroma

# collection_name, docs and embed_model are defined elsewhere
recreate_db = False
persist_directory = "./chroma_db"

t1_start = time.perf_counter()
if recreate_db:
    vectorstore = Chroma.from_documents(
        collection_name=collection_name, documents=docs, embedding=embed_model, persist_directory=persist_directory)
    vectorstore.persist()
else:
    vectorstore = Chroma(collection_name=collection_name, persist_directory=persist_directory, embedding_function=embed_model)
t1_stop = time.perf_counter()
print("elapsed time:", t1_stop - t1_start)
```
When you're experimenting with fixes for this problem, be sure to restart your Jupyter notebook. I believe this issue may come from trying to write to the DB while there is still an active connection to it (even if you deleted the DB, maybe something in your Jupyter notebook is still looking there?), and that is what produces the read-only error.
Seeing this too sometimes.
Only sometimes?
@Fuehnix You are right. Thank you!
Worked for me. Do you think one could close all active connections programmatically ahead of attempting the persist?
Thanks again
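On the question of closing connections programmatically, here is one possible approach. It assumes, as a later comment in this thread suggests, that the stale connection lives in chromadb's per-path client cache. The idea is to drop that cache before deleting the directory. That later comment calls clear_system_cache() through a client instance. Importing SharedSystemClient from chromadb.api.client is an assumption that may not hold for every chromadb release, so the sketch skips that step when the import fails. wipe_persist_directory is a hypothetical helper name.

```python
import gc
import os
import shutil

try:
    # Location of the shared client cache in recent chromadb releases (an assumption).
    from chromadb.api.client import SharedSystemClient
except (ImportError, RuntimeError):
    # chromadb is missing, predates the shared cache, or refuses to load
    # (it raises RuntimeError when the system sqlite3 is too old).
    SharedSystemClient = None


def wipe_persist_directory(path: str) -> None:
    """Forget cached chromadb systems, then delete and recreate `path`."""
    clear_cache = getattr(SharedSystemClient, "clear_system_cache", None)
    if clear_cache is not None:
        # Drop cached System objects so the next client for `path` starts fresh.
        clear_cache()
    # Collect unreferenced objects that might still hold the old database open.
    gc.collect()
    if os.path.exists(path):
        shutil.rmtree(path)
    os.makedirs(path, exist_ok=True)
```

Before calling it, drop your own references to the old store (for example vectordb = None). Otherwise that object may keep the old connection alive.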
@Fuehnix Except now I run into trying to append to a partially-populated db... (entirely separate issue)
I just updated Chroma to the latest version and the error disappeared for me (0.4.22 -> 0.5.0).
```shell
$ poetry add chromadb@latest
```
Feel free to update with pip if you're not using Poetry.
After half a day of reading the code :( here is the correct way to rebuild a Chroma database!
Version: chromadb==0.5.0
Step 1, create the client and collection:
```python
import chromadb
from chromadb.config import Settings


def create_client(self):
    if self.db_client is None:
        local_db_path = self.get_db_file_path()
        settings = Settings()
        settings.persist_directory = local_db_path
        settings.is_persistent = True
        settings.allow_reset = True
        self.db_client = chromadb.Client(settings=settings)
        print(f"create db_client: {self.db_client}")


def create_collection(self):
    self.db_collection = self.db_client.get_or_create_collection(name=self.db_name)
    print(f"create db_collection:{self.db_collection}")
```
Step 2, tear everything down before rebuilding:
```python
if self.db_collection:
    self.db_client.delete_collection(name=self.db_name)
    self.db_collection = None
    print('delete_collection success')
if self.db_client:
    result = self.db_client.reset()
    self.db_client.clear_system_cache()  # very important
    self.db_client = None
    print(f"remove and reset db_client success: {result}")
# delete your persist_directory and create it again
```
Then call step 1 again to rebuild chroma.sqlite3. Adding a little delay in step 1 may help too!
I'm getting this issue as well, but it happens randomly. If I try to create the dataset in a persist directory that was created before but then removed manually via shutil in Python, I get this error. But if I keep trying, it works after a few tries.
EDIT: Downgrading to chromadb==0.4.14 seems to fix it.
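The report that repeated attempts eventually succeed suggests a stopgap: retry only on this specific sqlite3 error. This is a minimal sketch that assumes the failing call can safely be repeated, which may not hold if a partial write leaves duplicate documents behind. It hides the symptom and does not address the stale connection. retry_on_readonly and its parameters are hypothetical.

```python
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_readonly(fn: Callable[[], T], attempts: int = 3, delay: float = 0.5) -> T:
    """Call fn(), retrying only when sqlite3 reports a readonly database."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except sqlite3.OperationalError as exc:
            # Re-raise unrelated sqlite errors, and the error from the final attempt.
            if "readonly database" not in str(exc) or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # linear backoff between attempts
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

For example, you could wrap the write as retry_on_readonly(lambda: Chroma.from_documents(docs, embedding=embed_model, persist_directory=persist_directory)).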
I got the same error in a Jupyter notebook and renaming the persistence directory did work.
I hit this bug when writing a vector store to a temporary directory before renaming it to its final directory. Here's the code:
```python
#!/usr/bin/env python

import chromadb
import numpy as np
import os

def write_vs(
    documents: list[str],
    output_dir: str,
):
    print(f"Getting ready to create vector store at '{output_dir}'...")

    # create the new vector store at a temporary path
    # (so nobody accesses it until it's ready)
    new_temp_dir = "vs_temp"
    print(f"Creating temporary vector store at '{new_temp_dir}' (to be renamed to '{output_dir}')...")
    vs_db = chromadb.PersistentClient(new_temp_dir)
    collection = vs_db.get_or_create_collection(name="test")

    # write "documents" to vector store
    ids = documents
    embeddings = [list(np.random.normal(size=16)) for doc in documents]
    print(f"Adding {len(ids)} documents to '{new_temp_dir}'...")
    collection.upsert(ids=ids, documents=ids, embeddings=embeddings)

    # rename the temporary vector store to its final path
    os.rename(new_temp_dir, output_dir)
    print(f"Renamed '{new_temp_dir}' to '{output_dir}'.")
    print("")

# write vector store #1 to vs_1/
write_vs(documents=["abc"], output_dir="vs_1")
# write vector store #2 to vs_2/
write_vs(documents=["xyz"], output_dir="vs_2")
```
When the script attempts to write the second vector store to the same temporary directory vs_temp (even though that directory no longer exists), the following error occurs:
```
$ rm -rf vs_*; bug.py
Getting ready to create vector store at 'vs_1'...
Creating temporary vector store at 'vs_temp' (to be renamed to 'vs_1')...
Adding 1 documents to 'vs_temp'...
Renamed 'vs_temp' to 'vs_1'.

Getting ready to create vector store at 'vs_2'...
Creating temporary vector store at 'vs_temp' (to be renamed to 'vs_2')...
Adding 1 documents to 'vs_temp'...
Traceback (most recent call last):
  File "/path/to/bug.py", line 34, in <module>
    write_vs(documents=["xyz"], output_dir="vs_2")
...omitted...
  File "/path/to/python3.12/site-packages/chromadb/db/mixins/embeddings_queue.py", line 180, in submit_embeddings
    results = cur.execute(sql, params).fetchall()
              ^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: attempt to write a readonly database
```
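Other comments in this thread suggest that the second write fails because chromadb reuses state it cached for a path it has already seen (here vs_temp). If that is right, giving every build its own never-reused staging path should avoid the collision. This is a sketch under that assumption. staged_directory is a hypothetical helper that picks the path with tempfile.mkdtemp, next to the final directory, and still renames the result into place when the body succeeds.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def staged_directory(final_dir: str):
    """Yield a brand-new staging directory, then rename it to final_dir on success."""
    parent = os.path.dirname(os.path.abspath(final_dir))
    # mkdtemp creates a fresh, unique directory on every call, so no client
    # cached for an earlier staging path can be picked up again. Creating it
    # in the same parent keeps the final rename on one filesystem.
    staging_dir = tempfile.mkdtemp(prefix=".staging_", dir=parent)
    try:
        yield staging_dir
    except BaseException:
        # The build failed: discard the half-written staging directory.
        shutil.rmtree(staging_dir, ignore_errors=True)
        raise
    os.rename(staging_dir, final_dir)
```

In write_vs above, the hard-coded new_temp_dir = "vs_temp" and the final os.rename would be replaced by with staged_directory(output_dir) as new_temp_dir:, wrapping the chromadb.PersistentClient(new_temp_dir) and upsert calls.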
Guys, I spent 5 hours trying to fix this bug: it only worked when I restarted the local server between tries and failed otherwise. chromadb==0.4.14 fixed it. Thank you very much!
System Info
Platform: Ubuntu 22.04
Python: 3.11.6
Langchain: 0.0.351
Reproduction
When the program is first initialized with __setup_client() and __should_reingest() returns True, __get_new_client() works as intended. However, if reingest() is called afterward, __get_new_client() raises the error below.
Relevant code:
Error:
Expected behavior
No error returned