Shubhranshu153 opened 1 year ago
I have been experiencing the same thing. For me, it seems to clone the JSON file's contents at the end of the file (basically as if you copied everything and then pasted it again at the end of the JSON file).
The problem is that you don't use locks while reading and writing. Because of this, it's possible that one thread or process does a seek operation just before another thread or process wants to read.
For example, below is the code for a read using the JSONStorage:
def read(self) -> Optional[Dict[str, Dict[str, Any]]]:
    # Get the file size by moving the cursor to the file end and reading
    # its location
    self._handle.seek(0, os.SEEK_END)
    size = self._handle.tell()

    if not size:
        # File is empty, so we return ``None`` so TinyDB can properly
        # initialize the database
        return None
    else:
        # Return the cursor to the beginning of the file
        self._handle.seek(0)

        # Load the JSON contents of the file
        return json.load(self._handle)
Now imagine two processes (or threads) sharing a single file handle:

1.1 process one does self._handle.seek(0, os.SEEK_END), the cursor is now at the end of the file.
1.2 process one does self._handle.seek(0), the cursor is now at the beginning of the file.
2.1 process two does self._handle.seek(0, os.SEEK_END), the cursor is now at the end of the file.
1.3 process one does json.load(self._handle) -> it reads from the end of the file -> the file seems empty.

Process one set the cursor to the beginning, but process two changed it to the end just before process one wanted to read, resulting in an empty string. This is one way things can go wrong, but you can imagine there are a lot of ways this can fail. Like @SpiralAPI saw, there might be a process that does self._handle.seek(0, os.SEEK_END) just before a write, resulting in all the data being appended instead of overwritten.
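To make the missing-locks point concrete, here is a minimal sketch of what serialized access could look like. The LockedJSONStorage name and the threading.Lock approach are my own illustration, not something TinyDB ships, and a lock like this only protects threads within one process; cross-process safety would need an OS-level file lock instead.

import threading

from tinydb import TinyDB
from tinydb.storages import JSONStorage


class LockedJSONStorage(JSONStorage):
    """Illustrative only: serialize read/write so the seek and the
    actual read/write happen atomically with respect to other threads."""

    # One lock shared by every instance in this process
    _lock = threading.Lock()

    def read(self):
        with self._lock:
            return super().read()

    def write(self, data):
        with self._lock:
            super().write(data)


db = TinyDB('test.json', storage=LockedJSONStorage)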
It's not very useful to do a search using multiple processes or threads over a single file: since you would need to take a lock every time you read or write, you'd basically turn it into a synchronous operation.
If you need to do a CPU-intensive task, it's better to read everything ahead of time (or at least in chunks) and then pass the data to the different processes or threads, as sketched below.
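For example, a rough sketch of that approach (the chunking, worker count, and field names here are my own, matching the test data shown further down):

import json
from multiprocessing import Pool

def find_matches(args):
    # Pure CPU work on plain Python data; no shared file handle involved
    records, target = args
    return [r for r in records if r.get('foo') == target]

if __name__ == '__main__':
    # Read the whole database once, up front, in the parent process
    with open('test.json') as f:
        records = list(json.load(f)['_default'].values())

    # Fan the records out to a few worker processes in chunks
    target = [{"name": "bar1"}]
    chunks = [records[i::4] for i in range(4)]
    with Pool(processes=4) as pool:
        parts = pool.map(find_matches, [(chunk, target) for chunk in chunks])

    print([match for part in parts for match in part])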
But as I had only parallel read operations, I thought it would work. But when multiple processes try to read the db, I get a JSON error.
I tried to execute your code (and had to make some small changes), but when I got it working, I didn't get any of the errors you had. As @MrPigss mentioned, your error seems to indicate that you had some process doing both reads and writes from multiple threads within the same program. With multiprocessing, reading should work as long as each instance has its own file handle to the database file.
For reference, here's the code I used:
import multiprocessing

from tinydb import TinyDB, Query

# Define the ID you want to search for
search_id = [{"name": "bar1"}]  # Replace with your desired ID

# Define a function to perform the search operation
def search_data():
    # Create a TinyDB database
    db = TinyDB('test.json')

    while True:
        # Perform a search for the specific ID
        result = db.search(Query()['foo'] == search_id)
        print(f"Search result for ID {search_id}: {result}")

    # Close the database (never reached; the loop runs until the
    # process is terminated)
    db.close()

if __name__ == '__main__':
    # Define the number of processes you want for simultaneous searches
    num_processes = 100

    # Create and start processes for searching
    processes = []
    for _ in range(num_processes):
        process = multiprocessing.Process(target=search_data)
        process.start()
        processes.append(process)

    try:
        # Keep the processes running in the background
        for process in processes:
            process.join()
    except KeyboardInterrupt:
        # Terminate the processes gracefully on Ctrl+C
        for process in processes:
            process.terminate()
And this is the test.json content:
{"_default": {"1": {"foo": [{"name": "bar1"}]}, "2": {"foo": [{"name": "bar2"}]}, "3": {"foo": [{"name": "bar3"}]}, "4": {"foo": [{"name": "bar4"}]}, "5": {"foo": [{"name": "bar5"}]}, "6": {"foo": [{"name": "bar6"}]}}}
I was doing some testing with TinyDB. It's an awesome piece of software. I found that it's not recommended for multiprocess reads, like the use case in Flask etc. But as I had only parallel read operations, I thought it would work. But when multiple processes try to read the db, I get a JSON error.
To reproduce it (you need to change the db JSON and the search):