patx / pickledb

pickleDB is an open source key-value store using Python's json module.
https://patx.github.io/pickledb
BSD 3-Clause "New" or "Revised" License
922 stars, 125 forks

Use of Threading #56

Open shikharsngh opened 5 years ago

shikharsngh commented 5 years ago

The key-value dictionary is currently dumped to disk by spawning a new thread from the main thread. However, since the process runs on a single core and only one thread can run at a time, this is effectively the same as dumping to disk from the main thread.

If, however, the new thread were replaced by a new process created with fork, writing to disk might benefit. The child process would take over dumping the key-value pairs to disk while the parent process continues to run alongside it. This would help consistency: even if the parent process crashes due to some error, the child process would still write the key-value pairs to disk, so the on-disk data stays consistent.
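For illustration, here is a minimal sketch of the fork-based dump being proposed. It is not pickleDB's code: `dump_in_child` is a hypothetical helper, it assumes a Unix-like system where `os.fork` is available, and the write-to-temp-then-rename step is an added assumption to avoid partially written files.

```python
import json
import os
import tempfile


def dump_in_child(db: dict, path: str) -> int:
    """Fork and let the child write db to path; return the child's pid.

    Hypothetical helper for illustration only (Unix only, uses os.fork).
    """
    pid = os.fork()
    if pid == 0:
        # Child: it sees a copy of db as it was at the moment of the fork.
        status = 1
        try:
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(db, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, path)  # rename so readers never see a partial file
            status = 0
        finally:
            os._exit(status)  # exit without running the parent's cleanup handlers
    return pid


if __name__ == "__main__":
    db = {"key": "value"}
    target = os.path.join(tempfile.mkdtemp(), "example.db")
    child = dump_in_child(db, target)
    db["key"] = "changed after fork"  # not visible to the child's copy
    os.waitpid(child, 0)              # reap the child to avoid a zombie process
    with open(target) as f:
        print(json.load(f))
```

The parent has to reap the child at some point (here with `os.waitpid`), and the child writes whatever the dictionary contained at fork time, not later updates.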

Multi-threading does not address this consistency issue: if the process crashes due to some error, all of its threads are killed and the key-value pairs might not be completely written to disk, leaving the on-disk data inconsistent.

@patx please let me know your thoughts.

sammck commented 3 years ago

Just read this. It's not entirely accurate, though. While Python does have a global interpreter lock (GIL), which prevents Python code from executing in more than one thread at a time, this lock is released during most calls into native code that block, including I/O (e.g., reading from or writing to disk), allowing other Python threads to run Python code. For threads that are mostly I/O bound, Python multithreading works well and the single core never needs to sit idle waiting for I/O.
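As a rough sketch of the threaded pattern described above (not pickleDB's actual implementation; `dump_in_thread` is a hypothetical name), a background thread can write a snapshot while the main thread keeps running Python code:

```python
import json
import os
import tempfile
import threading


def dump_in_thread(db: dict, path: str) -> threading.Thread:
    """Start a background thread that writes a snapshot of db to path."""
    snapshot = dict(db)  # shallow copy, so later edits by the caller are not written

    def _write() -> None:
        with open(path, "w") as f:
            # Encoding holds the GIL; the underlying write syscalls release it.
            json.dump(snapshot, f)

    t = threading.Thread(target=_write)
    t.start()
    return t


if __name__ == "__main__":
    db = {"key": "value"}
    target = os.path.join(tempfile.mkdtemp(), "example.db")
    writer = dump_in_thread(db, target)
    total = sum(range(100_000))  # main thread keeps working in the meantime
    writer.join()
    with open(target) as f:
        print(json.load(f), total)
```

The shallow copy is an assumption made here so the writer does not iterate over a dictionary that the main thread is mutating; nested mutable values would need a deeper copy.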

It is true that the code that does the pickling/serialization prior to writing to disk can become CPU bound. Attempting to do that part after forking might be risky in a process that has multiple threads running, since the forked process will only have one thread, and any locks held by other threads at the time of fork will never be released in the forked process, which can lead to deadlocks if you are not careful.
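The lock hazard can be shown with a small sketch, assuming a Unix-like system and CPython. `lock_is_stuck_in_child` is a hypothetical function written for this illustration; the child uses a timed acquire so that it reports the problem instead of hanging.

```python
import os
import threading
import warnings


def lock_is_stuck_in_child() -> bool:
    """Return True if a lock held by another thread at fork time
    cannot be acquired in the forked child."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder() -> None:
        with lock:
            held.set()
            release.wait()  # keep holding the lock until the parent says so

    t = threading.Thread(target=holder)
    t.start()
    held.wait()  # make sure the lock is held before forking

    with warnings.catch_warnings():
        # Python 3.12+ warns when forking a multi-threaded process.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: only the forking thread exists here. holder() is gone,
        # so nothing will ever release the copied, still-locked lock.
        got_it = lock.acquire(timeout=0.5)
        os._exit(0 if got_it else 1)

    _, status = os.waitpid(pid, 0)
    release.set()  # let the holder thread in the parent finish
    t.join()
    return os.WEXITSTATUS(status) == 1


if __name__ == "__main__":
    print(lock_is_stuck_in_child())
```

With a plain blocking `lock.acquire()` in the child, the same situation would hang forever, which is the deadlock described above.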