Talked with Stefan yesterday and we agreed that sdsc would benefit from using parallel processing. This issue is just for the record and to have it documented.
Some definitions:
GIL stands for Global Interpreter Lock. The GIL is necessary because the Python interpreter is not thread-safe.
This means that there is a globally enforced lock when trying to safely access Python objects from within threads. At any one time only a single thread can hold the lock and execute Python bytecode or C API calls. The interpreter releases and reacquires this lock periodically (every 100 bytecode instructions in Python 2; every few milliseconds since Python 3.2) and around (potentially) blocking I/O operations. Because of this lock, CPU-bound code will see no performance gain from the threading module, but it will likely see gains if the multiprocessing module is used.
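To make the GIL's effect concrete, here is a minimal sketch (the function name is illustrative, not from the discussion above): the same CPU-bound loop submitted once to a thread pool and once to a process pool. The results are identical; only the process pool can actually use more than one core.

```python
# Sketch: the same CPU-bound function run with threads (no speedup under
# the GIL) and with processes (true parallelism, one GIL per process).
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def count_down(n):
    # Pure-Python CPU-bound loop; it holds the GIL while it runs.
    while n > 0:
        n -= 1
    return n

if __name__ == "__main__":
    N = 1_000_000
    # Threads: both workers take turns on one core because of the GIL.
    with ThreadPoolExecutor(max_workers=2) as pool:
        thread_results = list(pool.map(count_down, [N, N]))
    # Processes: each worker gets its own interpreter and its own GIL.
    with ProcessPoolExecutor(max_workers=2) as pool:
        process_results = list(pool.map(count_down, [N, N]))
    print(thread_results, process_results)  # [0, 0] [0, 0]
```

The computed results are the same either way; timing the two `with` blocks on a multi-core machine is what shows the difference.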
Possible solutions involve different modules. From SO:

Multiprocessing

Pros
Eliminates most needs for synchronization primitives unless you use shared memory (instead, it's more of a communication model for IPC)
Child processes are interruptible/killable
Python multiprocessing module includes useful abstractions with an interface much like threading.Thread
A must with CPython for CPU-bound processing
Cons
IPC a little more complicated with more overhead (communication model vs. shared memory/objects)
Larger memory footprint
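A minimal sketch of the points above (the `square` function is a placeholder, not from the issue): `multiprocessing.Process` mirrors the `threading.Thread` interface, and a `multiprocessing.Queue` covers the IPC side of the communication model.

```python
# multiprocessing.Process has the same API shape as threading.Thread;
# results travel back from the child over a Queue (IPC, not shared memory).
import multiprocessing as mp

def square(nums, out_q):
    # Runs in a separate child process with its own interpreter and GIL.
    out_q.put([n * n for n in nums])

if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=square, args=([1, 2, 3], q))
    p.start()
    print(q.get())  # [1, 4, 9]
    p.join()
    # Unlike threads, the child is killable: p.terminate() would stop it
    # even mid-computation.
```

Note the `if __name__ == "__main__":` guard, which is required on platforms that spawn (rather than fork) child processes.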
Threading
Pros
Lightweight - low memory footprint
Shared memory - makes access to state from another context easier
Allows you to easily make responsive UIs
CPython C extension modules that properly release the GIL will run in parallel
Great option for I/O-bound applications
Cons
CPython - subject to the GIL
Not interruptible/killable
If not following a command queue/message pump model (using the queue module), then manual use of synchronization primitives becomes a necessity (decisions are needed about the granularity of locking)
Code is usually harder to understand and to get right - the potential for race conditions increases dramatically
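The command queue/message pump model mentioned above can be sketched as follows (worker logic is illustrative): threads consume jobs from a `queue.Queue`, so the queue is the only shared state and no manual locking is needed.

```python
# Command-queue model: queue.Queue is thread-safe, so workers never touch
# shared state directly and no explicit locks are required.
import queue
import threading

def worker(jobs, results):
    while True:
        item = jobs.get()
        if item is None:       # sentinel value: shut this worker down
            break
        results.put(item * 2)  # stand-in for real (ideally I/O-bound) work

jobs, results = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(jobs, results))
           for _ in range(4)]
for t in threads:
    t.start()
for n in range(10):
    jobs.put(n)
for _ in threads:
    jobs.put(None)             # one sentinel per worker
for t in threads:
    t.join()
out = sorted(results.get() for _ in range(10))
print(out)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The sentinel values also give a clean shutdown path, partially working around the "not interruptible/killable" con.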
masnun answers the question of when to use what:
```python
if io_bound:
    if io_very_slow:
        print("Use Asyncio")
    else:
        print("Use Threads")
else:
    print("Multi Processing")
```
CPU Bound => Multi Processing
I/O Bound, Fast I/O, Limited Number of Connections => Multi Threading
I/O Bound, Slow I/O, Many Connections => Asyncio
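For completeness, a minimal sketch of the asyncio branch (`fetch` is a placeholder; `asyncio.sleep` stands in for a slow network call): one thread multiplexes many waiting connections, so the waits overlap instead of adding up.

```python
# Asyncio branch of the decision tree: concurrency for very slow I/O,
# with all waiting multiplexed on a single thread.
import asyncio

async def fetch(i):
    await asyncio.sleep(0.01)  # placeholder for a slow request
    return i * 10

async def main():
    # All three waits overlap, so total time is ~0.01 s, not 3 * 0.01 s.
    return await asyncio.gather(fetch(1), fetch(2), fetch(3))

print(asyncio.run(main()))  # [10, 20, 30]
```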