Closed mssalvatore closed 2 years ago
Multiprocessing behaves differently on Linux and Windows (macOS behaves like Windows by default), and the difference is not minor. It imposes constraints that we need to keep an eye on.
Blog post: https://www.pythonforthelab.com/blog/differences-between-multiprocessing-windows-and-linux/ Date: June 13, 2020
First comes the difference in how processes are started. On Linux, when we start a child process, the process is forked, which means that the child process inherits the memory state of the parent process. On Windows (and by default on macOS), when we start a child process, the process is spawned, which means that a new interpreter starts and the module's code reruns.
import multiprocessing as mp
from time import sleep

print('Before defining simple_func')

def simple_func():
    print('Starting simple func')
    sleep(1)
    print('Finishing simple func')

if __name__ == '__main__':
    p = mp.Process(target=simple_func)
    p.start()
    print('Waiting for simple func to end')
    p.join()
Linux output:
Windows output:
Mac output (Same as Windows):
Notice the second `Before defining simple_func` when running on Windows (and macOS).
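Which of the two behaviours applies on a given machine can be checked at runtime. The following is a minimal sketch; `describe_start_methods` is a hypothetical helper name, while `get_start_method` and `get_all_start_methods` are part of the standard `multiprocessing` module. The result depends on the platform and Python version, so no output is shown.

```python
import multiprocessing as mp


def describe_start_methods():
    """Return the default start method and every method this platform supports."""
    # Note: calling get_start_method() fixes the default for the rest of the run.
    return mp.get_start_method(), mp.get_all_start_methods()


if __name__ == '__main__':
    default, available = describe_start_methods()
    print(f'Default start method: {default}')
    print(f'Available start methods: {available}')
```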
A duplicated print may look harmless, but consider what happens when the module-level code performs a calculation or modifies data:
import multiprocessing as mp
import random

val = random.random()

def simple_func():
    print(val)

if __name__ == '__main__':
    print('Before multiprocessing: ')
    simple_func()
    print('After multiprocessing:')
    p = mp.Process(target=simple_func)
    p.start()
    p.join()
Linux output:
Windows output:
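Under spawn, the module is imported again in the child, so the module-level `random.random()` call runs a second time. One way to keep parent and child in agreement is to compute the value once under the `__main__` guard and pass it to the child as an argument. This is a sketch of that idea, not necessarily the approach the prototype takes; the `val` parameter on `simple_func` is an addition for illustration.

```python
import multiprocessing as mp
import random


def simple_func(val):
    """Print the value handed over by the parent instead of reading a global."""
    print(val)
    return val


if __name__ == '__main__':
    val = random.random()  # computed once, in the parent only
    print('Before multiprocessing:')
    simple_func(val)
    print('After multiprocessing:')
    ctx = mp.get_context('spawn')  # 'spawn' is available on Linux, Windows, and macOS
    p = ctx.Process(target=simple_func, args=(val,))  # val is pickled and sent to the child
    p.start()
    p.join()
```

With this change both prints should show the same number regardless of the start method, because the child no longer depends on re-executed module-level state.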
Next, consider the case where we try to write to a file on the system.
import multiprocessing as mp

class MyClass:
    def __init__(self, i):
        self.i = i
        self.file = open(f'{i}.txt', 'w')

    def simple_method(self):
        print('This is a simple method')
        print(f'The stored value is: {self.i}')

    def mp_simple_method(self):
        self.p = mp.Process(target=self.simple_method)
        self.p.start()

    def wait(self):
        self.p.join()
        self.file.close()

if __name__ == '__main__':
    my_class = MyClass(1)
    my_class.mp_simple_method()
    my_class.wait()
Linux output:
Windows output:
Spawning a process from a bound method means that the object must be picklable. If a class, or an attribute of a class, is not picklable, then we can't spawn the process. In the example above, the open file handle stored in `self.file` cannot be pickled.
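Picklability can be checked without starting a process at all. The sketch below uses hypothetical names (`HoldsFile`, `is_picklable`) and a temporary directory; it mirrors the class above by keeping an open file handle as an attribute.

```python
import os
import pickle
import tempfile


class HoldsFile:
    """Mirrors MyClass above: keeps an open file handle as an attribute."""

    def __init__(self, path):
        self.path = path
        self.file = open(path, 'w')


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        holder = HoldsFile(os.path.join(tmp, '1.txt'))
        print(is_picklable(holder))       # False: the open file handle blocks pickling
        print(is_picklable(holder.path))  # True: a plain string pickles fine
        holder.file.close()
```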
On Linux, we can replicate how Windows spawns processes by calling [multiprocessing.set_start_method](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.set_start_method)`('spawn')`. The class example on Linux, but using `spawn`:
Linux can behave like Windows, but the other way around is not possible. The multiprocessing documentation for `set_start_method` lists `fork`, `spawn`, and `forkserver` as start methods.
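As an alternative to `set_start_method`, which changes the start method for the whole program, `multiprocessing.get_context` returns a context object bound to one start method. The sketch below assumes that approach; `pick_context` and `greet` are hypothetical names used only for illustration.

```python
import multiprocessing as mp


def greet(name):
    print(f'Hello from the child, {name}')


def pick_context(preferred='spawn'):
    """Return a context for `preferred` when supported, else the platform default."""
    if preferred in mp.get_all_start_methods():
        return mp.get_context(preferred)
    return mp.get_context()


if __name__ == '__main__':
    ctx = pick_context('spawn')
    print('Using start method:', ctx.get_start_method())
    p = ctx.Process(target=greet, args=('plugin',))
    p.start()
    p.join()
```

Using a context keeps the choice local to the code that creates the processes, which may be preferable when other parts of the program also use multiprocessing.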
One might guess that `fork` would create forked processes on Windows, but the crucial point is that Windows doesn't support the fork-exec model and has nothing close to `fork`. Cygwin, on the other hand, has an implementation of `fork` that can be used on Windows, but according to an answer on Stack Overflow it is very slow, and it would require running Python under Cygwin; in other words, running the agent in Cygwin.
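Whether `fork` exists on the current platform can be tested directly, since `multiprocessing.get_context` raises `ValueError` for a start method it does not know. A minimal sketch, with `fork_is_available` as a hypothetical helper name:

```python
import multiprocessing as mp


def fork_is_available():
    """Report whether the 'fork' start method exists on this platform."""
    try:
        mp.get_context('fork')
    except ValueError:
        # Raised when the platform has no 'fork' context, as on Windows.
        return False
    return True


if __name__ == '__main__':
    print('fork available:', fork_is_available())
```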
Windows output using `fork` as start method:
All our objects must be serializable (picklable); anything that is not will not run on Windows.
We must base our implementation and design around spawned processes.
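One way to design a class around spawned processes is to store only picklable state on the object and to open resources such as files inside the child. The sketch below illustrates that pattern under stated assumptions; `SpawnSafeWriter` and its methods are hypothetical and are not taken from the prototype.

```python
import multiprocessing as mp
import os
import tempfile


class SpawnSafeWriter:
    """Stores only picklable state; the file is opened inside the child."""

    def __init__(self, path, i):
        self.path = path  # a string, safe to pickle
        self.i = i        # an int, safe to pickle

    def write_value(self):
        # Runs in the child: the file handle never crosses the process boundary.
        with open(self.path, 'w') as handle:
            handle.write(f'The stored value is: {self.i}\n')

    def run_in_child(self):
        ctx = mp.get_context('spawn')
        process = ctx.Process(target=self.write_value)  # pickles self, which is safe here
        process.start()
        process.join()
        return process.exitcode


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        writer = SpawnSafeWriter(os.path.join(tmp, '1.txt'), 1)
        exitcode = writer.run_in_child()
        print('Child exit code:', exitcode)
        if exitcode == 0:
            with open(writer.path) as handle:
                print(handle.read().strip())
```

The process object is kept in a local variable rather than on `self`, so the instance that gets pickled contains nothing but a path and an integer.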
Here is a prototype that uses a different version of pandas for each of two sets of plugins. For the prototype, we agreed to use spawned processes, which work well on both Linux and Windows.
To install the different versions of pandas, run `install_pandas.sh` on Linux and `install_pandas.bat` on Windows.
Spike
Objective
Discover and document any risks when using multiprocessing for plugins on Windows
Scope
Python's multiprocessing library behaves differently on Windows and Linux. Prove that the multiprocessing approach behaves as expected on Windows. Document the differences and make the necessary modifications to the prototype.
Output