guardicore / monkey

Infection Monkey - An open-source adversary emulation platform
https://www.guardicore.com/infectionmonkey/
GNU General Public License v3.0

Investigate plugins using multiprocessing on Windows #2563

Closed mssalvatore closed 2 years ago

mssalvatore commented 2 years ago

Spike

Objective

Discover and document any risks when using multiprocessing for plugins on Windows

Scope

Python's multiprocessing library behaves differently on Windows and Linux. Prove that the multiprocessing approach behaves as expected on Windows. Document the differences and make the necessary modifications to the prototype.

Output

ilija-lazoroski commented 2 years ago

Multiprocessing behaves differently on Linux and Windows (macOS behaves like Windows by default), and the difference is not minor. It imposes constraints that we need to keep an eye on.

Blog post: https://www.pythonforthelab.com/blog/differences-between-multiprocessing-windows-and-linux/ Date: June 13, 2020

Forked vs. Spawn processes

First comes the difference in how processes are started. On Linux, a child process is forked, which means that it inherits the memory state of the parent process. On Windows (and by default on macOS), a child process is spawned, which means that a new interpreter starts and the module-level code of the main script runs again in the child.
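To check which start method a given interpreter actually uses, something like the following can be run on each platform. This is a minimal sketch; the helper name start_method_report is made up for illustration. Defaults have also changed between Python versions (macOS moved to spawn in Python 3.8), so it is worth checking rather than assuming.

```python
import multiprocessing as mp


def start_method_report():
    """Collect the default and the available start methods for this platform."""
    return {
        # Note: calling get_start_method() fixes the default context, so a
        # later set_start_method() call needs force=True.
        "default": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    report = start_method_report()
    print(f"default start method: {report['default']}")
    print(f"available start methods: {report['available']}")
```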

Example

import multiprocessing as mp
from time import sleep

print('Before defining simple_func')

def simple_func():
    print('Starting simple func')
    sleep(1)
    print('Finishing simple func')

if __name__ == '__main__':
    p = mp.Process(target=simple_func)
    p.start()
    print('Waiting for simple func to end')
    p.join()

Linux output:

image

Windows output:

image

Mac output (Same as Windows):

image

Notice the second Before defining simple_func line when running on Windows (and Mac): the module-level print runs again in the spawned child.

The repeated print may look harmless, but consider what happens when module-level code performs a calculation or modifies data:

import multiprocessing as mp
import random

val = random.random()

def simple_func():
    print(val)

if __name__ == '__main__':
    print('Before multiprocessing: ')
    simple_func()
    print('After multiprocessing:')
    p = mp.Process(target=simple_func)
    p.start()
    p.join()

Linux output:

image

Windows output:

image
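With fork, the child inherits val from the parent, so both prints show the same number. With spawn, the module is run again in the child, random.random() is called a second time, and the child sees a different value. One way to get the same behaviour on both platforms is to compute the value inside the main guard and pass it to the child explicitly. A minimal sketch, where describe and child are illustrative names:

```python
import multiprocessing as mp
import random


def describe(val):
    """Format the value so parent and child output can be compared."""
    return f"value seen: {val}"


def child(val):
    # The value arrives as a pickled argument, not as module-level state.
    print("child ", describe(val))


if __name__ == "__main__":
    val = random.random()  # computed once, in the parent only
    print("parent", describe(val))

    # Use spawn explicitly so the script behaves the same on Linux and Windows.
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=child, args=(val,))
    p.start()
    p.join()
```

Because val is passed through args, it has to be picklable, which a float is.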

Let's see what happens when we use classes.

Consider a case where the object opens a file for writing.

import multiprocessing as mp

class MyClass:
    def __init__(self, i):
        self.i = i
        self.file = open(f'{i}.txt', 'w')

    def simple_method(self):
        print('This is a simple method')
        print(f'The stored value is: {self.i}')

    def mp_simple_method(self):
        self.p = mp.Process(target=self.simple_method)
        self.p.start()

    def wait(self):
        self.p.join()
        self.file.close()

if __name__ == '__main__':
    my_class = MyClass(1)
    my_class.mp_simple_method()
    my_class.wait()

Linux output:

image

Windows output:

image

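Spawning a process from a bound method means that the object must be picklable. If the class, or any attribute of the instance, is not picklable, then we can't spawn a process from it. Here, the open file handle stored in self.file cannot be pickled, so the example fails on Windows.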
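The failure can be reproduced without starting a process at all, since spawn relies on pickle to send the target to the child. The sketch below uses made-up class names (HoldsOpenFile, HoldsPathOnly) and a hypothetical is_picklable helper. It contrasts an object that keeps an open file handle with one that stores only the path and opens the file when needed.

```python
import os
import pickle
import tempfile


class HoldsOpenFile:
    """Mirrors MyClass above: keeps an open file handle as an attribute."""

    def __init__(self, path):
        self.path = path
        self.file = open(path, "w")


class HoldsPathOnly:
    """Stores only plain data and opens the file on demand."""

    def __init__(self, path):
        self.path = path

    def write(self, text):
        with open(self.path, "a") as f:
            f.write(text)


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "1.txt")
        with_handle = HoldsOpenFile(path)
        print("open file handle picklable:", is_picklable(with_handle))
        with_handle.file.close()
        print("path only picklable:", is_picklable(HoldsPathOnly(path)))
```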

Make Linux work like Windows (just for easy testing) 🥸

We can replicate how Windows spawns processes by calling [multiprocessing.set_start_method](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.set_start_method)('spawn'). The class example on Linux, but using spawn:

image

Linux can be made to behave like Windows, but the other way around is not possible. The multiprocessing documentation for set_start_method lists fork, spawn and forkserver as start methods.

One might guess that fork would create forked processes on Windows, but crucially Windows doesn't support the fork-exec model and has nothing close to fork. Cygwin, on the other hand, provides an implementation of fork that can be used on Windows, but according to an answer on StackOverflow it is very slow, and it would require running Python under Cygwin; in other words, running the agent in Cygwin.

Windows output using fork as start method:

image
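On Windows, asking for fork raises ValueError because no fork context exists there. If code needs to prefer one start method but still run everywhere, a fallback can be sketched as below. pick_context is a hypothetical helper, not part of the prototype. It uses multiprocessing.get_context, which returns a context object without changing the global start method.

```python
import multiprocessing as mp


def pick_context(preferred):
    """Return a context for the preferred start method, or spawn if unavailable."""
    try:
        return mp.get_context(preferred)
    except ValueError:
        # Raised when the platform has no such start method,
        # for example "fork" on Windows.
        return mp.get_context("spawn")


if __name__ == "__main__":
    ctx = pick_context("fork")
    print("using start method:", ctx.get_start_method())
```

The returned context offers the same API as the multiprocessing module (ctx.Process, ctx.Queue and so on), so the choice stays local to the code that made it.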

Understand how these boundaries affect the Agent

All our objects must be serializable (picklable); anything that is not will not run on Windows. We must base our implementation and design around spawned processes.
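When auditing existing objects for this constraint, it can help to find out which attribute is the problem. The following is a rough sketch only: find_unpicklable_attrs and ExamplePlugin are hypothetical names, not Agent code, and the check only looks at top-level instance attributes.

```python
import pickle
import threading


def find_unpicklable_attrs(obj):
    """Return {attribute name: error text} for attributes that fail to pickle."""
    problems = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as err:  # pickle raises several exception types
            problems[name] = f"{type(err).__name__}: {err}"
    return problems


class ExamplePlugin:
    """Hypothetical object with one attribute that cannot be pickled."""

    def __init__(self):
        self.name = "example"
        self.options = {"timeout": 5}
        self.lock = threading.Lock()  # locks are not picklable


if __name__ == "__main__":
    for attr, reason in find_unpicklable_attrs(ExamplePlugin()).items():
        print(f"{attr}: {reason}")
```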

More resources to read on this topic

  1. https://rhodesmill.org/brandon/2010/python-multiprocessing-linux-windows/
  2. https://stackoverflow.com/questions/42148344/python-multiprocessing-linux-windows-difference

ilija-lazoroski commented 2 years ago

Here is a prototype that uses a different version of pandas for each of two sets of plugins. We agreed to use spawned processes for the prototype, since they work well on both Linux and Windows.

To install the different versions of pandas run install_pandas.sh on Linux and install_pandas.bat on Windows.

multiprocessing-test-spawn.tar.gz