python / cpython

The Python programming language
https://www.python.org

use os.memfd_create in multiprocessing.shared_memory? #81366

Open 78212f07-2619-4b41-b0fc-42fe2922b059 opened 5 years ago

78212f07-2619-4b41-b0fc-42fe2922b059 commented 5 years ago
BPO 37185
Nosy @pitrou, @tiran, @applio, @pierreglaser, @vinay0410

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['3.9', 'performance']
title = 'use os.memfd_create in multiprocessing.shared_memory?'
updated_at =
user = 'https://github.com/pierreglaser'
```

bugs.python.org fields:
```python
activity =
actor = 'christian.heimes'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation =
creator = 'pierreglaser'
dependencies = []
files = []
hgrepos = []
issue_num = 37185
keywords = []
message_count = 4.0
messages = ['344881', '348974', '351443', '351500']
nosy_count = 5.0
nosy_names = ['pitrou', 'christian.heimes', 'davin', 'pierreglaser', 'vinay0410']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'resource usage'
url = 'https://bugs.python.org/issue37185'
versions = ['Python 3.9']
```

78212f07-2619-4b41-b0fc-42fe2922b059 commented 5 years ago

Hi, following https://bugs.python.org/issue26836, I started thinking about using memfd_create instead of shm_open for creating shared-memory segments in multiprocessing.shared_memory.

The main advantage of memfd_create over shm_open is that resource management is easier: a segment created using memfd_create is released once all references to the segment are dropped. This is not the case for segments created using shm_open, for which additional resource tracking is needed (using the new multiprocessing.resource_tracker).

The main difference between those two calls is that segments created using memfd_create are anonymous and can only be accessed using file descriptors. The name argument in the signature serves only for debugging purposes. In contrast, shm_open generates segments that map to a file in /dev/shm: therefore, each segment has a unique name.
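To make the naming difference concrete, here is a minimal sketch, assuming Linux and Python 3.8+ where os.memfd_create is available. The function name `memfd_demo`, the segment name "demo" and the 4096-byte size are made up for illustration. It creates two memfd segments with the same name and shows that they are separate regions, while a duplicated file descriptor refers to the same region.

```python
import mmap
import os

SIZE = 4096  # arbitrary size chosen for this illustration


def memfd_demo():
    """Return (bytes seen through a dup'ed fd, bytes seen in a same-named segment)."""
    if not hasattr(os, "memfd_create"):
        return None  # not Linux, or Python/libc too old

    fd_a = os.memfd_create("demo")
    fd_b = os.memfd_create("demo")  # same debugging name, different region
    fd_a_dup = os.dup(fd_a)         # a second reference to the first region
    try:
        os.ftruncate(fd_a, SIZE)    # new memfd segments start with size 0
        os.ftruncate(fd_b, SIZE)

        with mmap.mmap(fd_a, SIZE) as view_a:
            view_a[:4] = b"ping"

        with mmap.mmap(fd_a_dup, SIZE) as via_dup, mmap.mmap(fd_b, SIZE) as view_b:
            return bytes(via_dup[:4]), bytes(view_b[:4])
    finally:
        # Once every descriptor and mapping is gone, the kernel frees the
        # memory; there is nothing to unlink.
        for fd in (fd_a, fd_b, fd_a_dup):
            os.close(fd)
```

The write made through the first descriptor is visible through its duplicate but not through the second segment, even though both segments were created with the same name.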

Should we decide to switch from shm_open to memfd_create, the name behavior would also change. How big of a deal would that be?

bf8b85a2-9df0-412b-b6c3-5481df7521c8 commented 5 years ago

Hi @pierreglaser, I recently started using the shared_memory module in multiprocessing, and as you said, using memfd_create would remove the need for resource tracking. But I was wondering: if these memory segments can't be looked up by a unique name, how would unrelated processes, to which the file descriptor cannot be passed, use the shared memory segment?

Also, would releasing the segment once all references to it are dropped be the expected behaviour?

Let's suppose a process creates a shared memory segment and exits. After 5 seconds another process is started, which tries to access the same memory segment. But it won't be able to, since all references would have been dropped by the first process, thereby releasing this memory segment.
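For comparison, the current name-based behaviour on POSIX systems can be sketched as follows. The function name `named_segment_lifecycle` is made up, and both "processes" are simulated inside one process to keep the example short. The segment outlives the creator's mapping and only disappears once unlink() is called.

```python
from multiprocessing import shared_memory


def named_segment_lifecycle():
    """Return (data read by a late attacher, whether the name is gone after unlink)."""
    # "First process": create a segment, write to it, then drop the mapping.
    creator = shared_memory.SharedMemory(create=True, size=16)
    name = creator.name
    creator.buf[:5] = b"hello"
    creator.close()  # on POSIX the segment still exists under its name

    # "Second process", started later: attach using only the name.
    late = shared_memory.SharedMemory(name=name)
    try:
        data = bytes(late.buf[:5])
    finally:
        late.close()
        late.unlink()  # explicit removal; this is what resource tracking guards

    # After unlink, the name no longer resolves.
    try:
        shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        gone = True
    else:
        gone = False
    return data, gone
```

With memfd_create, the step where the second process attaches by name has no direct equivalent, which is the concern raised above.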

Feel free to comment, if I misinterpreted anything.

applio commented 5 years ago

Unless I am missing something, memfd_create appears to still be specific to the Linux kernel, so we would need to replicate its behavior on all of the other Unix systems.

To your point, but quoting from the docs, "separate invocations of memfd_create with the same name will not return descriptors for the same region of memory". If it is possible to use the anonymous shared memory created via memfd_create in another process (which is arguably the primary motivation / use case for multiprocessing.shared_memory), we would need to replicate the unique way of referencing a shared memory segment when trying to attach to it from other processes.

To permit resource management of a shared memory segment (in the sense of ensuring the shared memory segment is always unlinked at the end), the multiprocessing.managers.SharedMemoryManager exists. Because destroying a shared memory segment at exit is not always desirable, the SharedMemoryManager provides additional control over when it is appropriate to unlink a shared memory segment.

tiran commented 5 years ago

You are correct, Davin. The concept of a memory fd is a new Linux kernel feature. It's not available on older Linux kernels or on non-Linux operating systems.

To share memory fd segments between processes, the processes have to exchange or inherit the file descriptor somehow. The fd can either be inherited from a parent process or sent over an AF_UNIX socket using SOL_SOCKET / SCM_RIGHTS.
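A minimal sketch of the socket-based exchange, assuming Linux and Python 3.9+, where socket.send_fds and socket.recv_fds wrap the SCM_RIGHTS ancillary message. The function name `share_via_scm_rights`, the segment name and the size are made up for illustration. The memfd is created after the fork so that the child cannot simply inherit it.

```python
import mmap
import os
import socket

SIZE = 4096  # arbitrary size chosen for this illustration


def share_via_scm_rights():
    """Send a memfd to a child process over an AF_UNIX socket; return what the child wrote."""
    if not hasattr(os, "memfd_create") or not hasattr(socket, "send_fds"):
        return None  # platform or Python version without the needed calls

    parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child: receive the descriptor, map it, and write into the segment.
        status = 1
        try:
            parent_sock.close()
            _msg, fds, _flags, _addr = socket.recv_fds(child_sock, 16, 1)
            with mmap.mmap(fds[0], SIZE) as view:
                view[:5] = b"hello"
            status = 0
        finally:
            os._exit(status)  # never return into the parent's code path

    # Parent: create the segment only now, so the child did not inherit it.
    child_sock.close()
    fd = os.memfd_create("demo_segment")
    try:
        os.ftruncate(fd, SIZE)
        socket.send_fds(parent_sock, [b"x"], [fd])  # fd travels via SCM_RIGHTS
        os.waitpid(pid, 0)
        with mmap.mmap(fd, SIZE) as view:
            return bytes(view[:5])
    finally:
        os.close(fd)
        parent_sock.close()
```

The receiving process needs a socket connection to the sender, so fully unrelated processes would still need some agreed rendezvous point, such as a socket path.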