python / cpython

The Python programming language
https://www.python.org

Consistency of Unix's shared_memory implementation with windows #81935

Open bf8b85a2-9df0-412b-b6c3-5481df7521c8 opened 5 years ago

bf8b85a2-9df0-412b-b6c3-5481df7521c8 commented 5 years ago
BPO 37754
Nosy @pitrou, @eryksun, @applio, @vinay0410
PRs
  • python/cpython#15460
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['3.8', 'type-bug', 'library', '3.9']
    title = "Consistency of Unix's shared_memory implementation with windows"
    updated_at =
    user = 'https://github.com/vinay0410'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'davin'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation =
    creator = 'vinay0410'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 37754
    keywords = ['patch']
    message_count = 12.0
    messages = ['348980', '349902', '349987', '350011', '350156', '350161', '350379', '350381', '350529', '351432', '351445', '351962']
    nosy_count = 4.0
    nosy_names = ['pitrou', 'eryksun', 'davin', 'vinay0410']
    pr_nums = ['15460']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue37754'
    versions = ['Python 3.8', 'Python 3.9']
    ```

    bf8b85a2-9df0-412b-b6c3-5481df7521c8 commented 5 years ago

    Hi, I am opening this issue to discuss some possible enhancements to the multiprocessing.shared_memory module.

    I have recently started using multiprocessing.shared_memory and realised that the module currently provides no way to alter the size of a shared memory segment. In addition, a process has to know beforehand whether to create a segment or open an existing one, unlike shm_open in C, where a segment can be created automatically if it doesn't exist.

    From an end user's perspective I believe these features would be really helpful, and I would be happy to contribute if you believe they are necessary.

    I also realise this might be by design, or due to implementation challenges, in which case it would be very helpful to know what they are.
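For reference, the create-or-attach behaviour being asked for can be approximated on top of the current API with a try/except wrapper. A minimal sketch (the helper name is made up, and unlike shm_open with O_CREAT the two calls below are not atomic, so it is racy across processes):

```python
from multiprocessing import shared_memory

def create_or_attach(name, size):
    """Create the segment if it doesn't exist, otherwise attach to it.

    Hypothetical helper, not part of the module: unlike C's
    shm_open(..., O_CREAT), the create-then-attach sequence below is
    not atomic, so another process can win the race in between.
    """
    try:
        return shared_memory.SharedMemory(name=name, create=True, size=size)
    except FileExistsError:
        # Segment already exists: attach to it instead.
        return shared_memory.SharedMemory(name=name)
```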

    applio commented 5 years ago

    Attempts to alter the size of a shared memory segment are met with a variety of different, nuanced behaviors on systems we want to support. I agree that it would be valuable to be able to effectively realloc a shared memory segment, which thankfully the user can do with the current implementation although they become responsible for adjusting for platform-specific behaviors. The design of the API in multiprocessing.shared_memory strives to be as feature-rich as possible while providing consistent behavior across platforms that can be reasonably supported; it also leaves the door open (so to speak) for users to exploit additional platform-specific capabilities of shared memory segments.

    Knowing beforehand whether to create a segment or attach to an existing one is an important feature for a variety of use cases. I believe this is discussed at some length in bpo-35813. If what is discussed there does not help (it did get kind of long sometimes), please say so and we can talk through it more.

    bf8b85a2-9df0-412b-b6c3-5481df7521c8 commented 5 years ago

    Hi Davin, Thanks for replying!

    As you suggested, I went through that issue, and now understand why segments should not be created automatically if they don't exist.

    But after reading that thread I learned that shared memory is supposed to persist even after the process exits, and can only be freed by unlink, which I also believe is important functionality required by many use cases, as you mentioned.

    But this is not the current behaviour: as soon as the process exits, all the shared memory it created is unlinked.

    Also, the documentation currently mentions: "shared memory blocks may outlive the original process that created them", which is not the case at all.

    Currently the resource_tracker unlinks the shared memory by calling shm_unlink, via the cleanup functions registered in Lib/multiprocessing/resource_tracker.py:

    if os.name == 'posix':
        import _multiprocessing
        import _posixshmem
    
        _CLEANUP_FUNCS.update({
            'semaphore': _multiprocessing.sem_unlink,
            'shared_memory': _posixshmem.shm_unlink,
        })

    So, is this the expected behaviour? If yes, the documentation should be updated; if not, the code base should be.

    I will be happy to submit a patch in either case.

    PS: I personally believe, from my experience, that shared memory segments should outlive the process unless specified otherwise. An argument such as persist=True could be added to ensure that the shared memory segment outlives the process and can be used by processes spawned later.
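On the current implementation, a process can at least opt a segment out of the tracker's automatic cleanup by unregistering it. This sketch leans on the private `_name` attribute and on the fact that SharedMemory registers itself with the resource tracker on POSIX, so it illustrates the idea rather than a supported API:

```python
import os
from multiprocessing import shared_memory, resource_tracker

def create_persistent(name, size):
    """Create a segment intended to survive this process's exit.

    Caveat (assumption): relies on the private ``_name`` attribute and
    on resource_tracker internals; an illustration, not a supported API.
    """
    shm = shared_memory.SharedMemory(name=name, create=True, size=size)
    if os.name == "posix":
        # Tell the resource tracker to forget this segment so it is not
        # shm_unlink'ed when the process exits; someone must still call
        # unlink() eventually or the segment leaks until reboot.
        resource_tracker.unregister(shm._name, "shared_memory")
    return shm
```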

    eryksun commented 5 years ago

    PS: I personally believe, from my experience, that shared memory segments should outlive the process unless specified otherwise. An argument such as persist=True could be added to ensure that the shared memory segment outlives the process and can be used by processes spawned later.

    In terms of providing "consistent behavior across platforms that can be reasonably supported", the behavior suggested above could not reasonably be supported in Windows.

    The Section (shared memory) object itself is not a file. It gets created in the object namespace, either globally in "\BaseNamedObjects" or in an interactive session's "\Sessions\<session ID>\BaseNamedObjects".

    By default, kernel objects are temporary. Creating permanent named objects requires SeCreatePermanentPrivilege, which by default is only granted to SYSTEM (sort of like root in Unix). For an object to be accessible across sessions and outlive an interactive session, it needs to be global. Creating global Section objects requires SeCreateGlobalPrivilege, which by default is only granted to administrators and service accounts.

    Also, the Windows API has no capability to create permanent objects, so this would require the NT API functions NtMakePermanentObject (undocumented, added in NT 5.1) and NtMakeTemporaryObject.

    bf8b85a2-9df0-412b-b6c3-5481df7521c8 commented 5 years ago

    In terms of providing "consistent behavior across platforms that can be reasonably supported", the behavior suggested above could not reasonably be supported in Windows.

    I understand that persistence of a shared memory segment after all the processes using it have exited can be very difficult to achieve on Windows.

    But after testing shared_memory on Windows, I found that the behaviour on Windows and Unix is not consistent at the moment.

    For instance, say three processes P1, P2 and P3 are trying to communicate using shared memory:

      • P1 creates the shared memory block, and waits for P2 and P3 to access it.
      • P2 starts, attaches to the shared memory segment, writes some data to it, and exits.
      • On Unix, shm_unlink is called as soon as P2 exits.
      • P3 then starts and tries to attach to the shared memory segment.
      • On Unix, P3 cannot attach to the segment, because shm_unlink has already been called on it.
      • On Windows, however, P3 is able to attach to the segment.

    One possible solution would be to register the shared memory segment with the resource tracker only when it is created, not when it is attached.

    I think that would make Unix's implementation more consistent with Windows.

    Any thoughts on the same will be very helpful.

    eryksun commented 5 years ago

    register the shared memory segment only when it's created and not when it's attached.

    In Windows, the section object is reference counted. I haven't looked into the Unix implementation, but maybe it could use advisory locking. After opening shared memory via shm_open, a process would try to acquire a shared lock on the fd; if it can't acquire the lock, close the fd and fail the open. When exiting, a process would release its shared lock and try to acquire an exclusive lock. If it acquires the exclusive lock, it holds the last reference and should call shm_unlink.
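A rough sketch of this shared/exclusive-lock scheme on top of the current module. It uses the private `_fd` attribute (an implementation detail) and assumes a platform where flock() accepts a shm_open fd, such as Linux:

```python
import fcntl
from multiprocessing import shared_memory

def attach(name):
    """Attach to an existing segment and take a shared lock on its fd."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # A shared lock marks this process as a live user of the segment.
        fcntl.flock(shm._fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except OSError:
        shm.close()
        raise
    return shm

def detach(shm):
    """Drop our reference; unlink the segment if we were the last user."""
    fcntl.flock(shm._fd, fcntl.LOCK_UN)
    try:
        # The exclusive lock succeeds only if no other process still
        # holds a shared lock, i.e. we hold the last reference.
        fcntl.flock(shm._fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        shm.close()        # others remain attached
        return False
    shm.unlink()           # last user: free the segment
    shm.close()
    return True
```

A real implementation would also have the creator take the same shared lock right after creation, and would need to close the small race between dropping the shared lock and probing for the exclusive one.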

    bf8b85a2-9df0-412b-b6c3-5481df7521c8 commented 5 years ago

    Hi, I just opened a PR implementing a fix very similar to your suggestion. I am using advisory locking via fcntl.flock, locking on file descriptors. As you can see in the PR, the resource tracker opens the file "/dev/shm/<shm_name>" and tries to acquire an exclusive lock on it, and this works great on Linux. Since the resource_tracker is spawned as a separate process, I can't directly reuse the original file descriptors.

    But macOS doesn't create any entry in /dev/shm for files made by shm_open; in fact, it keeps no reference to them in the filesystem at all.

    Therefore it gets difficult to obtain the file descriptor in the resource tracker. Also, is there a good way to pass file descriptors between processes?

    Any ideas on the above issue will be much appreciated.

    bf8b85a2-9df0-412b-b6c3-5481df7521c8 commented 5 years ago

    Also, shm_open returns an integer file descriptor, and when this file descriptor is passed to fcntl.flock on macOS it throws the following error: OSError: [Errno 45] Operation not supported

    The same code works fine on Linux.

    Therefore, I doubt that macOS's flock implementation supports locking the files backing shared memory.

    bf8b85a2-9df0-412b-b6c3-5481df7521c8 commented 5 years ago

    Since advisory locking doesn't work on the file descriptors returned by shm_open on macOS, I was thinking of an alternative way of fixing this.

    I was thinking of using a shared semaphore to store the reference count of the processes using the shared memory segment. The resource_tracker would unlink the shared memory and the shared semaphore when the count stored in the semaphore drops to 0. This would ensure that neither the shared memory segment nor the shared semaphore leaks.

    Does this sound good? Any suggestions would be very helpful.

    applio commented 5 years ago

    A shared semaphore approach for the resource tracker sounds appealing as a way to make the behaviour on Windows and POSIX systems more consistent. However it ends up being implemented, we should not artificially prevent users from having some option to persist a segment beyond the last Python process's exit.

    I like the point that @eryksun makes that we could instead consider using NtMakePermanentObject on Windows to permit more posix-like behavior instead, but I do not think we want to head down a path of using undocumented NT APIs.

    In the current code, the resource tracker inappropriately triggers _posixshmem.shm_unlink; we need to fix this in the immediate short term (before 3.8 is released) as it breaks the expected behavior @vinay0410 describes.

    bf8b85a2-9df0-412b-b6c3-5481df7521c8 commented 5 years ago

    Hi @davin, I researched a number of approaches to solving this problem, and I have listed the best ones below.

    1. As Eryk Sun initially suggested, use advisory locking to implement a reference-count mechanism. I implemented this in the pull request that is already open, and it works great on most platforms but fails on macOS, because macOS, unlike Linux, creates no filesystem entry in /dev/shm (or anywhere else) for the shared memory segment. Therefore this won't work.

    2. I thought of manually creating a file entry for the shared memory segment in /tmp on macOS. This will work fine unless the user manually changes the permissions of the /tmp directory; we would also have to rely on /tmp being writable.

    3. Shared semaphores: a very interesting approach to implementing a reference count, where a semaphore is created alongside every shared memory segment to hold the segment's reference count. The resource tracker would clear the shared memory segment and the shared semaphore as soon as the semaphore's value drops to 0.

    The only problem with this approach is deciding the name of the shared semaphore: we would have to define a standardised way to derive the shared semaphore's name from the shared segment's name.

    For instance, shared_semaphore_name = 'psem' + shared_segment_name. This can cause problems if a semaphore named shared_semaphore_name already exists.

    That could be solved by taking any available name and storing it inside the shared memory segment upon creation, then reading that name back when opening the shared semaphore.

    4. Another way would be to initialise a semaphore with sem_init() inside the shared memory segment itself; for example, the first 4 bytes could be reserved for the semaphore. But since macOS has deprecated sem_init(), this wouldn't be good practice.

    5. Atomic operations: I think this is the simplest and best way to solve the problem. We could reserve the first 4 bytes of the segment for an integer holding its reference count, and use atomic operations to update it without data races. gcc has built-in support for some atomic operations; I have tested them across processes by updating a counter inside shared memory, and they work great.

    Following are some atomic calls:

    type __sync_add_and_fetch (type *ptr, type value, ...)
    type __sync_sub_and_fetch (type *ptr, type value, ...)
    type __sync_or_and_fetch (type *ptr, type value, ...)

    Documentation of these can be found at the links below:
    https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html#g_t_005f_005fsync-Builtins
    https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html#g_t_005f_005fatomic-Builtins
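The sentinel-file idea in approach 2 could be sketched as below. The /tmp naming convention and helper names are made up for illustration; it reuses the shared/exclusive flock trick suggested earlier in the thread, but on a regular file rather than on the shm fd, so it also works where flock() rejects shm_open descriptors:

```python
import fcntl
import os

LOCK_DIR = "/tmp"  # assumption: relies on /tmp being writable, as noted above

def _sentinel_path(segment_name):
    # Hypothetical naming convention for the per-segment sentinel file.
    return os.path.join(LOCK_DIR, "shm_refs_" + segment_name)

def track_attach(segment_name):
    """Record this process as a user of the segment; returns a lock fd."""
    fd = os.open(_sentinel_path(segment_name), os.O_CREAT | os.O_RDWR, 0o600)
    fcntl.flock(fd, fcntl.LOCK_SH)
    return fd

def track_detach(segment_name, fd):
    """Drop our reference. Returns True if we were the last user, in
    which case the caller would shm_unlink the segment now."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    try:
        # Succeeds only when no other process holds a shared lock.
        # (A real implementation would need to close the small race
        # between the unlock above and this probe.)
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return False
    os.unlink(_sentinel_path(segment_name))
    os.close(fd)
    return True
```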

    applio commented 5 years ago

    I have created bpo-38119 to track a fix to the inappropriate use of resource tracker with shared memory segments, but this does not replace or supersede what is discussed here.