python / cpython

ctypes should support atomic operations #75835

Open 09844748-0623-4c37-9350-082719a270a2 opened 6 years ago

09844748-0623-4c37-9350-082719a270a2 commented 6 years ago
BPO 31654
Nosy @rhettinger, @amauryfa, @abalkin, @pitrou, @meadori, @serhiy-storchaka, @applio

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['ctypes', 'type-feature', '3.7']
title = 'ctypes should support atomic operations'
updated_at =
user = 'https://bugs.python.org/DanielColascione'
```

bugs.python.org fields:

```python
activity =
actor = 'pitrou'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['ctypes']
creation =
creator = 'Daniel Colascione'
dependencies = []
files = []
hgrepos = []
issue_num = 31654
keywords = []
message_count = 14.0
messages = ['303442', '303469', '303471', '303473', '303474', '303475', '303476', '303488', '303489', '303491', '303494', '303495', '303496', '303499']
nosy_count = 8.0
nosy_names = ['rhettinger', 'amaury.forgeotdarc', 'belopolsky', 'pitrou', 'meador.inge', 'serhiy.storchaka', 'davin', 'Daniel Colascione']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue31654'
versions = ['Python 3.7']
```

09844748-0623-4c37-9350-082719a270a2 commented 6 years ago

Say we're using multiprocessing to share a counter between two processes and we want to atomically increment that counter. Right now, we need to protect that counter with a multiprocessing semaphore of some sort, then 1) acquire the semaphore, 2) read-modify-write the counter value, and 3) release the semaphore. What if we're preempted by a GIL-acquire request after step #1 but before step #3? We'll hold the semaphore until the OS scheduler gets around to running us again, which might be a while in the case of compute-bound tasks (especially if these tasks call C code that doesn't release the GIL).

Now, if some other process wants to increment the counter, it needs to wait on the first process's GIL! That partially defeats the purpose of multiprocessing: one of the nice things about multiprocessing is avoiding GIL-introduced latency!
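For reference, the three-step pattern described above looks roughly like this with today's `multiprocessing` API. This is only a minimal sketch: `bump`, the worker count, and the iteration count are illustrative choices, not anything taken from the report.

```python
import multiprocessing as mp


def bump(counter, times):
    """Increment a shared counter `times` times under its lock."""
    for _ in range(times):
        with counter.get_lock():   # step 1: acquire (step 3, release, on exit)
            counter.value += 1     # step 2: read-modify-write


if __name__ == "__main__":
    counter = mp.Value("i", 0)     # a C int in shared memory, plus a lock
    workers = [mp.Process(target=bump, args=(counter, 10_000)) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)
```

If the process holding the lock is descheduled between steps 1 and 3, every other process calling `bump` blocks for that long, which is the stall this report is about.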

If ctypes supported atomic operations, we could skip steps #1 and #3 entirely and operate directly on the shared memory region. Every operating system that supports threads at all also supports some kind of compare-and-exchange primitive. Compare-and-exchange is sufficient for avoiding the GIL contention I describe above.

rhettinger commented 6 years ago

> Compare-and-exchange is sufficient for avoiding the GIL contention I describe above.

If Python objects are involved, it is more complicated than you suggest. Possibly, multiprocessing can offer a shared counter that creates integer objects on demand and that offers guaranteed atomic increments and decrements (as semaphores do).

> one of the nice things about multiprocessing is avoiding GIL-introduced latency!

The primary way it achieves this benefit is by avoiding shared state altogether.

pitrou commented 6 years ago

While the use case is reasonable (if specialized), I'm not sure ctypes is the place to expose such functionality, which can be quite extensive (see https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html). Perhaps as a separate package on PyPI?

09844748-0623-4c37-9350-082719a270a2 commented 6 years ago

On Oct 1, 2017 10:19 AM, "Raymond Hettinger" <report@bugs.python.org> wrote:

> Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:
>
> > Compare-and-exchange is sufficient for avoiding the GIL contention I describe above.
>
> If Python objects are involved, it is more complicated than you suggest.

Python objects are not involved. We're talking about memory manipulation on the same level as ctypes.memmove.
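To illustrate what "the same level as `ctypes.memmove`" means: the call below copies raw bytes between two C integers by address, without going through Python-level integer objects for the payload. A minimal sketch; the variable names are illustrative.

```python
import ctypes

src = ctypes.c_int32(42)
dst = ctypes.c_int32(0)

# Copy sizeof(int32) raw bytes from src's storage into dst's storage.
ctypes.memmove(ctypes.addressof(dst), ctypes.addressof(src), ctypes.sizeof(src))

print(dst.value)
```

The request is for a compare-and-exchange that operates on an address in the same way.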

> Possibly, multiprocessing can offer a shared counter that creates integer objects on demand and that offers guaranteed atomic increments and decrements (as semaphores do).

Why would it, when ctypes can provide generic functionality?

> > one of the nice things about multiprocessing is avoiding GIL-introduced latency!
>
> The primary way it achieves this benefit is by avoiding shared state altogether.

Well, yes, but sometimes shared state is unavoidable, and it's best to manipulate it as efficiently as possible.

---------- nosy: +davin, pitrou, rhettinger


Python tracker <report@bugs.python.org> <https://bugs.python.org/issue31654>


09844748-0623-4c37-9350-082719a270a2 commented 6 years ago

On Oct 1, 2017 10:46 AM, "Antoine Pitrou" <report@bugs.python.org> wrote:

> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> While the use case is reasonable (if specialized),

It's not that specialized. You might want atomic updates for coordinating with C APIs that expect callers to have this capability.

> not sure ctypes is the place to expose such functionality, which can be quite extensive (see https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html).

You don't need to provide all of those builtins. Users can build them in Python out of atomic-compare-and-exchange. Only compare and exchange needs C support. It's not very much code.
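A sketch of what building other operations on top of compare-and-exchange could look like. Note the assumptions: `compare_exchange` is a hypothetical stand-in for the requested primitive (ctypes has no such function), and this emulation uses a `threading.Lock`, so it is only atomic between threads of one process, not across processes. `atomic_add` is the kind of retry loop a user would write on top of it.

```python
import ctypes
import threading

# Assumption: this lock stands in for the guarantee a real C intrinsic gives.
_emulation_lock = threading.Lock()


def compare_exchange(cell, expected, desired):
    """Hypothetical primitive: store `desired` only if `cell` holds `expected`.

    Returns True if the store happened. Emulated here with a lock; a real
    implementation would invoke a compiler intrinsic in C.
    """
    with _emulation_lock:
        if cell.value != expected:
            return False
        cell.value = desired
        return True


def atomic_add(cell, delta):
    """Add `delta` to `cell` using a compare-and-exchange retry loop."""
    while True:
        old = cell.value
        new = old + delta
        if compare_exchange(cell, old, new):
            return new
        # Another writer changed the cell between the read and the
        # compare-and-exchange; read again and retry.


cell = ctypes.c_long(5)
print(atomic_add(cell, 3))
```

Only `compare_exchange` would need C support; the loop itself is plain Python.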

> Perhaps as a separate package on PyPI?

I have little interest in a separate PyPI module. I don't want to have to distribute custom-compiled extension modules.

pitrou commented 6 years ago

On 01/10/2017 at 20:04, Daniel Colascione wrote:

> It's not that specialized. You might want atomic updates for coordinating with C APIs that expect callers to have this capability.

That does sound specialized to me :-) Can you give an example of such a C API?

> You don't need to provide all of those builtins. Users can build them in Python out of atomic-compare-and-exchange. Only compare and exchange needs C support. It's not very much code.

I'm assuming you're suggesting to write a loop with an atomic-compare-and-exchange. Bytecode execution in CPython being slow, it means you risk a lot more contention (and busy looping) than if the primitive was written in C. Perhaps even a semaphore would be faster :-)

> I have little interest in a separate PyPI module. I don't want to have to distribute custom-compiled extension modules.

Understood, but that's not enough of an argument to put something in the standard library...

You might want to float your idea on python-ideas to see if you get support from people who have a similar need: https://mail.python.org/mailman/listinfo/python-ideas

pitrou commented 6 years ago

Note that if there is already a C API to perform atomic ops, you can simply use ctypes to invoke that API. Unfortunately, the aforementioned GCC builtins seem to be only available as intrinsics (at least I couldn't find a shared library that exposes the __atomic_* functions on my system).

09844748-0623-4c37-9350-082719a270a2 commented 6 years ago

On Sun, Oct 1, 2017 at 11:14 AM, Antoine Pitrou <report@bugs.python.org> wrote:

> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> Note that if there is already a C API to perform atomic ops, you can simply use ctypes to invoke that API. Unfortunately, the aforementioned GCC builtins seem to be only available as intrinsics (at least I couldn't find a shared library that exposes the __atomic_* functions on my system).

Right. I performed the same search. On Windows, at least InterlockedCompareExchange is exported from kernel32.
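A sketch of how that Windows export could be reached from ctypes, assuming, as stated above, that kernel32 exports `InterlockedCompareExchange`. Whether that holds on every Windows architecture is not established in this thread, so the lookup is guarded. `load_interlocked_cas` is an illustrative helper name, not an existing API.

```python
import ctypes
import sys


def load_interlocked_cas():
    """Return kernel32's InterlockedCompareExchange, or None if unavailable."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32")
    try:
        fn = kernel32.InterlockedCompareExchange
    except AttributeError:
        # Assumption: the export may be absent on some architectures.
        return None
    # LONG InterlockedCompareExchange(LONG volatile *Destination,
    #                                 LONG ExChange, LONG Comperand)
    fn.argtypes = [ctypes.POINTER(ctypes.c_long), ctypes.c_long, ctypes.c_long]
    fn.restype = ctypes.c_long  # the value Destination held before the call
    return fn


cas = load_interlocked_cas()
if cas is not None:
    target = ctypes.c_long(0)
    previous = cas(ctypes.byref(target), 1, 0)  # store 1 only if target == 0
    print(previous, target.value)
else:
    print("InterlockedCompareExchange is not available here")
```

On other platforms the helper simply returns `None`, which matches the observation above that no shared library exposing the GCC `__atomic_*` builtins was found.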

09844748-0623-4c37-9350-082719a270a2 commented 6 years ago

On Sun, Oct 1, 2017 at 11:12 AM, Antoine Pitrou <report@bugs.python.org> wrote:

> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> On 01/10/2017 at 20:04, Daniel Colascione wrote:
>
> > It's not that specialized. You might want atomic updates for coordinating with C APIs that expect callers to have this capability.
>
> That does sound specialized to me :-) Can you give an example of such a C API?

The Linux futex protocol, as described in man futex(7), comes to mind. Maybe you want to manipulate C++ shared_ptr objects --- these objects also rely on atomic operations. For these facilities, you need atomic operations for *correctness*. Taking a mutex as an alternative is not an option because there is no C-side mutex to take.

> > You don't need to provide all of those builtins. Users can build them in Python out of atomic-compare-and-exchange. Only compare and exchange needs C support. It's not very much code.
>
> I'm assuming you're suggesting to write a loop with an atomic-compare-and-exchange. Bytecode execution in CPython being slow, it means you risk a lot more contention (and busy looping) than if the primitive was written in C. Perhaps even a semaphore would be faster :-)

It's still faster than waiting several milliseconds for the GIL. Bytecode isn't *that* slow --- according to ipython, this operation should take a few hundred nanoseconds. Besides, in a JIT implementation, it'll be as fast as native code.

> > I have little interest in a separate PyPI module. I don't want to have to distribute custom-compiled extension modules.
>
> Understood, but that's not enough of an argument to put something in the standard library...
>
> You might want to float your idea on python-ideas to see if you get support from people who have a similar need: https://mail.python.org/mailman/listinfo/python-ideas

I don't understand the opposition to this feature request. It's a trivial amount of code (invoke a compiler intrinsic), makes the API more complete, and addresses a real, specific use case as well as some other hypothetical use cases. It costs nothing to add this functionality to the standard library. The standard library already includes a whole web server and HTTP client, a diff engine, various database engines, a facility for parsing email, an NNTP client, a GUI system, and a facility for "determin[ing] the type of sound [a] file". Why can the standard library include all of these facilities and not a simple facility for performing a very common kind of memory operation? Standard library support for this functionality is essential, as it's not possible to implement in pure Python.

pitrou commented 6 years ago

On 01/10/2017 at 21:51, Daniel Colascione wrote:

> > That does sound specialized to me :-) Can you give an example of such a C API?
>
> The Linux futex protocol, as described in man futex(7), comes to mind. Maybe you want to manipulate C++ shared_ptr objects --- these objects also rely on atomic operations.

That's even more specialized than I expected...

> It's still faster than waiting several milliseconds for the GIL.

Are you talking about https://bugs.python.org/issue31653? If so, it's just waiting for an appropriate PR to be filed.

> I don't understand the opposition to this feature request. It's a trivial amount of code (invoke a compiler intrinsic), makes the API more complete, and addresses a real, specific use case as well as some other hypothetical use cases.

That's a compiler-dependent compiler intrinsic (or perhaps a whole range of them, given there are different widths to cater for), an API wrapping it, plus some documentation and tests, that we have to maintain until the end of time (at least nominally).

> The standard library already includes a whole web server and HTTP client, a diff engine, various database engines, a facility for parsing email, an NNTP client, a GUI system, and a facility for "determin[ing] the type of sound [a] file".

It was determined at the time that the use cases for these justified the effort of maintaining them in the stdlib. For a couple of these (such as "determining the type of a sound file" or even an NNTP client), I expect the decision would be different nowadays :-)

Perhaps other core developers will disagree with me and agree to include (i.e. review, maintain) this functionality. I simply am not convinced it deserves being included, but that's not a veto.

serhiy-storchaka commented 6 years ago

It is not clear to me what API is needed, but I agree with Antoine that ctypes doesn't look the appropriate place for it. Maybe in multiprocessing or subprocess, or in low-level module providing primitives for multiprocessing or subprocess?

09844748-0623-4c37-9350-082719a270a2 commented 6 years ago

On Sun, Oct 1, 2017 at 2:01 PM, Antoine Pitrou <report@bugs.python.org> wrote:

> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> On 01/10/2017 at 21:51, Daniel Colascione wrote:
>
> > > That does sound specialized to me :-) Can you give an example of such a C API?
> >
> > The Linux futex protocol, as described in man futex(7), comes to mind. Maybe you want to manipulate C++ shared_ptr objects --- these objects also rely on atomic operations.
>
> That's even more specialized than I expected...

Huh? Both are very generic.

> > It's still faster than waiting several milliseconds for the GIL.
>
> Are you talking about https://bugs.python.org/issue31653? If so, it's just waiting for an appropriate PR to be filed.

This is a separate issue. That's about thrashing around less when we take a lock. This issue is about process A not having to wait on process B to schedule a thread in order to perform a simple operation on memory that both processes own.

> > I don't understand the opposition to this feature request. It's a trivial amount of code (invoke a compiler intrinsic), makes the API more complete, and addresses a real, specific use case as well as some other hypothetical use cases.
>
> That's a compiler-dependent compiler intrinsic (or perhaps a whole range of them, given there are different widths to cater for), an API wrapping it, plus some documentation and tests, that we have to maintain until the end of time (at least nominally).

It's trivial and easy to support conditionally. SCM_RIGHTS is "specialized" and not supported on all systems, yet it's in stdlib.

> > The standard library already includes a whole web server and HTTP client, a diff engine, various database engines, a facility for parsing email, an NNTP client, a GUI system, and a facility for "determin[ing] the type of sound [a] file".
>
> It was determined at the time that the use cases for these justified the effort of maintaining them in the stdlib. For a couple of these (such as "determining the type of a sound file" or even an NNTP client), I expect the decision would be different nowadays :-)
>
> Perhaps other core developers will disagree with me and agree to include (i.e. review, maintain) this functionality. I simply am not convinced it deserves being included, but that's not a veto.

09844748-0623-4c37-9350-082719a270a2 commented 6 years ago

On Sun, Oct 1, 2017 at 2:01 PM, Antoine Pitrou <report@bugs.python.org> wrote:

> Perhaps other core developers will disagree with me and agree to include (i.e. review, maintain) this functionality. I simply am not convinced it deserves being included, but that's not a veto.

ctypes is a library for operating on native memory and working with native functions. Performing atomic operations on memory is definitely within its scope. Why does ctypes include memmove? Why memmove and not compare-and-exchange? What evidence, if any, would convince you?

pitrou commented 6 years ago

On 01/10/2017 at 23:33, Daniel Colascione wrote:

> Huh? Both are very generic.

"Specialized" as in "I didn't expect anyone would want to do such a thing in pure Python".

> SCM_RIGHTS is "specialized" and not supported on all systems, yet it's in stdlib.

Because passing fds between processes was considered useful enough (it's actually used by multiprocessing itself, for example to implement the forkserver model).

And regardless, trying to point to other (more or less exotic) features of the stdlib is not a convincing argument to add a new feature.