python / cpython

The Python programming language
https://www.python.org
Other
63.5k stars 30.42k forks source link

Pythread_ident_t is not used consistently and could be more robust. #123983

Open culler opened 2 months ago

culler commented 2 months ago

Bug report

Bug description:

Currently, pycore_pythread.h contains:

typedef unsigned long long PyThread_ident_t;

and pycore_lock.h contains:

typedef struct {
     PyMutex mutex;
     unsigned long long thread;  // i.e., PyThread_get_thread_ident_ex()
     size_t level;
 } _PyRecursiveMutex;

This is clearly wrong. If a typedef exists then it should be used.

But the type itself has drawbacks which make it less robust than it could be. While the problem is not internal to python, and results from misuse, nonetheless python could guard against the misuse.

The context in which I encountered a problem was the cypari project, which provides a Cython wrapper for the PARI number theory library. The misuse has two parts: (1) Cython includes internal Python headers (pycore_lock.h) (2) The PARI library deals with the issue that 64 bit Windows does not have 64 bit longs by using the following (amazing) hack: #define long long long Including pycore_lock.h after parigen.h, as Cython does, causes a compiler error.

A more robust type, not vulnerable to these sorts of shenanigans, for PyThread_ident_t would be size_t. A modest amount of casting is required to implement this, since _Py_atomic_load_ullong_relaxed has to be replaced by _Py_atomic_load_ssize_relaxed.

CPython versions tested on:

3.13

Operating systems tested on:

macOS

Linked PRs

culler commented 2 months ago

I now realize that size_t was a bad choice, since the PyThread_ident_t is assumed to be 64 bits on all platforms. Using uint64_t should to work better and require fewer casts.

colesbury commented 2 months ago

I don't think we should be changing the definition of PyThread_ident_t.

And I don't think Cython should be including the internal-only pycore_lock.h header.

colesbury commented 2 months ago

Filed an issue in Cython:

culler commented 2 months ago

At this point I agree with you. There was no good choice for an alternative that would work around the crazy #define long long long directive (For example, MSVC apparently implements its uint64_t type with a macro that replaces uint64_t with unsigned long long, so the problem persists.). I was able to find a different (unpleasant) workaround.

Still, I do think that as long as the PyThread_ident_t typedef exists, it should be used. Using it is some places and using unsigned long long in other places where a PyThread_ident_t is expected makes no sense to me.

colesbury commented 2 months ago

The locking code is concerned with atomic semantics:

That the thread field uses PyThread_get_thread_ident_ex() is a detail internal to lock.c that's not exposed in pycore_lock.h.

culler commented 2 months ago

Maybe the comment // i.e., PyThread_get_thread_ident_ex() could at least mention PyThread_ident_t?

colesbury commented 2 months ago

I don't think that's worthwhile, but if you are inclined to send a PR to Cython to remove the pycore_lock.h include, that would be appreciated and fix your immediate issue.