Should we pass the interpreter as an explicit parameter?

markshannon commented 2 years ago

There are pros and cons to passing the interpreter as an explicit parameter.

Correctness and usability

Pros

No need to use thread-local storage, which might be slow.
Decouples the interpreter from the hardware thread.

Cons

Introduces a possible failure mode, where the passed in interpreter is not the required interpreter
Requires extensions to pass an extra parameter down the call stack, bulking out the code

Performance

Platform has fast thread-local storage

Passing the extra parameter is going to be slower, as it needs to be saved across calls, and increases register pressure. If the platform has access to fast thread-local storage, then fetching from thread-local storage is likely to be faster.

Platform doesn't have thread-local storage

In this case, passing the extra parameter is going to be faster.

In my opinion the safety issue of passing the wrong interpreter is the most important. Mixing objects from different interpreters is likely to cause strange bugs, that are going to be horrible to debug.

encukou commented 2 years ago

Wasm and HPy people might have relevant input here.

IMO the safety issue is a bit misleading: either way this goes, you can pass in a PyRef belonging to a wrong interpreter.

ericsnowcurrently commented 2 years ago

FWIW, I favor passing the runtime context explicitly.

it makes it clear which API relies on which context (or on no context)
I also think it would make the C-API a little easier to use, especially when embedding (see Lua)--enough to be worth it
without the explicit parameter, it's a little harder to move away from storing any state in globals and easier to fall back to using globals

That said, there are enough downsides that it has seemed hard to justify that switch in the past. The biggest negative to me was the potential for disruption to users. Shim headers can help with that, but then that's extra complexity we have to maintain.

encukou commented 2 years ago

it makes it clear which API relies on which context (or on no context)

IMO, that is exactly the kind of implementation detail that should not be exposed in long-term-stable API.

it would make the C-API a little easier to use, especially when embedding (see Lua)

That's my hunch. More regularity/less hidden state is good. I'd like to see it backed up by good arguments :)

The biggest negative to me was the potential for disruption to users.

+1. But that issue kinda goes away with a completely new API.

timfel commented 2 years ago

The biggest negative to me was the potential for disruption to users.

+1. But that issue kinda goes away with a completely new API.

From our experience with HPy, it turns out that adding the argument is actually the simplest thing when you are porting to a completely new API anyway. We tried to keep API function names very similar between HPy and the CPython API, so just adding the context would often just be a relatively straightforward re.sub(r'(Py[^(]+)\(', r'H\1(ctx,', code). Migrating reference counting, heap types etc is often so much more work that the additional argument didn't register as effort to me.

IMO the safety issue is a bit misleading: either way this goes, you can pass in a PyRef belonging to a wrong interpreter.

Indeed. If we consider things like the NumPy API where e.g. Matplotlib directly calls NumPy functions that again may do API calls, they are passing not only the context, but also object handles, so those simply have to match.

markshannon commented 2 years ago

If we do pass the interpreter around, we need to be extremely careful to differentiate between cases where it must be passed through untouched, or it is a legitimate parameter. For example, suppose we have

PyThreadRef PyApi_Interpreter_GetCurrentThread(PyInterpreterRef ref)

Does this get the current thread of the current interpreter (and ref must be the current interpreter), or can you pass in any interpreter and get that interpreter's current thread?

The "obvious" semantics is the latter, which might not be what we want.

OOI, does HPy have any functions for querying a HpyContext? If so, can you pass in any HpyContext or just the correct one?

ericsnowcurrently commented 2 years ago

FWIW, here's a related discussion I started in 2019: https://mail.python.org/archives/list/capi-sig@python.org/thread/RBLU35OUT2KDFCABK32VNOH4UKSKEUWW/

ericsnowcurrently commented 2 years ago

If we do pass the interpreter around, we need to be extremely careful to differentiate between cases where it must be passed through untouched, or it is a legitimate parameter.

+1

I expect we'd have a distinct context type. Then API which explicitly targets a specific runtime state (interpreter/thread/global) would explicitly take an argument of the appropriate type (which would be opaque in the API). An API to get the thread/interpreter/runtime state from the context object would certainly be valuable then.

PyThreadRef PyApi_Interpreter_GetCurrentThread(PyInterpreterRef ref)
Does this get the current thread of the current interpreter (and ref must be the current interpreter), or can you pass in any interpreter and get that interpreter's current thread?

Hmm, wouldn't the idea of a "current" thread/interpreter be essential to the context object (and nothing else)? Also, the concept of a "current" thread, relative to an interpreter, isn't a necessary one (even though that relationship exists currently). At best it is an internal detail that we wouldn't want to (nor need to) leak out in the API. Are there optimization reasons to preserve a public relationship between an interpreter and its "current" thread?

Regardless of all that, your general point holds. There needs to be a clear distinction in the API signature for operating-in-the-current-context vs. operating-on/with-some-runtime-state. I suppose the big question is, what are the use cases for interacting with a thread/interpreter/global runtime state directly, as opposed to a context object?

The "obvious" semantics is the latter, which might not be what we want.

Assuming such an API still made sense, I agree that the interpretation of that function signature is obvious and do think it's what we would want.

markshannon commented 2 years ago

PyApi_Interpreter_GetCurrentThread was a poor choice of example, as this has nothing to do with threads. Let's use PyApi_Interpreter_GetBuiltins() as an example instead.

I guess we just make the "context" completely opaque, and if it passed anywhere, it must be passed everywhere (and checked in debug mode), so that

PyApi_Interpreter_GetBuiltins(PyInterpreterRef ref)

becomes either

PyApi_GetBuiltins(PyContext ctx); // Gets the builtins of the current interpreter

or

PyApi_Interpreter_GetBuiltins(PyContext ctx, PyInterpreterRef iref) // Gets the builtins of iref

removing the ambiguity.

what are the use cases for interacting with a thread/interpreter/global runtime state directly, as opposed to a context object?

Maybe none, but I don't want the ambiguity. So it is important that there are no functions to query or modify PyContext.

markshannon commented 2 years ago

I have been convinced (by @antocuni) that we do want to pass the interpreter around.

The reason for this is that do not want to provide access to the full context to some extension provided functions. E.g. finalizers should have access to the full context, but de-allocation functions should only have access to context sufficient to free memory.

If the context is implicit, then it cannot differ for different parts of the API.

markshannon / New-C-API-for-Python