Closed vstinner closed 3 years ago
I propose to add at least Py_Is(x, y) function to the Python C API which would be simply implemented as:
static inline int Py_Is(PyObject *x, PyObject *y) { return (x == y); }
Right now, there is no benefit for CPython. The idea is to prepare the CPython code base for future optimization, like tagged pointers. It may help PyPy. It can also help to prepare C extensions to migrate to HPy, since HPy requires to use HPy_Is(ctx, x, y): "x == y" (where x and y types is HPy) fails with a compiler error with HPy.
--
CPython uses reference counting and singletons like None and False. The "x is y" operator in Python is implemented with the IS_OP opcode implemented in Python/ceval.c as:
case TARGET(IS_OP): {
PyObject *right = POP();
PyObject *left = TOP();
int res = (left == right)^oparg;
PyObject *b = res ? Py_True : Py_False;
Py_INCREF(b);
SET_TOP(b);
Py_DECREF(left);
Py_DECREF(right);
PREDICT(POP_JUMP_IF_FALSE);
PREDICT(POP_JUMP_IF_TRUE);
DISPATCH();
}
In short, "x is y" in Python simply compares directly PyObject pointer values in C: "x == y" where x and y are PyObject pointers.
PyPy doesn't use reference counting and so implements "x is y" differently: id(x) == id(y) if I understood correctly. In PyPy, id(obj) is more complex than simply getting the object memory address. For example, id(1) is always 17 in PyPy (on Linux/x86-64 at least).
At the C API level, using "x == y" to check if y object "is" x object doesn't work well with PyPy object models. Moreover, "x == Py_None" doesn't work if None is stored as a tagged pointer in CPython.
--
Maybe we can also add functions to check if an object is a singleton:
See also bpo-39511 "[subinterpreters] Per-interpreter singletons (None, True, False, etc.)" which may need to replace Py_None singleton with Py_GetNone(). IMO it makes more sense to call Py_IS_NONE(x) function to express that code is run, rather than writing "x == Py_None" as if Py_None is and will always be a variable directly accesssible.
Py_IS_TRUE() name sounds like PyObject_IsTrue(). Can it be an issue? Can someone come with a better name?
Python has more singletons. In C:
But I don't think that "x == NotImplemented" is common enough to justify a new function.
Py_IS_TYPE(obj, type) was added to Python 3.9 by bpo-39573: https://docs.python.org/dev/c-api/structures.html#c.Py_IS_TYPE
commit d905df766c367c350f20c46ccd99d4da19ed57d8 Author: Dong-hee Na \donghee.na92@gmail.com\ Date: Fri Feb 14 02:37:17 2020 +0900
bpo-39573: Add Py_IS_TYPE() function (GH-18488)
Co-Author: Neil Schemenauer <nas-github@arctrix.com>
It's currently implemented as:
static inline int _Py_IS_TYPE(const PyObject *ob, const PyTypeObject *type) {
return Py_TYPE(ob) == type;
}
#define Py_IS_TYPE(ob, type) _Py_IS_TYPE(_PyObject_CAST_CONST(ob), type)
Right now, there is no benefit for CPython.
Please don't this until we have a clear demonstrable benefit. As it stands now, this is all cost and no benefit.
Adding unnecessary abstraction layers just makes it more difficult for people to learn to be core devs. Also, it will likely result in a lot of pointless code churn where each change carries a risk of a new bug being introduced.
The notion of pointer comparison is so fundamental to C code that it is counter productive to try to abstract it away. It isn't much different than saying that every instance of "a + b" should be abstracted to "add(a, b)" and every "a[b]" should be abstracted to "index_lookup(a, b)" on the hope that maybe it might someday be helpful for PyPy or HPy even though they have never requested such a change.
From my own point of view, these need abstractions just make it harder to tell what code is actually doing. Further, it will just lead to wasting everyone's time in code reviews where the reviewer insists on the applying the new inline function and there is confusion about whether two pointers are generic pointers or python object pointers, each with their own comparison technique.
Also, there is too much faith in functions marked as "inline" always being inlined. Compilers get to make their own choices and under some circumstances will not inline, especially for cross module calls. This risks taking code that is currently obviously fast and occasionally, invisibility slowing it down massively — from a step that is 1 cycle at most and is sometimes zero cost to a step that involves an actual function call, possibly needing to save and restore registers.
Lastly, the API changes aren't just for you or the standard library. In effect, you're telling the entire ecosystem of C extensions that they are doing it wrong. Almost certainly, some will follow this path and some won't, further fracturing the ecosystem.
For any sane design of tagged pointers, x == y
(in C) will work fine.
is
is not well defined except for a small set of values, so the docs for Py_Is
would have to so vague as to be worthless, IMO.
For any sane design of tagged pointers,
x == y
(in C) will work fine.
I wrote a proof-of-concept for using tagged pointners in CPython: https://github.com/vstinner/cpython/pull/6
"The PR is large because of the first changes which add Py_IS_NONE(), Py_IS_TRUE() and Py_IS_FALSE() macros."
For backward compatibility, I added a function to get a concrete Python object from a tagged pointer: the True as a tagged pointer becomes (PyObject *)&_Py_TrueStruct concrete object. Py_IS_TRUE() has to check for both values: tagged pointer or &_Py_TrueStruct (don't look at my exact implementation, it's wrong ;-)).
In a perfect world, there would be no backward compatibility and you would never have to create Python objects from a tagged pointer.
The other problem is that the stable ABI exposes "Py_True" as "&_Py_TrueStruct" and so C extensions built with the stable ABI uses "&_Py_TrueStruct", not the tagged pointer.
See also bpo-39511 which discuss solutions for these problems.
Also, there is too much faith in functions marked as "inline" always being inlined.
I proposed to declare it as a "static inline" function, but I'm fine with a macro as well. I mostly care about the API: call "Py_Is()", not really about the exact implementation. In practice, for now, Py_Is(x, y) will continue to be compiled as "x == y".
Just chiming in to say that for PyPy this API would be extremely useful, because PyPy's "is" is not implementable with a pointer comparison on the C level (due to unboxing we need to compare integers, floats, etc by value). Right now, C extension code that compares pointers is subtly broken and cannot be fixed by us.
Just chiming in to say that for PyPy this API would be extremely useful
Thanks for that input. Given that there would be some value add, I withdraw my objection.
I proposed to declare it as a "static inline" function, but I'm fine with a macro as well.
Let's use a macro then because inlining in guaranteed and we don't have to be on the lookout for failures to inline.
PR 25227: I reimplemented Py_Is() as a macro and I added unit tests.
Other added functions simply call Py_Is(), example:
#define Py_IsNone(x) Py_Is(x, Py_None)
Mark:
is
is not well defined except for a small set of values, so the docs forPy_Is
would have to so vague as to be worthless, IMO.
I don't propose to change the "is" operator semantic: Py_Is(x, y) is C would behave *exaclty* as "x is y" in Python.
Check my implementation: the IS_OP bytecode uses directly Py_Is().
I'd also prefer a Py_IsNotNone() because it's more explicit than !Py_IsNone(); the exclamation mark can be easily missed when reading/writing code, IMO.
Just my 2 cents.
I'd also prefer a Py_IsNotNone() because it's more explicit than !Py_IsNone()
I would prefer keep the C API small. I don't think that we need to duplicate all functions testing for something. We provide PyTuple_Check(obj) but we don't provide PyTuple_NotCheck(obj) for example.
IMO !Py_IsNone(obj) makes perfectly sense in Python.
Also, "x == Py_None" is more common than "x != Py_None".
I would prefer keep the C API small.
Yes, I see the value of that as well.
I tried applying this API on an extension, and I found the code to be slightly less readable for the "is not" cases.
I tried applying this API on an extension, and I found the code to be slightly less readable for the "is not" cases.
FYI you can try upgrade_pythoncapi.py on your project using the following PR to update code to use Py_IsNone/Py_IsTrue/Py_IsFalse: https://github.com/pythoncapi/pythoncapi_compat/pull/8
For example, use "upgrade_pythoncapi.py -o Py_Is directory/" or "upgrade_pythoncapi.py -o Py_Is file.c".
New changeset 09bbebea163fe7303264cf4069c51d4d2f22fde4 by Victor Stinner in branch 'master': bpo-43753: Add Py_Is() and Py_IsNone() functions (GH-25227) https://github.com/python/cpython/commit/09bbebea163fe7303264cf4069c51d4d2f22fde4
Carl:
Just chiming in to say that for PyPy this API would be extremely useful, because PyPy's "is" is not implementable with a pointer comparison on the C level (due to unboxing we need to compare integers, floats, etc by value). Right now, C extension code that compares pointers is subtly broken and cannot be fixed by us.
Can you come with a sentence that I can put in the documentation to explain that Py_Is() is written for interoperability with other Python implementations?
Well, nobody came up with a better definition, so let's go with:
"Test if the x object is the y object, the same as x is y in Python."
Thanks for the feedback and reviews! Py_Is(x, y), Py_IsNone(x), Py_IsTrue(x) and Py_IsFalse(x) are now part of Python 3.10 C API.
New changeset 8d3e7eff0936926554db6162c992af5829dc8160 by Victor Stinner in branch 'main': bpo-43753: _operator.is_() uses Py_Is() (GH-28641) https://github.com/python/cpython/commit/8d3e7eff0936926554db6162c992af5829dc8160
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at =
created_at =
labels = ['expert-C-API', '3.10']
title = '[C API] Add Py_Is(x, y) and Py_IsNone(x) functions'
updated_at =
user = 'https://github.com/vstinner'
```
bugs.python.org fields:
```python
activity =
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date =
closer = 'vstinner'
components = ['C API']
creation =
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 43753
keywords = ['patch']
message_count = 19.0
messages = ['390358', '390360', '390364', '390379', '390408', '390411', '390412', '390435', '390456', '390469', '390526', '390613', '390614', '390616', '390628', '390750', '390751', '393907', '402922']
nosy_count = 9.0
nosy_names = ['tim.peters', 'rhettinger', 'mark.dickinson', 'vstinner', 'Carl.Friedrich.Bolz', 'Mark.Shannon', 'pablogsal', 'shihai1991', 'erlendaasland']
pr_nums = ['25227', '28641']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue43753'
versions = ['Python 3.10']
```