python / cpython

The Python programming language
https://www.python.org
Other
63.37k stars 30.33k forks source link

[C API] Add Py_Is(x, y) and Py_IsNone(x) functions #87919

Closed vstinner closed 3 years ago

vstinner commented 3 years ago
BPO 43753
Nosy @tim-one, @rhettinger, @mdickinson, @vstinner, @cfbolz, @markshannon, @pablogsal, @shihai1991, @erlend-aasland
PRs
  • python/cpython#25227
  • python/cpython#28641
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['expert-C-API', '3.10'] title = '[C API] Add Py_Is(x, y) and Py_IsNone(x) functions' updated_at = user = 'https://github.com/vstinner' ``` bugs.python.org fields: ```python activity = actor = 'vstinner' assignee = 'none' closed = True closed_date = closer = 'vstinner' components = ['C API'] creation = creator = 'vstinner' dependencies = [] files = [] hgrepos = [] issue_num = 43753 keywords = ['patch'] message_count = 19.0 messages = ['390358', '390360', '390364', '390379', '390408', '390411', '390412', '390435', '390456', '390469', '390526', '390613', '390614', '390616', '390628', '390750', '390751', '393907', '402922'] nosy_count = 9.0 nosy_names = ['tim.peters', 'rhettinger', 'mark.dickinson', 'vstinner', 'Carl.Friedrich.Bolz', 'Mark.Shannon', 'pablogsal', 'shihai1991', 'erlendaasland'] pr_nums = ['25227', '28641'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue43753' versions = ['Python 3.10'] ```

    vstinner commented 3 years ago

    I propose to add at least Py_Is(x, y) function to the Python C API which would be simply implemented as:

        static inline int Py_Is(PyObject *x, PyObject *y) { return (x == y); }

    Right now, there is no benefit for CPython. The idea is to prepare the CPython code base for future optimization, like tagged pointers. It may help PyPy. It can also help to prepare C extensions to migrate to HPy, since HPy requires to use HPy_Is(ctx, x, y): "x == y" (where x and y types is HPy) fails with a compiler error with HPy.

    --

    CPython uses reference counting and singletons like None and False. The "x is y" operator in Python is implemented with the IS_OP opcode implemented in Python/ceval.c as:

            case TARGET(IS_OP): {
                PyObject *right = POP();
                PyObject *left = TOP();
                int res = (left == right)^oparg;
                PyObject *b = res ? Py_True : Py_False;
                Py_INCREF(b);
                SET_TOP(b);
                Py_DECREF(left);
                Py_DECREF(right);
                PREDICT(POP_JUMP_IF_FALSE);
                PREDICT(POP_JUMP_IF_TRUE);
                DISPATCH();
            }

    In short, "x is y" in Python simply compares directly PyObject pointer values in C: "x == y" where x and y are PyObject pointers.

    PyPy doesn't use reference counting and so implements "x is y" differently: id(x) == id(y) if I understood correctly. In PyPy, id(obj) is more complex than simply getting the object memory address. For example, id(1) is always 17 in PyPy (on Linux/x86-64 at least).

    At the C API level, using "x == y" to check if y object "is" x object doesn't work well with PyPy object models. Moreover, "x == Py_None" doesn't work if None is stored as a tagged pointer in CPython.

    --

    Maybe we can also add functions to check if an object is a singleton:

    See also bpo-39511 "[subinterpreters] Per-interpreter singletons (None, True, False, etc.)" which may need to replace Py_None singleton with Py_GetNone(). IMO it makes more sense to call Py_IS_NONE(x) function to express that code is run, rather than writing "x == Py_None" as if Py_None is and will always be a variable directly accesssible.

    Py_IS_TRUE() name sounds like PyObject_IsTrue(). Can it be an issue? Can someone come with a better name?

    vstinner commented 3 years ago

    Python has more singletons. In C:

    But I don't think that "x == NotImplemented" is common enough to justify a new function.

    vstinner commented 3 years ago

    Py_IS_TYPE(obj, type) was added to Python 3.9 by bpo-39573: https://docs.python.org/dev/c-api/structures.html#c.Py_IS_TYPE

    commit d905df766c367c350f20c46ccd99d4da19ed57d8 Author: Dong-hee Na \donghee.na92@gmail.com\ Date: Fri Feb 14 02:37:17 2020 +0900

    bpo-39573: Add Py_IS_TYPE() function (GH-18488)
    
    Co-Author: Neil Schemenauer <nas-github@arctrix.com>

    It's currently implemented as:

    static inline int _Py_IS_TYPE(const PyObject *ob, const PyTypeObject *type) {
        return Py_TYPE(ob) == type;
    }
    #define Py_IS_TYPE(ob, type) _Py_IS_TYPE(_PyObject_CAST_CONST(ob), type)
    rhettinger commented 3 years ago

    Right now, there is no benefit for CPython.

    Please don't this until we have a clear demonstrable benefit. As it stands now, this is all cost and no benefit.

    Adding unnecessary abstraction layers just makes it more difficult for people to learn to be core devs. Also, it will likely result in a lot of pointless code churn where each change carries a risk of a new bug being introduced.

    The notion of pointer comparison is so fundamental to C code that it is counter productive to try to abstract it away. It isn't much different than saying that every instance of "a + b" should be abstracted to "add(a, b)" and every "a[b]" should be abstracted to "index_lookup(a, b)" on the hope that maybe it might someday be helpful for PyPy or HPy even though they have never requested such a change.

    From my own point of view, these need abstractions just make it harder to tell what code is actually doing. Further, it will just lead to wasting everyone's time in code reviews where the reviewer insists on the applying the new inline function and there is confusion about whether two pointers are generic pointers or python object pointers, each with their own comparison technique.

    Also, there is too much faith in functions marked as "inline" always being inlined. Compilers get to make their own choices and under some circumstances will not inline, especially for cross module calls. This risks taking code that is currently obviously fast and occasionally, invisibility slowing it down massively — from a step that is 1 cycle at most and is sometimes zero cost to a step that involves an actual function call, possibly needing to save and restore registers.

    Lastly, the API changes aren't just for you or the standard library. In effect, you're telling the entire ecosystem of C extensions that they are doing it wrong. Almost certainly, some will follow this path and some won't, further fracturing the ecosystem.

    markshannon commented 3 years ago

    For any sane design of tagged pointers, x == y (in C) will work fine.

    is is not well defined except for a small set of values, so the docs for Py_Is would have to so vague as to be worthless, IMO.

    vstinner commented 3 years ago

    For any sane design of tagged pointers, x == y (in C) will work fine.

    I wrote a proof-of-concept for using tagged pointners in CPython: https://github.com/vstinner/cpython/pull/6

    "The PR is large because of the first changes which add Py_IS_NONE(), Py_IS_TRUE() and Py_IS_FALSE() macros."

    For backward compatibility, I added a function to get a concrete Python object from a tagged pointer: the True as a tagged pointer becomes (PyObject *)&_Py_TrueStruct concrete object. Py_IS_TRUE() has to check for both values: tagged pointer or &_Py_TrueStruct (don't look at my exact implementation, it's wrong ;-)).

    In a perfect world, there would be no backward compatibility and you would never have to create Python objects from a tagged pointer.

    The other problem is that the stable ABI exposes "Py_True" as "&_Py_TrueStruct" and so C extensions built with the stable ABI uses "&_Py_TrueStruct", not the tagged pointer.

    See also bpo-39511 which discuss solutions for these problems.

    vstinner commented 3 years ago

    Also, there is too much faith in functions marked as "inline" always being inlined.

    I proposed to declare it as a "static inline" function, but I'm fine with a macro as well. I mostly care about the API: call "Py_Is()", not really about the exact implementation. In practice, for now, Py_Is(x, y) will continue to be compiled as "x == y".

    9719010a-7a6d-45ea-a4c0-9c1ede24d0ea commented 3 years ago

    Just chiming in to say that for PyPy this API would be extremely useful, because PyPy's "is" is not implementable with a pointer comparison on the C level (due to unboxing we need to compare integers, floats, etc by value). Right now, C extension code that compares pointers is subtly broken and cannot be fixed by us.

    rhettinger commented 3 years ago

    Just chiming in to say that for PyPy this API would be extremely useful

    Thanks for that input. Given that there would be some value add, I withdraw my objection.

    I proposed to declare it as a "static inline" function, but I'm fine with a macro as well.

    Let's use a macro then because inlining in guaranteed and we don't have to be on the lookout for failures to inline.

    vstinner commented 3 years ago

    PR 25227: I reimplemented Py_Is() as a macro and I added unit tests.

    Other added functions simply call Py_Is(), example:

    #define Py_IsNone(x) Py_Is(x, Py_None)
    vstinner commented 3 years ago

    Mark:

    is is not well defined except for a small set of values, so the docs for Py_Is would have to so vague as to be worthless, IMO.

    I don't propose to change the "is" operator semantic: Py_Is(x, y) is C would behave *exaclty* as "x is y" in Python.

    Check my implementation: the IS_OP bytecode uses directly Py_Is().

    erlend-aasland commented 3 years ago

    I'd also prefer a Py_IsNotNone() because it's more explicit than !Py_IsNone(); the exclamation mark can be easily missed when reading/writing code, IMO.

    Just my 2 cents.

    vstinner commented 3 years ago

    I'd also prefer a Py_IsNotNone() because it's more explicit than !Py_IsNone()

    I would prefer keep the C API small. I don't think that we need to duplicate all functions testing for something. We provide PyTuple_Check(obj) but we don't provide PyTuple_NotCheck(obj) for example.

    IMO !Py_IsNone(obj) makes perfectly sense in Python.

    Also, "x == Py_None" is more common than "x != Py_None".

    erlend-aasland commented 3 years ago

    I would prefer keep the C API small.

    Yes, I see the value of that as well.

    I tried applying this API on an extension, and I found the code to be slightly less readable for the "is not" cases.

    vstinner commented 3 years ago

    I tried applying this API on an extension, and I found the code to be slightly less readable for the "is not" cases.

    FYI you can try upgrade_pythoncapi.py on your project using the following PR to update code to use Py_IsNone/Py_IsTrue/Py_IsFalse: https://github.com/pythoncapi/pythoncapi_compat/pull/8

    For example, use "upgrade_pythoncapi.py -o Py_Is directory/" or "upgrade_pythoncapi.py -o Py_Is file.c".

    vstinner commented 3 years ago

    New changeset 09bbebea163fe7303264cf4069c51d4d2f22fde4 by Victor Stinner in branch 'master': bpo-43753: Add Py_Is() and Py_IsNone() functions (GH-25227) https://github.com/python/cpython/commit/09bbebea163fe7303264cf4069c51d4d2f22fde4

    vstinner commented 3 years ago

    Carl:

    Just chiming in to say that for PyPy this API would be extremely useful, because PyPy's "is" is not implementable with a pointer comparison on the C level (due to unboxing we need to compare integers, floats, etc by value). Right now, C extension code that compares pointers is subtly broken and cannot be fixed by us.

    Can you come with a sentence that I can put in the documentation to explain that Py_Is() is written for interoperability with other Python implementations?

    vstinner commented 3 years ago

    Well, nobody came up with a better definition, so let's go with:

    "Test if the x object is the y object, the same as x is y in Python."

    Thanks for the feedback and reviews! Py_Is(x, y), Py_IsNone(x), Py_IsTrue(x) and Py_IsFalse(x) are now part of Python 3.10 C API.

    vstinner commented 3 years ago

    New changeset 8d3e7eff0936926554db6162c992af5829dc8160 by Victor Stinner in branch 'main': bpo-43753: _operator.is_() uses Py_Is() (GH-28641) https://github.com/python/cpython/commit/8d3e7eff0936926554db6162c992af5829dc8160