python / cpython

The Python programming language
https://www.python.org
Other
62.89k stars 30.13k forks source link

[C API] Avoid accessing PyObject and PyVarObject members directly: add Py_SET_TYPE() and Py_IS_TYPE(), disallow Py_TYPE(obj)=type #83754

Closed vstinner closed 3 years ago

vstinner commented 4 years ago
BPO 39573
Nosy @vstinner
PRs
  • python/cpython#18388
  • python/cpython#18389
  • python/cpython#18390
  • python/cpython#18391
  • python/cpython#18392
  • python/cpython#18393
  • python/cpython#18394
  • python/cpython#18398
  • python/cpython#18400
  • python/cpython#18402
  • python/cpython#18411
  • python/cpython#18419
  • python/cpython#18488
  • python/cpython#18496
  • python/cpython#18507
  • python/cpython#18508
  • python/cpython#18521
  • python/cpython#18601
  • python/cpython#18789
  • python/cpython#18798
  • python/cpython#18799
  • python/cpython#18804
  • python/cpython#18809
  • python/cpython#19882
  • python/cpython#19975
  • python/cpython#20290
  • python/cpython#20391
  • python/cpython#20429
  • python/cpython#20610
  • python/cpython#21262
  • python/cpython#21433
  • python/cpython#23366
  • python/cpython#23375
  • python/cpython#26493
  • python/cpython#26550
  • python/cpython#26596
  • python/cpython#28128
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['expert-C-API', '3.11'] title = '[C API] Avoid accessing PyObject and PyVarObject members directly: add Py_SET_TYPE() and Py_IS_TYPE(), disallow Py_TYPE(obj)=type' updated_at = user = 'https://github.com/vstinner' ``` bugs.python.org fields: ```python activity = actor = 'vstinner' assignee = 'none' closed = True closed_date = closer = 'vstinner' components = ['C API'] creation = creator = 'vstinner' dependencies = [] files = [] hgrepos = [] issue_num = 39573 keywords = ['patch'] message_count = 96.0 messages = ['361513', '361514', '361515', '361516', '361517', '361518', '361519', '361522', '361523', '361526', '361527', '361529', '361531', '361540', '361549', '361555', '361557', '361590', '361593', '361607', '361611', '361626', '361631', '361639', '361904', '361960', '361961', '361963', '361964', '361965', '361971', '361977', '361987', '361988', '362033', '362034', '362133', '362134', '362166', '362212', '362216', '362445', '363345', '363494', '363564', '365690', '366473', '366493', '368047', '369896', '369898', '370074', '370303', '370638', '370663', '370665', '370666', '370671', '370729', '370902', '370932', '372308', '373460', '379675', '379679', '379680', '379757', '379759', '381337', '381345', '381365', '381374', '381403', '381404', '382260', '382534', '382539', '382780', '382781', '382783', '394954', '394971', '395018', '395205', '395206', '395287', '395323', '395536', '401365', '401370', '401378', '401395', '401396', '401399', '403252', '410995'] nosy_count = 1.0 nosy_names = ['vstinner'] pr_nums = ['18388', '18389', '18390', '18391', '18392', '18393', '18394', '18398', '18400', '18402', '18411', '18419', '18488', '18496', '18507', '18508', '18521', '18601', '18789', '18798', '18799', '18804', '18809', '19882', '19975', '20290', '20391', '20429', '20610', '21262', '21433', '23366', '23375', '26493', '26550', '26596', '28128'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue39573' versions = ['Python 3.11'] ```

    Linked PRs

    vstinner commented 4 years ago

    Today, CPython is leaking too many implementation through its public C API. We cannot easily change the "default" C API, but we can enhance the "limited" C API (when Py_LIMITED_API macro is defined). Example of leaking implementation details: memory allocator, garbage collector, structure layouts, etc.

    Making PyObject an opaque structure would allow in the long term of modify structures to implement more efficient types (ex: list specialized for small integers), and it can prepare CPython to experiment tagged pointers.

    Longer rationale:

    I propose to incremental evolve the existing limited C API towards opaque PyObject, by trying to reduce the risk of breakage.

    We may test changes on PyQt which uses the limited C API.

    Another idea would be to convert some C extensions of the standard library to the limited C API. It would ensure that the limited C API contains enough functions to be useful, but would also notify us directly if the API is broken.

    vstinner commented 4 years ago

    Another idea would be to convert some C extensions of the standard library to the limited C API. It would ensure that the limited C API contains enough functions to be useful, but would also notify us directly if the API is broken.

    First issues that I met when I tried that:

    vstinner commented 4 years ago

    New changeset a93c51e3a8e15f1a486d11d5b55a64f3381babe0 by Victor Stinner in branch 'master': bpo-39573: Use Py_REFCNT() macro (GH-18388) https://github.com/python/cpython/commit/a93c51e3a8e15f1a486d11d5b55a64f3381babe0

    vstinner commented 4 years ago

    it can prepare CPython to experiment tagged pointers

    In September 2018, Neil Schemenauer did an experiment:

    More recent discussion on the capi-sig list:

    https://mail.python.org/archives/list/capi-sig@python.org/thread/JPUNPN3AILGXOA3C2TTSLMOFNSWJE3QX/

    See also my notes: https://pythoncapi.readthedocs.io/optimization_ideas.html#tagged-pointers-doable

    Wikipedia article: https://en.wikipedia.org/wiki/Tagged_pointer

    vstinner commented 4 years ago

    In the limited C API, Py_REFCNT() should be converted to:

    static inline Py_ssize_t _Py_REFCNT(const PyObject *ob)
    { return ob->ob_refcnt; }
    #define Py_REFCNT(ob) _Py_REFCNT(_PyObject_CAST(ob))

    It would enforce the usage of newly added Py_SET_REFCNT() (PR 18389) and advertise that the object is not modified (const).

    That would only be the first step towards a really opaque Py_REFCNT() function.

    vstinner commented 4 years ago

    TODO: Add Py_IS_TYPE() macro:

    #define Py_IS_TYPE(ob, tp) (Py_TYPE(ob) == (tp)) 

    For example, replace:

        #define PyBool_Check(x) (Py_TYPE(x) == &PyBool_Type) 

    with:

        #define PyBool_Check(x) Py_IS_TYPE(x, &PyBool_Type)

    IMHO it makes the code more readable.

    https://github.com/nascheme/cpython/commit/c156300592dc1eab234b74ed5b7cc90a020ab82b

    vstinner commented 4 years ago

    New changeset c86a11221df7e37da389f9c6ce6e47ea22dc44ff by Victor Stinner in branch 'master': bpo-39573: Add Py_SET_REFCNT() function (GH-18389) https://github.com/python/cpython/commit/c86a11221df7e37da389f9c6ce6e47ea22dc44ff

    vstinner commented 4 years ago

    New changeset 0d76d2bd28ac815dabae8b07240ed002ac8fce2d by Victor Stinner in branch 'master': bpo-39573: Use Py_TYPE() in abstract.c (GH-18390) https://github.com/python/cpython/commit/0d76d2bd28ac815dabae8b07240ed002ac8fce2d

    vstinner commented 4 years ago

    Py_TYPE() is commonly used to render the type name in an error message. Example:

    PyErr_Format(PyExc_TypeError,
                 "cannot convert '%.200s' object to bytearray",
                 Py_TYPE(arg)->tp_name);

    This code has multiple issues:

    In September 2018, I created bpo-34595: "PyUnicode_FromFormat(): add %T format for an object type name". But there was disagreement, so I rejected my change.

    I started "bpo-34595: How to format a type name?" thread on python-dev:

    I didn't continue this work (until now), since it wasn't my priority.

    vstinner commented 4 years ago

    New changeset a102ed7d2f0e7e05438f14d5fb72ca0358602249 by Victor Stinner in branch 'master': bpo-39573: Use Py_TYPE() macro in Python and Include directories (GH-18391) https://github.com/python/cpython/commit/a102ed7d2f0e7e05438f14d5fb72ca0358602249

    vstinner commented 4 years ago

    Make PyObject an opaque structure is also a first step towards the more ambitious project "HPy" project which is fully opaque: https://github.com/pyhandle/hpy

    This API is written from scratch and currently implemented on top on the existing C API.

    The following article is a nice introduction to the overall idea: https://morepypy.blogspot.com/2019/12/hpy-kick-off-sprint-report.html

    From my point of view, the long term goal would be to get better performance on PyPy and having a single API for C extension which would be efficient on all Python implementations (not only CPython).

    Currently, the C API is not only a performance issue to run C extensions on PyPy. It's also an issue in CPython. Because the C API leaks too many implementation details, we cannot experiment optimizations.

    See also: https://pythoncapi.readthedocs.io/rationale.html

    vstinner commented 4 years ago

    New changeset 58ac700fb09497df14d4492b6f820109490b2b88 by Victor Stinner in branch 'master': bpo-39573: Use Py_TYPE() macro in Objects directory (GH-18392) https://github.com/python/cpython/commit/58ac700fb09497df14d4492b6f820109490b2b88

    vstinner commented 4 years ago

    New changeset daa9756cb6395323d6f291efe5c7d7fdc6b2e9d8 by Victor Stinner in branch 'master': bpo-39573: Use Py_TYPE() macro in Modules directory (GH-18393) https://github.com/python/cpython/commit/daa9756cb6395323d6f291efe5c7d7fdc6b2e9d8

    vstinner commented 4 years ago

    New changeset d2ec81a8c99796b51fb8c49b77a7fe369863226f by Victor Stinner in branch 'master': bpo-39573: Add Py_SET_TYPE() function (GH-18394) https://github.com/python/cpython/commit/d2ec81a8c99796b51fb8c49b77a7fe369863226f

    vstinner commented 4 years ago

    To make PyObject opaque, we would have to convert Py_INCREF() and Py_DECREF() to opaque function calls. Example:

    #define Py_XINCREF(op) Py_IncRef(op)
    #define Py_XDECREF(op) Py_DecRef(op)

    Benchmarks should be run to measure to overhead and balance the advantages and drawbacks.

    vstinner commented 4 years ago

    Would a Py_TYPE_IS() macro help code readability?

    For example:
        #define Future_CheckExact(obj) (Py_TYPE(obj) == &FutureType)
    would become:
        #define Future_CheckExact(obj) (Py_TYPE_IS(obj, &FutureType))

    Py_TYPE_IS() would be more efficient for tagged pointers.

    I'm not sure about the macro name. Neil used Py_IS_TYPE(obj, type).

    Note: Py_TYPE_EQ(obj, type) name sounds confusing since the first parameter is an object, whereas the second one is a type.

    vstinner commented 4 years ago

    New changeset c65b320a95784d2b2133926921d67ac439259e9f by Victor Stinner in branch 'master': bpo-39573: Use Py_TYPE() macro in object.c (GH-18398) https://github.com/python/cpython/commit/c65b320a95784d2b2133926921d67ac439259e9f

    vstinner commented 4 years ago

    New changeset b10dc3e7a11fcdb97e285882eba6da92594f90f9 by Victor Stinner in branch 'master': bpo-39573: Add Py_SET_SIZE() function (GH-18400) https://github.com/python/cpython/commit/b10dc3e7a11fcdb97e285882eba6da92594f90f9

    serhiy-storchaka commented 4 years ago

    You have merged so much PRs today. What they do?

    PyObject cannot just be made an opaque structure. The user code reads and writes its fields directly and via macros. This change would break working code.

    We can encourage the user code to prepare to making PyObject an opaque structure. We need to provide a stable C API for access of PyObject fields for this. Note that there is a performance penalty of using functions instead of direct access, so you should have very good reasons to do this.

    vstinner commented 4 years ago

    You have merged so much PRs today. What they do?

    I merged changes which prepares CPython code base to make PyObject opaque. I only merged changes which should have no impact on performance, but prepare the API to make the structure opaque.

    Right now, Py_SET_REFNCT() stills access directly to PyObject.ob_refcnt. But it becomes possible to make Py_SET_REFNCT() an opaque function call.

    Do you see any issue with the changes that I already merged? Using PGO+LTO, static inline functions should be as efficient as the previous code using Py_REFCNT() & cie macros.

    PyObject cannot just be made an opaque structure. The user code reads and writes its fields directly and via macros. This change would break working code.

    I'm trying to modifying the limited C API to make it possible: all access to PyObject fields should go through macros or function calls. The question is now how which fields are accessed and how.

    We can encourage the user code to prepare to making PyObject an opaque structure. We need to provide a stable C API for access of PyObject fields for this.

    For the short term, I don't plan to make PyObject opaque, so I don't plan to enforce usage of Py_TYPE(), Py_SET_REFCNT(), etc.

    Note that there is a performance penalty of using functions instead of direct access, so you should have very good reasons to do this.

    Yeah, replacing Py_REFCNT() macro with an opaque function call is likely to have an impact on performance. It should be properly measure, I'm well aware of that, I already wrote it in a previous comment ;-) I don't plan to push change such right now. And I will wait for the review of my peers (like you) for such change ;-)

    vstinner commented 4 years ago

    New changeset 60ac6ed5579f6666130fc264d3b748ee9575e3aa by Victor Stinner in branch 'master': bpo-39573: Use Py_SET_SIZE() function (GH-18402) https://github.com/python/cpython/commit/60ac6ed5579f6666130fc264d3b748ee9575e3aa

    zooba commented 4 years ago

    "static inline" functions are not opaque - as they get inlined into 3rd-party compiled code, we can't change anything they reference, and so the structure layout is still fixed and has to be visible to the user's compiler.

    I'm not totally against the changes, but it's worth pointing out that you aren't achieving what the issue title claims, so it's really just code cleanliness (and/or introducing macro-users to static inline functions ;) ).

    vstinner commented 4 years ago

    "static inline" functions are not opaque

    I'm well aware of that :-) But once the CPython code base will stop accessing directly PyObject fields directly, it would become possible to experiment changing PyObject layout, at least testing it in CPython.

    First changes are just to prepare the code base to experiment the real change. But as Serhiy pointed out, the second part will have an impact on performance and so should be carefully benchmarked to balance advantages and drawbacks, even if it's only done in the limited C API.

    vstinner commented 4 years ago

    New changeset 7f6f7eef5206858030cbe4f80a7c04b02781cc9a by Dong-hee Na in branch 'master': bpo-39573: Use Py_TYPE() macro in ctypes.h (GH-18411) https://github.com/python/cpython/commit/7f6f7eef5206858030cbe4f80a7c04b02781cc9a

    corona10 commented 4 years ago

    FYI, I am working on to add Py_IS_TYPE macro. :)

    shihai1991 commented 4 years ago

    Hi, guys. Is there value in adding PyNone_Check macro?(_PyNone_Type is not esposed to CAPI directly, so I am not sure about it) If the answer is 'yes', i can add it ;)

    vstinner commented 4 years ago

    Hi, guys. Is there value in adding PyNone_Check macro?

    "obj == Py_None" is a very common pattern.

    You have check how it is done in HPy: https://github.com/pyhandle/hpy

    See also bpo-39511: "[subinterpreters] Per-interpreter singletons (None, True, False, etc.)".

    shihai1991 commented 4 years ago

    "obj == Py_None" is a very common pattern. You have check how it is done in HPy: https://github.com/pyhandle/hpy See also bpo-39511: "[subinterpreters] Per-interpreter singletons (None, >True, False, etc.)".

    Thanks, I will check it.

    vstinner commented 4 years ago

    New changeset 968dcd9e7a4d3aa9aaa1dfca693adf60d6b71ce7 by Brandt Bucher in branch 'master': bpo-39573: Fix bad copy-paste in Py_SET_SIZE (GH-18496) https://github.com/python/cpython/commit/968dcd9e7a4d3aa9aaa1dfca693adf60d6b71ce7

    vstinner commented 4 years ago

    New changeset d905df766c367c350f20c46ccd99d4da19ed57d8 by Dong-hee Na in branch 'master': bpo-39573: Add Py_IS_TYPE() function (GH-18488) https://github.com/python/cpython/commit/d905df766c367c350f20c46ccd99d4da19ed57d8

    vstinner commented 4 years ago

    Hi, guys.

    By the way, please find another more inclusive way to say hi, see: https://heyguys.cc/

    shihai1991 commented 4 years ago

    By the way, please find another more inclusive way to say hi, see: https://heyguys.cc/

    Oh, copy that. Sorry for my poor english.

    vstinner commented 4 years ago

    New changeset d212c3c55d414203b0579e000d9f340f8cd11be7 by Dong-hee Na in branch 'master': bpo-39573: PyXXX_Check() macros use Py_IS_TYPE() (GH-18508) https://github.com/python/cpython/commit/d212c3c55d414203b0579e000d9f340f8cd11be7

    vstinner commented 4 years ago

    New changeset 9aeb0ef9309384099e2f23bcee2240fbc096568e by Dong-hee Na in branch 'master': bpo-39573: Update clinic to use Py_IS_TYPE() function (GH-18507) https://github.com/python/cpython/commit/9aeb0ef9309384099e2f23bcee2240fbc096568e

    a6e54279-29a2-4197-b414-54af27f54591 commented 4 years ago

    @vstinner would it be helpful if I went on a sweep looking for places we can use the new Py_IS_TYPE macro?

    Getting away from Py_TYPE(op) would also mean a move to making the internals const-correct.

    a6e54279-29a2-4197-b414-54af27f54591 commented 4 years ago

    I'm hoping that a goal here is to make

    static inline int _Py_IS_TYPE(PyObject *ob, PyTypeObject *type)

    actually be

    static inline int _Py_IS_TYPE(const PyObject *ob, const PyTypeObject *type)
    vstinner commented 4 years ago

    Getting away from Py_TYPE(op) would also mean a move to making the internals const-correct.

    Would you mind to explain how it's an issue to modify PyObject* temporarily during a function call? It's common to increase the object reference count to ensure that it doesn't go even while we use it.

    vstinner commented 4 years ago

    New changeset 1b55b65638254aa78b005fbf0b71fb02499f1852 by Dong-hee Na in branch 'master': bpo-39573: Clean up modules and headers to use Py_IS_TYPE() function (GH-18521) https://github.com/python/cpython/commit/1b55b65638254aa78b005fbf0b71fb02499f1852

    a6e54279-29a2-4197-b414-54af27f54591 commented 4 years ago

    Would you mind to explain how it's an issue to modify PyObject* temporarily during a function call?

    It's not a problem to modify the PyObject during a function call. However, many functions don't need to modify the object, but are still taking non-const PyObject arguments.

    For example if I have this code:

        if (Py_TYPE(deque) == &deque_type) {

    That doesn't modify deque to check the type, but because Py_TYPE casts away the constness, deque can't be a const object.

    However, with the new Py_IS_TYPE function:

        if (Py_IS_TYPE(deque, &deque_type)) {

    and these two changes:

    -static inline int _Py_IS_TYPE(PyObject *ob, PyTypeObject *type) {
    +static inline int _Py_IS_TYPE(const PyObject *ob, const PyTypeObject *type) {
         return ob->ob_type == type;
     }
    -#define Py_IS_TYPE(ob, type) _Py_IS_TYPE(_PyObject_CAST(ob), type)
    +#define Py_IS_TYPE(ob, type) _Py_IS_TYPE(((const PyObject*)(ob)), type)

    the deque variable can be const.

    Another example of a common pattern that I believe could benefit from this is Py_TYPE(ob)->tp_name. That could be turned into Py_TYPE_NAME(ob) and that would allow the ob to be a const pointer.

    If we can keep functions that don't modify the object to accept const PyObject* it will help make things safer in the long run.

    vstinner commented 4 years ago

    If we can keep functions that don't modify the object to accept const PyObject* it will help make things safer in the long run.

    In my experience, trying to add "const" is quite painful, since the "const" has to be propagated to all functions called by the modified function. Python never used "const" with "PyObject" because they are *so many functions which really modify objects on purpose.

    I don't see how adding "const" would help this issue "Make PyObject an opaque structure in the limited C API". If you consider that something should be changed, please open a *separated* issue.

    a6e54279-29a2-4197-b414-54af27f54591 commented 4 years ago

    All I'm saying is that I think Py_IS_TYPE is a great idea, and that Py_IS_TYPE should take const arguments, since its arguments are not modified. If you think that should go in a different ticket, then I can make that happen.

    a6e54279-29a2-4197-b414-54af27f54591 commented 4 years ago

    Just added a new PR to finish off the remaining places to use Py_IS_TYPE()

    https://github.com/python/cpython/pull/18601

    vstinner commented 4 years ago

    New changeset dffe4c07095e0c693e094d3c140e85a68bd8128e by Andy Lester in branch 'master': bpo-39573: Finish converting to new Py_IS_TYPE() macro (GH-18601) https://github.com/python/cpython/commit/dffe4c07095e0c693e094d3c140e85a68bd8128e

    vstinner commented 4 years ago

    New changeset 8767ce92d24d3687405848442e6c67cf0af1c657 by Andy Lester in branch 'master': bpo-39573: Make Py_IS_TYPE() take constant parameters (GH-18799) https://github.com/python/cpython/commit/8767ce92d24d3687405848442e6c67cf0af1c657

    vstinner commented 4 years ago

    New changeset 557287075c264d2458cd3e1b45e9b8ee5341e0a1 by Andy Lester in branch 'master': bpo-39573: Use Py_IS_TYPE() macro to check for types (GH-18809) https://github.com/python/cpython/commit/557287075c264d2458cd3e1b45e9b8ee5341e0a1

    vstinner commented 4 years ago

    I created bpo-40170 "[C API] Make PyTypeObject structure an opaque structure in the public C API".

    vstinner commented 4 years ago

    PyType_FromSpec() and PyType_Spec API are not currently compatible with opaque PyObject.

    Example: ---

    #define PyObject_HEAD    PyObject ob_base;
    
    typedef struct {
        PyObject_HEAD
        ...
    } MyObject;

    static PyType_Spec type_spec = { .name = "MyObject", .basicsize = sizeof(MyObject), ... };

    ... = PyType_FromSpec(&type_spec);

    sizeof(MyObject) requires to compute sizeof(PyObject).

    Issue reported by Ronald Oussoren on python-dev: https://mail.python.org/archives/list/python-dev@python.org/message/PGKRW7S2IUOWVRX6F7RT6VAWD3ZPUDYS/

    ronaldoussoren commented 4 years ago

    The incompatibility mentioned in msg366473 is probably fixable by treating the PyObject header the same as the GC head structure. With some care this could mostly maintain binary compatibility by inserting some unused fields in PyObject_HEAD instead of the PyObject header when an extension targets a stable ABI version that has the PyObject header in-line.

    This issue seems to be comparible to the "fragile instance variable" issue fixed in Objective-C 2.0, see \https://en.wikipedia.org/wiki/Objective-C#Non-fragile_instance_variables\. That was fixed by adding a level of indirection when accessing member variables.

    Something like that is probably necessary to be able to subclass builtin types (other than object itself) in an extension.

    corona10 commented 4 years ago

    New changeset 5e8ffe147710e449c2e935a4e2ff5cbd19828a8a by Hai Shi in branch 'master': bpo-39573: Use Py_IS_TYPE to check for types (GH-19882) https://github.com/python/cpython/commit/5e8ffe147710e449c2e935a4e2ff5cbd19828a8a

    corona10 commented 4 years ago

    New changeset ad3252bad905d41635bcbb4b76db30d570cf0087 by Dong-hee Na in branch 'master': bpo-39573: Convert Py_TYPE() to a static inline function (GH-20290) https://github.com/python/cpython/commit/ad3252bad905d41635bcbb4b76db30d570cf0087