python / cpython

The Python programming language
https://www.python.org
Other
62.83k stars 30.09k forks source link

[subinterpreters][C API] Add a new function to create PyStructSequence from Heap. #89276

Closed shihai1991 closed 2 years ago

shihai1991 commented 3 years ago
BPO 45113
Nosy @rhettinger, @vstinner, @tiran, @encukou, @pablogsal, @shihai1991, @erlend-aasland
PRs
  • python/cpython#28573
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['expert-C-API', '3.11'] title = '[subinterpreters][C API] Add a new function to create PyStructSequence from Heap.' updated_at = user = 'https://github.com/shihai1991' ``` bugs.python.org fields: ```python activity = actor = 'vstinner' assignee = 'none' closed = True closed_date = closer = 'shihai1991' components = ['C API'] creation = creator = 'shihai1991' dependencies = [] files = [] hgrepos = [] issue_num = 45113 keywords = ['patch'] message_count = 27.0 messages = ['401129', '401131', '401132', '402158', '402293', '402606', '403728', '403887', '404306', '404492', '404750', '404753', '404756', '404787', '404795', '404863', '404865', '404866', '404869', '404872', '404875', '404892', '405043', '406408', '410926', '411957', '411958'] nosy_count = 7.0 nosy_names = ['rhettinger', 'vstinner', 'christian.heimes', 'petr.viktorin', 'pablogsal', 'shihai1991', 'erlendaasland'] pr_nums = ['28573'] priority = 'normal' resolution = 'rejected' stage = 'resolved' status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue45113' versions = ['Python 3.11'] ```

    shihai1991 commented 3 years ago

    Copied from https://bugs.python.org/issue40512#msg399847:

    Victor: PyStructSequence_InitType2() is not compatible with subinterpreters: it uses static types. Moreover, it allocates tp_members memory which is not released when the type is destroyed. But I'm not sure that the type is ever destroyed, since this API is designed for static types.

    PyStructSequence_InitType2() is not compatible with subinterpreters: it uses static types. Moreover, it allocates tp_members memory which is not released when the type is destroyed. But I'm not sure that the type is ever destroyed, since this API is designed for static types.

    IMO, I suggest to create a new function, PyStructSequence_FromModuleAndDesc(module, desc, flags) to create a heaptype and don't aloocates memory block for tp_members,something like 'PyType_FromModuleAndSpec()`.

    encukou commented 3 years ago

    I think you will run into issues with allocating tp_members, because there isn't a good mechanism to for type objects to manage C-level data.

    But I encourage you to try, so you get a better understanding of the problem :)

    shihai1991 commented 3 years ago

    But I encourage you to try, so you get a better understanding of the problem :)

    OK, thanks, Petr. I try to add this C API to see what will happen :)

    rhettinger commented 3 years ago

    Is the subinterpreters work still on hold pending a PEP?

    I thought that conversions to heap types had been suspended because there is a Steering Council level cost benefit decision. Mostly it seems that everything helps subinterpreters makes code worse in almost every other way (slower, more complex, new APIs, etc).

    encukou commented 3 years ago

    Subinterpreters are not the only reason to do this (and they're not *my* reason to do it).

    Adding a way to create PyStructSequence heap will help users of the stable ABI, where reduced performance is a known tradeoff (see https://www.python.org/dev/peps/pep-0652/#stable-abi: "The Stable ABI trades performance for its stability"). More generally, this would need solving one of the remaining limitations of the limited API (PEPs 489, 630): type-specific data. If Hai Shi solves the problem, the solution will be useful even if PyStructSequence_FromModuleAndDesc turns out useless.

    Using the proposed new API in CPython's stdlib should be done much more deliberately, and yes, would need a PEP.

    shihai1991 commented 3 years ago

    More generally, this would need solving one of the remaining limitations of the limited API (PEPs 489, 630): type-specific data. Agree. We can't track and destroy the memory block created from the heap now.

    encukou commented 2 years ago

    Oh! I assumed this bug wasn't resolved, but it is -- in bpo-34784. Sorry, I should have checked!

    The only thing the proposed PR adds is a way to set ht_module, which actually isn't very useful (it's used for module state access, but PyStructSequence_Desc doesn't allow you to add custom methods or anything else that would need the module state).

    I'd close this bpo as duplicate. And close the PR -- it's well done, but unfortunately, solves the wrong problem :(

    shihai1991 commented 2 years ago

    The only thing the proposed PR adds is a way to set ht_module, which actually isn't very useful (it's used for module state access.

    Hm, there have some static types. so I create the PyStructSequence_FromModuleAndDes() could receive tp_flags and try convert those static types to the heap type. eg: https://github.com/python/cpython/blob/main/Python/sysmodule.c#L2837-L2839 We can not do this converting operation by calling PyStructSequence_NewType(), no?

    This is my *reason* to create PR-28573 :)

    encukou commented 2 years ago

    Ah, sorry, I overlooked the flags. This does beg the question: what else from PyType_Spec will be needed? I guess we don't want to allow additional slots/methods. (Combining them would be hard anyway; if someone needs them they should make a subclass.) So it seems that flags and ht_module are all that's missing.

    Also, note that converting the stdlib to heap types is suspended, pending a PEP. I'd be much happier with adding this if some *other* project needed it. The Py_TPFLAGS_DISALLOW_INSTANTIATION is very specific; do you have other examples that need this?

    shihai1991 commented 2 years ago

    Ah, sorry, I overlooked the flags. It's Okay.

    This does beg the question: what else from PyType_Spec will be needed? I guess we don't want to allow additional slots/methods. +1.

    Also, note that converting the stdlib to heap types is suspended, pending a PEP. Can I do something for this pending PEP?

    The Py_TPFLAGS_DISALLOW_INSTANTIATION is very specific; do you have other examples that need this? I only find this bpo-44532 :( I guess the outside user will use it MAYBE.

    encukou commented 2 years ago

    Can I do something for this pending PEP?

    Ask Victor, he should know more. But as far as I know, no one started on it yet.

    I guess the outside user will use it MAYBE.

    Py_TPFLAGS_DISALLOW_INSTANTIATION is not part of the limited API, so it would be nice to find another use case.

    erlend-aasland commented 2 years ago

    FYI, see bpo-43916 regarding the introduction and rationale of introducing Py_TPFLAGS_DISALLOW_INSTANTIATION.

    Slightly related: bpo-43908

    erlend-aasland commented 2 years ago

    > Can I do something for this pending PEP?

    Ask Victor, he should know more. But as far as I know, no one started on it yet.

    Quoting PEP-630 (active PEP):

    Whenever this PEP mentions extension modules, the advice also applies to built-in modules, such as the C parts of the standard library. The standard library is expected to switch to per-module state early.

    Why is not PEP-630 enough? (I find these "political" discussions a bit hard to follow sometimes :)

    rhettinger commented 2 years ago

    Why is not PEP-630 enough?

    My understanding is that this entire class of code changes has been put on hold pending Steering Council approval. It represents a great deal of code churn and creation on new APIs. Several of the patches had had negative performance impacts and AFAICT all of the patches have made the code more complicated and harder to maintain. Several that I looked at had no tests at all, so there was no discernible or provabler user benefit. Also the code being affected is typically some of our most stable code that hasn't had to be changed in over a decade. IIRC one or more of the patches created new bugs, were destabilizing, or altered the APIs in subtle ways.

    Please get explicit approval from the SC before continuing to sweep through the code base, creating heap types everywhere. Given that some essential third party modules are not going down this path, it is unlikely that general users would ever see any benefit.

    erlend-aasland commented 2 years ago

    My understanding is that this entire class of code changes has been put on hold pending Steering Council approval.

    Can you please point me to the official SC statement regarding this? I cannot find it.

    It represents a great deal of code churn and creation on new APIs.

    You cannot implement [the accepted] PEP-630 without a little bit of code churn ;) Luckily, (most of) the APIs needed are defined in PEP-573 (also accepted, now in final state).

    Several of the patches had had negative performance impacts [...]

    Can you provide a list of the patches/bpo's with still unresolved performance impacts?

    [...] all of the patches have made the code more complicated and harder to maintain

    I have not seen other complaints about this? Can you provide a link to a bpo or a Discourse topic? If you find the new APIs hard to use or understand, we can of course improve the documentation and the dev guide :)

    Several that I looked at had no tests at all [...]

    Actually, we've added tests to heap types with Py_TPFLAGS_IMMUTABLE_TYPE and Py_TPFLAGS_DISALLOW_INSTANTIATION. Apart from that; if a type lives on the stack or on the heap should be transparent for the user, right? The existing test suite will help verify exactly that.

    [...] so there was no discernible or provabler user benefit.

    The benefits are described in PEP-630 ;)

    IIRC one or more of the patches created new bugs, were destabilizing, or altered the APIs in subtle ways.

    There has been cases with ref-leaks, but that happens from time to time in all types of PRs, not just heap type PRs. Those have all been caught and fixed, thanks to reviewers and ref-leak-bots.

    I'm not sure what you mean with 'destabilizing'? Can you point to a specific example or issue?

    Regarding APIs, there was a long discussion about immutability and disallowing instantiation during the 3.10 beta period. Those issues were fixed and unblocked.

    Please get explicit approval from the SC before continuing to sweep through the code base, creating heap types everywhere.

    Strictly speaking, the fact that PEP-630 is accepted and active _is_ an explicit approval. It has not been withdrawn or rejected.

    Given that some essential third party modules are not going down this path [...]

    Which ones? Can you provide a link?

    [...] it is unlikely that general users would ever see any benefit.

    The benefits are explained in PEP-630. It is a well written PEP; I suggest you re-read it :)

    encukou commented 2 years ago

    PEP-640 is Informational, and per PEP-1:

    Informational PEPs do not necessarily represent a Python community consensus or recommendation, so users and implementers are free to ignore Informational PEPs or follow their advice.

    (Will reply more tomorrow, I'm on my phone now) -- Sent from my Android device with K-9 Mail. Please excuse my brevity.

    erlend-aasland commented 2 years ago

    Thanks, Petr. Yes, that is correct. PEP-630 (I assume you meant 630, not 640) still describes the _rationale_ very well. I guess what's needed is a Standards Track PEP based on PEP-630 (IIUC).

    encukou commented 2 years ago

    Quoting PEP-630 (active PEP):

    Whenever this PEP mentions extension modules, the advice also applies to built-in modules, such as the C parts of the standard library. The standard library is expected to switch to per-module state early.

    Maybe I should remove that part, I didn't expect applying this to stdlib would be so fast/rushed.

    Can you please point me to the official SC statement regarding this? I cannot find it.

    Hmm, the best I can find is https://github.com/python/steering-council/blob/main/updates/2021-02-steering-council-update.md#february-08 It asks for no changes in 3.10 beta (no longer relevant) and a PEP -- presumably one specifically for the stdlib; PEP-630 already existed.

    PEP-630 works best for new code; applying it to stdlib needs additional considerations, e.g. performance impact analysis, and preserving immutability or (un)pickleability, which certainly wasn't always tested well and caused much trouble late tn the release cycle.

    Please get explicit approval from the SC before continuing to sweep through the code base, creating heap types everywhere.

    That would be nice (but myself, I won't be championing that effort any time soon).

    Given that some essential third party modules are not going down this path

    OTOH, others (like cryptography & Qt) are going down this path, so I'll continue improving the support.

    tiran commented 2 years ago

    Given that some essential third party modules are not going down this path, it is unlikely that general users would ever see any benefit.

    I disagree with your statement. It does not reflect my experience, too.

    Heap types are a prerequisite for abi3 wheels. In order to get abi3 binary wheels, extensions must use the limited API, which means they must use heap types. Stable ABI wheels are a great benefit for extension authors, because it reduces their build matrix substantially. For instance cryptography's stable abi3 wheels work with any Python version >=3.6,\<4.

    Heap types and stable abi3 are also a major benefit for general users. Once Cython and the NumPy stack supports limited API and stable ABI, users no longer have to wait on builds for a new Python release. Every new minor release we get several bug reports on BPO about NumPy not working on latest release. This will go away.

    erlend-aasland commented 2 years ago

    Petr:

    Hmm, the best I can find is https://github.com/python/steering-council/blob/main/updates/2021-02-steering-council-update.md#february-08

    Perfect, thanks!

    Christian: Thanks for your input. If this was a discussion on Discourse, I'd press the L key on your message :)

    erlend-aasland commented 2 years ago

    Oh, I found PEP-3121 (Extension Module Initialization and Finalization). It is a Standards Track PEP and it is accepted.

    The abstract is pretty short. Let me just repost it here, for convenience:

    Extension module initialization currently has a few deficiencies. There is no cleanup for modules, the entry point name might give naming conflicts, the entry functions don't follow the usual calling convention, and multiple interpreters are not supported well. This PEP addresses these issues.

    Quoting from the Specification section of that PEP:

    The initialization routine will be invoked once per interpreter, when the module is imported. It should return a new module object each time.

    In order to store per-module state in C variables, each module object will contain a block of memory that is interpreted only by the module. The amount of memory used for the module is specified at the point of creation of the module.

    [...]

    As all Python objects should be controlled through the Python memory management, usage of "static" type objects is discouraged, unless the type object itself has no memory-managed state.

    PEP-3121 is not withdrawn; PEP-630 is not withdrawn. What is then expected of a new PEP? Or the other way around: what is missing from those two PEPs? AFAICT, PEP wise, everything is arranged and ready. Perhaps the SC can chime in and explain why another PEP is required? :)

    BTW, the SC has actually asked for another _Informational_ PEP, not a Standards Track PEP, so I guess a new PEP will be very similar to PEP-630?

    shihai1991 commented 2 years ago

    PEP-3121 is not withdrawn; PEP-630 is not withdrawn. What is then expected of a new PEP? Or the other way around: what is missing from those two PEPs?

    Thanks Erlend for the relative information you provided. AFAIK, the stdlib is not the extension module strictly :)

    encukou commented 2 years ago

    PEP-630 has motivations and technical notes. What needs to be documented better is how both applies to stdlib. Specifically:

    Whether that should be a new PEP or added to 630 doesn't matter much, IMO.

    Then, any introduction of PEP-630 should be done deliberately, analyzing the pros and cons for each module separately. It should not be a mass change, and it should involve explaining the motivations & specific implications to module maintainers.

    Since this is out of scope in this issue (and any other one on bpo, as far as I know), I invite discussion over at https://github.com/encukou/abi3/issues/21

    encukou commented 2 years ago

    Back to this issue -- do we have any use case other than setting the internal Py_TPFLAGS_DISALLOW_INSTANTIATION flag?

    If not, I'd like to close this (with apologies for not doing my research and letting Hai Shi do unmerged work). If a use case is found, I suspect it'll need a different solution – perhaps allowing PyType_Slot·s.

    shihai1991 commented 2 years ago

    If not, I'd like to close this (with apologies for not doing my research and letting Hai Shi do unmerged work). If a use case is found, I suspect it'll need a different solution – perhaps allowing PyType_Slot·s

    OK, I close this bpo. We can reopen or create a new bpo if we find a useful user case. Thank you all for joining this bpo :)

    vstinner commented 2 years ago

    Quick update on this closed issue.

    I landed on this issue from Erlend's SC request: https://github.com/python/steering-council/issues/99

    Oh! I assumed this bug wasn't resolved, but it is -- in bpo-34784. Sorry, I should have checked!

    Yes, there is the PyStructSequence_NewType() function which exists since 2010 (PEP-384 implementation: commit 4d0d471a8031de90a2b1ce99c4ac4780e60b3bc9).

    I recently added _PyStructSequence_NewType() to the internal C API to pass flags like Py_TPFLAGS_DISALLOW_INSTANTIATION. I modified signal, _curses, time and _thread modules to use it.

    On the other side, I added _PyStructSequence_FiniType() to clear a static structseq (bpo-46417).

    vstinner commented 2 years ago

    Raymond:

    Is the subinterpreters work still on hold pending a PEP?

    While sub-interpreters is one motivation to convert static types to heap types, it's not the only one.

    IMO it's better to use the "embedded Python" use case to justify these changes, since embedding Python is non controversial and is already widely used.

    For me, another motivation (related to embedded Python) is to release all memory in Py_Finalized(): see bpo-1635741. While this main issues is now closed, Python still leaks memory when you import C extensions using static types: see bpo-40077.

    --

    Hai Shi proposed to add a PyStructSequence_FromModuleAndDesc() function but right now, I'm not sure which C extension requires to retrieve a module state from a structseq object. Usually, structseq types have no method. So I don't see the need to reopen the issue.

    I just wanted to comment this closed issue ;-)