python / cpython

The Python programming language
https://www.python.org
Other
62.59k stars 30.04k forks source link

Calling Py_Initialize() twice now triggers a fatal error (Python 3.7) #78113

Closed vstinner closed 6 years ago

vstinner commented 6 years ago
BPO 33932
Nosy @ncoghlan, @vstinner, @ned-deily, @ericsnowcurrently, @zhangyangyu, @emilyemorehouse, @hroncok, @miss-islington
PRs
  • python/cpython#7845
  • python/cpython#7859
  • python/cpython#7967
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['interpreter-core', 'easy', 'type-bug', '3.7'] title = 'Calling Py_Initialize() twice now triggers a fatal error (Python 3.7)' updated_at = user = 'https://github.com/vstinner' ``` bugs.python.org fields: ```python activity = actor = 'ncoghlan' assignee = 'docs@python' closed = True closed_date = closer = 'ncoghlan' components = ['Interpreter Core'] creation = creator = 'vstinner' dependencies = [] files = [] hgrepos = [] issue_num = 33932 keywords = ['patch', 'easy'] message_count = 26.0 messages = ['320175', '320180', '320182', '320183', '320184', '320220', '320223', '320225', '320226', '320227', '320244', '320248', '320255', '320562', '320578', '320579', '320582', '320634', '320641', '320665', '320666', '320667', '320725', '320755', '320759', '320761'] nosy_count = 9.0 nosy_names = ['ncoghlan', 'vstinner', 'ned.deily', 'docs@python', 'eric.snow', 'xiang.zhang', 'emilyemorehouse', 'hroncok', 'miss-islington'] pr_nums = ['7845', '7859', '7967'] priority = None resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue33932' versions = ['Python 3.7'] ```

    vstinner commented 6 years ago

    It would help to document that calling Py_Initialize() twice in Python 3.7 now triggers a fatal error, whereas previous the second call did nothing. Document it at: https://docs.python.org/dev/whatsnew/3.7.html#changes-in-the-c-api

    ... Or should it be considered as a regression?

    --

    My colleague Miro Hrončok reported a Python crash (SIGABRT) when running https://github.com/konlpy/konlpy test suite on Python 3.7:

    "Fatal Python error: _Py_InitializeCore: main interpreter already initialized"

    konlpy uses the JPype project, the bug is in JPype initialization function (it's a C extension):

    vstinner commented 6 years ago

    I set the priority to release blocker to make sure that someone looks at this issue, even if I'm not sure that it's a huge regression.

    vstinner commented 6 years ago

    The behaviour changed has been introduced by bpo-22257:

    commit 1abcf6700b4da6207fe859de40c6c1bada6b4fec Author: Eric Snow \ericsnowcurrently@gmail.com\ Date: Tue May 23 21:46:51 2017 -0700

    bpo-22257: Private C-API for core runtime initialization (PEP-432). (bpo-1772)
    
    (patch by Nick Coghlan)

    Simplified change:

    + if (_Py_Initialized) { + Py_FatalError("Py_InitializeCore: main interpreter already initialized"); + }

    vstinner commented 6 years ago

    Attached PR 7845 restores the Python 3.6 behaviour: calling Py_Initialize() twice does nothing.

    vstinner commented 6 years ago

    Note: the doc doesn't say that the function must only be called once, nor document the (current) new Python 3.7 restriction (must only be called once): https://docs.python.org/dev/c-api/init.html#c.Py_Initialize

    zhangyangyu commented 6 years ago

    Note: the doc doesn't say that the function must only be called once, nor document the (current) new Python 3.7 restriction (must only be called once):

    But the doc says:

    This is a no-op when called for a second time (without calling Py_FinalizeEx() first).

    no-op means it could be called twice?

    vstinner commented 6 years ago

    "no-op means it could be called twice" oops right :-) So it's a regression. Would you mind to review my PR 7845, Xiang?

    ncoghlan commented 6 years ago

    While I agree the documentation issue means that this should be made to work again for Python 3.7.1, I'd like for Python 3.7 to at least deprecate calling Py_Initialize twice without an intervening Py_Finalize.

    Otherwise the public multi-phase interpreter initialization API is going to have to cope with somebody starting the initialization process, and then calling a full Py_Initialize to finish it (instead of the new "complete initialization" API, whatever we end up calling that), and I'd prefer to keep the permitted state transitions more linear than that.

    vstinner commented 6 years ago

    While I agree the documentation issue means that this should be made to work again for Python 3.7.1, I'd like for Python 3.7 to at least deprecate calling Py_Initialize twice without an intervening Py_Finalize.

    In term of timeline, I would prefer to fix the regression right now, and then see how to deprecate calling the function later :-)

    ncoghlan commented 6 years ago

    Aye, I'm fine with deferring the programmatic deprecation for now - it's mainly the docs I'd like to see changed at the same time as the regression is fixed.

    vstinner commented 6 years ago

    New changeset 209abf746985526bce255e2fba97d3246924885d by Victor Stinner in branch 'master': bpo-33932: Calling Py_Initialize() twice does nothing (GH-7845) https://github.com/python/cpython/commit/209abf746985526bce255e2fba97d3246924885d

    miss-islington commented 6 years ago

    New changeset 3747dd16d5d2af3499f586386e49740a0454cf44 by Miss Islington (bot) in branch '3.7': bpo-33932: Calling Py_Initialize() twice does nothing (GH-7845) https://github.com/python/cpython/commit/3747dd16d5d2af3499f586386e49740a0454cf44

    ned-deily commented 6 years ago

    Since the undocumented change in behavior from 3.6 to 3.7 appears to be causing some downstream regressions at this late stage, I think Victor's change should be cherry-picked for 3.7.0 final.

    Nick or Eric, if you merge a 3.7 doc change, I'll cherry-pick that as well.

    vstinner commented 6 years ago

    Another example of project affected by this regression: fontforge https://bugzilla.redhat.com/show_bug.cgi?id=1595421

    ncoghlan commented 6 years ago

    https://github.com/python/cpython/pull/7967 amends the docs and adds the deprecation notices, but after taking another look at the way Victor's patch works, I'm thinking we may not need them: the new initialization API remains strict about the required state machine transitions, which means that calling Py_Initialize when only the core has been initialized will fail at the _Py_InitializeCore step (as that complains if the core has already been initialized).

    Similarly, calling _Py_InitializeCore after Py_Initialize still fails, as the *only case that becomes a no-op is specifically Py_InitializeEx, *after the interpreter is fully initialized.

    Given that, I think we can just keep Victor's compatibility fix indefinitely, and only have the new incremental initialization APIs be strict about the expected transitions through the state machine.

    ncoghlan commented 6 years ago

    I went ahead and closed my PR, as I'm now happy that keeping this behaviour isn't going to block anything we want to do for PEP-432.

    If we change our minds and decide we want to deprecate the current behaviour after all, then we can start that deprecation process in Python 3.8.

    vstinner commented 6 years ago

    Please keep it open until 3.7.0 is released.

    ned-deily commented 6 years ago

    New changeset b940921a67fa33e8c8812aa3b7746161c2552c1d by Ned Deily (Miss Islington (bot)) in branch '3.7': bpo-33932: Calling Py_Initialize() twice does nothing (GH-7845) https://github.com/python/cpython/commit/b940921a67fa33e8c8812aa3b7746161c2552c1d

    ned-deily commented 6 years ago

    Fixed in 3.7.0. Thanks, everyone!

    vstinner commented 6 years ago

    I reopen the issue because fontforge still fails on Python 3.7.0 final. fontforge calls Py_Initialize() and then calls Py_Main() which fails with an assertion: https://bugzilla.redhat.com/show_bug.cgi?id=1595421

    The same code works well on Python 3.6, so should we define this issue as a regression?

    Call stack:

    The problem is that the first call of Py_Initialize() sets a first configuration, but then Py_Main() sets a different configuration.

    Is it correct to call Py_Initialize() before Py_Main()?

    vstinner commented 6 years ago

    I looked quickly at _Py_InitializeCore() and I'm not sure that it's designed to replace an existing configuration.

    Example:

    int Py_HashRandomizationFlag = 0; / for -R and PYTHONHASHSEED \/

    (...)

        if (!core_config->use_hash_seed || core_config->hash_seed) {
            /* Random or non-zero hash seed */
            Py_HashRandomizationFlag = 1;
        }

    If Py_Initialize() sets Py_HashRandomizationFlag to 1, but Py_Main() doesn't use an hash seed, Py_HashRandomizationFlag value remains 1 which is wrong.

    pymain_read_conf() fills _PyCoreConfig from global configuration flags, but then global configuration flags are supposed to be set from the global configuration flags.

    So maybe for this specific case, _Py_InitializeCore() should always set Py_HashRandomizationFlag. But it was just one example to show that right now, _Py_InitializeCore() doesn't seem to support to be called twice with two different configurations.

    vstinner commented 6 years ago

    Ok, now with my regression hat.

    In Python 3.6, Py_Main() calls Py_Initialize(). If Py_Initialize() is called twice, the second call does nothing. So it's fine to call Py_Main() after Py_Initialize() in Python 3.6.

    ericsnowcurrently commented 6 years ago

    I looked quickly at _Py_InitializeCore() and I'm not sure that it's designed to replace an existing configuration. ... So maybe for this specific case, _Py_InitializeCore() should always set Py_HashRandomizationFlag.

    +1

    At the same time, the use of Py_*Flag variables is a little unclear in the PEP-432 world. Under the PEP I'd expect code in CPython to use the core config from the current PyInterpreterState (or even _PyRuntimeState), rather than the global flag variables. So presumably (the PEP doesn't talk about it) the intent of keeping them is to help folks that currently make use of the flags. However, they shouldn't be modifying them after initialization, and under PEP-432 they don't need to set them beforehand (they set values on the config), right?

    Part of the problem here is that Py_Main() results in all the flag variables getting set (before runtime initialization). However, with the current PEP-432 [internal] implementation those variables are (mostly) never getting set, right? Shouldn't we set all of them, but only at the end of _Py_InitializeCore() (for the sake of extensions/embedders that use them)? In that case we'd need to replace our internal use of Py_HashRandomizationFlag with field on the config struct on _PyRuntimeState.

    But it was just one example to show that right now, _Py_InitializeCore() doesn't seem to support to be called twice with two different configurations.

    Are there other things (than how we set that flag) that break in that case? Regardless, what would ensure that it works properly?

    ncoghlan commented 6 years ago

    Victor, could you file a separate issue for Py_Main(). It's a different problem, since Py_Main() is an entry point for an *application that embeds Python*. It's not a CPython API in the tradition sense.

    ncoghlan commented 6 years ago

    I filed https://bugs.python.org/issue34008 to cover the fact that "Py_Initialize(); Py_Main();" doesn't work in 3.7.0 whereas it used to sort of work (by only partially applying the settings read from the CLI and environment by the Py_Main() call).

    Since "Py_Initialize(); Py_Initialize();" *does* work in 3.7.0, closing this one again.

    ncoghlan commented 6 years ago

    Answering Eric's question about the flag variables though:

    One of the first things that Py_Main() does in 3.7 is to:

    This mostly fakes the 3.6-and-earlier behaviour of using the global variables directly.

    For anything listed in https://docs.python.org/3/c-api/init.html#global-configuration-variables, we also mostly left any code reading the global variable directly alone, since they're considered a public API for embedding applications to dynamically adjust behaviour and we don't have a good way of deprecating using them for that purpose :(