python / cpython

The Python programming language
https://www.python.org
Other
63.37k stars 30.35k forks source link

Do we support calling Py_Main() after Py_Initialize()? #78189

Closed ncoghlan closed 6 years ago

ncoghlan commented 6 years ago
BPO 34008
Nosy @ncoghlan, @abalkin, @vstinner, @mcepl, @ericsnowcurrently, @emilyemorehouse, @hroncok
PRs
  • python/cpython#8023
  • python/cpython#8043
  • python/cpython#8352
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['3.8', 'type-bug', '3.7'] title = 'Do we support calling Py_Main() after Py_Initialize()?' updated_at = user = 'https://github.com/ncoghlan' ``` bugs.python.org fields: ```python activity = actor = 'ncoghlan' assignee = 'none' closed = True closed_date = closer = 'ncoghlan' components = [] creation = creator = 'ncoghlan' dependencies = [] files = [] hgrepos = [] issue_num = 34008 keywords = ['patch'] message_count = 18.0 messages = ['320758', '320760', '320768', '320771', '320772', '320773', '320857', '321036', '321384', '321565', '322006', '322007', '322011', '322017', '322019', '322041', '322044', '322302'] nosy_count = 7.0 nosy_names = ['ncoghlan', 'belopolsky', 'vstinner', 'mcepl', 'eric.snow', 'emilyemorehouse', 'hroncok'] pr_nums = ['8023', '8043', '8352'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue34008' versions = ['Python 3.7', 'Python 3.8'] ```

    ncoghlan commented 6 years ago

    In the current documentation, Py_Main is listed as a regular C API function: https://docs.python.org/3/c-api/veryhigh.html#c.Py_Main

    We also didn't add Py_Main() to https://docs.python.org/3/c-api/init.html#before-python-initialization when we did the review for "Functions which are safe to call before Py_Initialize()".

    If it truly is a regular C API function, then this suggests that Py_Main() should be called *after* Py_Initialize().

    However, Py_Main() has historically called Py_Initialize() *itself*, and generally assumes it has total control over management of the current process - it doesn't expect to be sharing the current process with an embedding application.

    In Python 3.7, Py_Main() was changed to call _Py_InitializeCore() and _Py_InitializeMainInterpreter() as separate steps, allowing it to detect cases where an embedding application had already called Py_Initialize(), meaning that Py_Main()'s config setting changes wouldn't be applied.

    In effect, we've converted a silent failure (settings not read and applied as expected) into a noisy one (attempting to call Py_Main() in an already initialised interpreter).

    (The test suite is silent on the matter, since the idea of an embedding application calling Py_Initialize() before calling Py_Main() never even occurred to us)

    So the question is, what should we do about this for Python 3.7:

    1. Treat it as a regression, try to replicate the old behaviour of partially (but not completely) ignoring the config settings read by Py_Main()
    2. Treat it as a regression, but don't try to replicate the old config-settings-are-partially-applied behaviour - just ignore the settings read by Py_Main() completely
    3. Treat it as a long standing documentation error, update the docs to make it clear that Py_Main() expects to handle calling Py_Initialize() itself, and update the Python 3.7 porting guide accordingly
    bc75918c-a209-4fa3-b6cf-28cfb7317f76 commented 6 years ago

    This hits fontforge. See https://bugzilla.redhat.com/show_bug.cgi?id=1595421

    An example of how it should be handled there if 3) is selected would be greatly appreciated. Thanks

    vstinner commented 6 years ago

    Even if it's ugly, calling Py_Main() after Py_Initialize() works in Python 3.6 by ignoring the new config of Py_Main(). I suggest to do nothing (just return) when Py_Initialize() is called again by Py_Main() in Python 3.7. It would fix the fontforge *regression*.

    I suggest to only enhance the code (really apply the new Py_Main() config) in the master branch. From what I saw, apply the new config is not trivial. It will take time to decide what to do for *each* option.

    ncoghlan commented 6 years ago

    Yeah, I've asked Miro to try essentially that in https://bugzilla.redhat.com/show_bug.cgi?id=1595421#c12

    My concern is that Py_Main used to keep a *lot* of state on the local C stack that we've now moved into the main interpreter config.

    So my suspicion is that even that change I've asked Miro to try may not do the right thing, but it really depends on *why* fontforge is calling Py_Main, and what they're passing in argc and argv.

    Given that the embedded Py_Initialize is a no-op, they may be better off just copying out the parts they actually need from the code execution aspects of it, and skipping the Py_Main call entirely.

    ncoghlan commented 6 years ago

    At the very least, I think this calls for a documentation change, such that we make it clear that:

    1. The expected calling pattern is to call Py_Main() *instead of* Py_Initialize(), since Py_Main() calls Py_Initialize().

    2. If you do call Py_Initialize() first, then exactly which settings that Py_Main() will read from the environment and apply is version dependent.

    So I'll put together a docs patch that's valid for 3.6+, and then we can decide where to take 3.7 and 3.8 from an implementation perspective after that.

    Currently, I'm thinking that for 3.7, our goal should specifically be "fake the 3.6 behaviour well enough to get fontforge working again" (so it will mostly be Fedora that drives what that 3.7.1 change looks like), while for 3.8 we should aim to deprecate that code path, and instead offer a better building block API for folks that want to implement their own Py_Main() variant.

    ncoghlan commented 6 years ago

    Removing 3.6 from the affected versions, since we re-arranged these docs a bit for 3.7 to make it clearer which functions could be called prior to initialization, as well as adding more tests for the actual pre-init functionality.

    Since 3.6 isn't going to change, and we'll be aiming to get back to something a bit closer to the 3.6 behaviour for 3.7.1, it seems simpler to just focus on 3.7 and 3.8.

    vstinner commented 6 years ago

    My PR 8043 fix the Python 3.7 regression: allow again to call Py_Main() after Py_Initialize().

    Nick: if you want to raise an error on that case, I would prefer to see a deprecation warning emitted in version N before raising an error in version N+1.

    Technically, it seems possible to allow calling Py_Main() after Py_Initialize(). It's "just" a matter of fixing Py_Main() to apply properly the new configuration.

    ncoghlan commented 6 years ago

    Aye, and I think from Miro's comments on your PR, making it apply some of the same configuration settings that 3.6 and earlier applied is going to be required to get fontforge working again for 3.7.

    At the very least, the call to set sys.argv is needed.

    ncoghlan commented 6 years ago

    Thinking about this some more, I'm inclined to go the same way we did with bpo-33932: classify it as an outright regression, work out the desired requirements for a missing embedding test case, and then fix the implementation to pass the new test case.

    My suggestion for test commands would be:

    ../Lib/site.py
    -m site
    -c 'import sys; print(sys.argv)' some test data here

    Once those work properly, we'd consider the regression relative to Python 3.6 fixed.

    Those could either be 3 different test cases, or else we could run them all within a single test case, with a Py_Initialize() call before each one (since Py_Main() calls Py_Finalize() internally).

    vstinner commented 6 years ago

    I tested fontforge. It became clear to me that the fontforge must be able to call Py_Initialize() and only later call Py_Main(). fontforge calls Py_Initialize() and then uses the Python API:

    /* This is called to start up the embedded python interpreter */
    void FontForge_InitializeEmbeddedPython(void) {
        // static int python_initialized is declared above.
        if ( python_initialized )
        return;
    
        SetPythonProgramName("fontforge");
        RegisterAllPyModules();
        Py_Initialize();
        python_initialized = 1;
    /* The embedded python interpreter is now functionally
     * "running". We can modify it to our needs.
     \*/
    CreateAllPyModules();
    InitializePythonMainNamespace();

    }

    Py_Main() is called after FontForge_InitializeEmbeddedPython().

    To be clear, Py_Main() must be fixed in Python 3.7 to apply properly the new configuration, or *at least* define sys.argv.

    vstinner commented 6 years ago

    This hits fontforge. See https://bugzilla.redhat.com/show_bug.cgi?id=1595421

    Copy of my comment 20:

    I looked one more time to the issue:

    vstinner commented 6 years ago

    I looked one more time at Python 3.6, code before my huge Py_Main()/Py_Initialize() refactoring, before _PyCoreConfig/_PyMainInterpreterConfig have been added. In short, calling Py_Main() after Py_Initialize() "works" in Python 3.6 but only sys.argv is set: almost all other options are lost/ignored. For example, if you call Py_Initialize() you get a sys.path from the current environment, but if Py_Main() gets a new module search path: sys.path is not updated.

    I modified PR 8043 to write the minimum patch just to fix fontforge. When Py_Main() is called Py_Initialize(): just set sys.argv, that's all.

    If someone wants to fix this use case, apply properly the new Py_Main() configuration carefully, I would suggest to only work in the master branch: that's out of the scope of this specific regression.

    IMHO it would be nice to enhance this use case. For example, it would be "nice" to update at least "sys.path" (and other related variables) in such case.

    ncoghlan commented 6 years ago

    +1 from me for the idea of fixing Python 3.7 to correctly set sys.argv in this case (and leaving everything else unmodified).

    As far as further enhancement in Python 3.8 goes, perhaps that can be seen as part of PEP-432, such that embedding applications have to explicitly opt-in to any new behaviour by calling the new APIs?

    vstinner commented 6 years ago

    I created bpo-34170: "Py_Initialize(): computing path configuration must not have side effect (PEP-432)" as a follow-up of this issue.

    vstinner commented 6 years ago

    New changeset fb47bca9ee2d07ce96df94b4e4abafd11826eb01 by Victor Stinner in branch 'master': bpo-34008: Allow to call Py_Main() after Py_Initialize() (GH-8043) https://github.com/python/cpython/commit/fb47bca9ee2d07ce96df94b4e4abafd11826eb01

    vstinner commented 6 years ago

    New changeset 03ec4df67d6b4ce93a2da21db7c84dff8259953f by Victor Stinner (Miss Islington (bot)) in branch '3.7': bpo-34008: Allow to call Py_Main() after Py_Initialize() (GH-8043) (GH-8352) https://github.com/python/cpython/commit/03ec4df67d6b4ce93a2da21db7c84dff8259953f

    vstinner commented 6 years ago

    Ok, I fixed the fontforge bug in 3.7 and master branches. Sorry for the regression, but I really didn't expect that anyone would call Py_Main() after Py_Initialize() :-) I guess we should be prepared for such hiccup when reworking the Python initialization :-)

    Nick: Can I close the issue, or do you want to keep it open to change the documentation?

    ncoghlan commented 6 years ago

    Since we decided the correct resolution here was to restore the Python 3.6 behaviour, I've filed https://bugs.python.org/issue34206 as a separate docs clarification issue (I'll amend my PR accordingly)