python / cpython

The Python programming language
https://www.python.org/
Other
61.15k stars 29.52k forks source link

Freeze all modules imported during startup. #89183

Closed ericsnowcurrently closed 2 years ago

ericsnowcurrently commented 2 years ago
BPO 45020
Nosy @malemburg, @gvanrossum, @warsaw, @brettcannon, @nascheme, @rhettinger, @terryjreedy, @gpshead, @ronaldoussoren, @ncoghlan, @vstinner, @larryhastings, @tiran, @tiran, @tiran, @methane, @markshannon, @ericsnowcurrently, @indygreg, @lysnikolaou, @pablogsal, @miss-islington, @brandtbucher, @isidentical, @shihai1991, @FFY00, @softsol solutions
PRs
  • python/cpython#28107
  • python/cpython#28320
  • python/cpython#28335
  • python/cpython#28344
  • python/cpython#28345
  • python/cpython#28346
  • python/cpython#28375
  • python/cpython#28380
  • python/cpython#28392
  • python/cpython#28398
  • python/cpython#28410
  • python/cpython#28538
  • python/cpython#28554
  • python/cpython#28583
  • python/cpython#28590
  • python/cpython#28635
  • python/cpython#28664
  • python/cpython#28665
  • python/cpython#28655
  • python/cpython#28940
  • python/cpython#28997
  • python/cpython#29755
  • python/cpython#29755
  • python/cpython#29755
  • Dependencies
  • bpo-45186: Marshal output isn't completely deterministic.
  • bpo-45188: De-couple the Windows builds from freezing modules.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/ericsnowcurrently' closed_at = created_at = labels = ['interpreter-core', 'type-feature', '3.11'] title = 'Freeze all modules imported during startup.' updated_at = user = 'https://github.com/ericsnowcurrently' ``` bugs.python.org fields: ```python activity = actor = 'christian.heimes' assignee = 'eric.snow' closed = True closed_date = closer = 'eric.snow' components = ['Parser'] creation = creator = 'eric.snow' dependencies = ['45186', '45188'] files = [] hgrepos = [] issue_num = 45020 keywords = ['patch'] message_count = 96.0 messages = ['400370', '400371', '400372', '400373', '400374', '400381', '400386', '400394', '400421', '400422', '400427', '400447', '400449', '400450', '400454', '400455', '400459', '400460', '400461', '400469', '400507', '400508', '400510', '400629', '400636', '400638', '400639', '400660', '400664', '400665', '400666', '400667', '400675', '400766', '400769', '400808', '400855', '400856', '401027', '401040', '401734', '401740', '401805', '401811', '401813', '401814', '401848', '401865', '401870', '401905', '401911', '401917', '401918', '401919', '401921', '401954', '401966', '401988', '402017', '402020', '402070', '402080', '402100', '402101', '402103', '402113', '402116', '402118', '402119', '402151', '402356', '402463', '402464', '402476', '402587', '402609', '402629', '402633', '402634', '402898', '402993', '403024', '403025', '403255', '403256', '403323', '403324', '404111', '404164', '404257', '404344', '405002', '405203', '405244', '406143', '406948'] nosy_count = 25.0 nosy_names = ['lemburg', 'gvanrossum', 'barry', 'brett.cannon', 'nascheme', 'rhettinger', 'terry.reedy', 'gregory.p.smith', 'ronaldoussoren', 'ncoghlan', 'vstinner', 'larry', 'christian.heimes', 'christian.heimes', 'christian.heimes', 'methane', 'Mark.Shannon', 'eric.snow', 'indygreg', 'lys.nikolaou', 'pablogsal', 'miss-islington', 'brandtbucher', 'BTaskaya', 'shihai1991', 'FFY00', 'santhu_reddy12'] pr_nums = ['28107', '28320', '28335', '28344', '28345', '28346', '28375', '28380', '28392', '28398', '28410', '28538', '28554', '28583', '28590', '28635', '28664', '28665', '28655', '28940', '28997', '29755', '29755', '29755'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue45020' versions = ['Python 3.11'] ```

    ericsnowcurrently commented 2 years ago

    New changeset a9757bf34d8b4cb3c24bbb70d50a06c815e2e8f3 by Eric Snow in branch 'main': bpo-45020: Drop the frozen .h files from the repo. (gh-28375) https://github.com/python/cpython/commit/a9757bf34d8b4cb3c24bbb70d50a06c815e2e8f3

    ericsnowcurrently commented 2 years ago

    Looks like that last commit broke one of the buildbots:

    https://buildbot.python.org/all/#/builders/483/builds/812

    I'll fix that right away.

    gvanrossum commented 2 years ago

    FWIW, Python/frozen_modules/hello.h is still in the repo somehow.

    ericsnowcurrently commented 2 years ago

    Never mind. It's pretty late here so I'm going to revert it and sort it out in the morning.

    ericsnowcurrently commented 2 years ago

    New changeset 9fd87a5fe5c468cf94265365091267838b004b7f by Eric Snow in branch 'main': bpo-45020: Revert "Drop the frozen .h files from the repo." (gh-28380) https://github.com/python/cpython/commit/9fd87a5fe5c468cf94265365091267838b004b7f

    vstinner commented 2 years ago

    Is bpo-45220 "Windows builds sometimes fail on Azure and GitHub Action: fatal error RC1116: RC terminating after preprocessor errors" related to this issue?

    ericsnowcurrently commented 2 years ago

    I left a comment on bpo-45220.

    ericsnowcurrently commented 2 years ago

    New changeset fdc6b3d9316501d2f0068a1bf4334debc1949e62 by Eric Snow in branch 'main': bpo-45020: Drop the frozen .h files from the repo. (gh-28392) https://github.com/python/cpython/commit/fdc6b3d9316501d2f0068a1bf4334debc1949e62

    vstinner commented 2 years ago

    It's no longer possible to build Python from a different directory. Example:

    git clean -fdx mkdir build cd build/ ../configure --with-pydebug make

    It fails with:

    ../Programs/_freeze_module importlib._bootstrap ../Lib/importlib/_bootstrap.py ../Python/frozen_modules/importlib__bootstrap.h make: ../Programs/_freeze_module: No such file or directory

    I'm working on a fix.

    vstinner commented 2 years ago

    There is also a minor issue, "make regen-frozen" changes the end of line of two PCbuild/ files: see bpo-45231.

    vstinner commented 2 years ago

    New changeset 41551ee7e24fb6c58846836d3655dbb212281206 by Victor Stinner in branch 'main': bpo-45020: Fix build out of source tree (GH-28410) https://github.com/python/cpython/commit/41551ee7e24fb6c58846836d3655dbb212281206

    ericsnowcurrently commented 2 years ago

    Thanks for fixing that, Victor!

    gvanrossum commented 2 years ago

    (Ah, now I recall how the $(srcdir) mechanism works again. Generated files do *not* have a $(srcdir) prefix, since they are generated in the destination directory. Hence the fix for the buildbot failure a few days ago. I take back my skepticism about that.)

    gvanrossum commented 2 years ago

    BTW, why does the script still run Programs/_freeze_module over all the modules to be frozen? Isn't that up to the Makefile or its Windows equivalent?

    ericsnowcurrently commented 2 years ago

    New changeset 090591636c4f03ce06a039079bd7716a5b23631e by Eric Snow in branch 'main': bpo-45020: Freeze os, site, and codecs. (gh-28398) https://github.com/python/cpython/commit/090591636c4f03ce06a039079bd7716a5b23631e

    ericsnowcurrently commented 2 years ago

    On Fri, Sep 17, 2021 at 4:22 PM Guido van Rossum \report@bugs.python.org\ wrote:

    BTW, why does the script still run Programs/_freeze_module over all the modules to be frozen? Isn't that up to the Makefile or its Windows equivalent?

    Yeah, it used to matter but we probably don't need to do that any longer.

    rhettinger commented 2 years ago

    It would be nice to freeze argparse.py and its dependencies. For command-line tools, startup time is important.

    gvanrossum commented 2 years ago

    It would be nice to freeze argparse.py and its dependencies. For command-line tools, startup time is important.

    I quickly checked, and argparse has at least these dependencies:

    argparse re enum types operator functools collections keyword reprlib sre_compile sre_parse sre_constants copyreg gettext

    Raymond, do you think we should also freeze the dependencies of runpy (so "python -m \<module>" also starts faster)?

    That would be

    runpy importlib warnings importlib.machinery importlib.util importlib._abc contextlib collections keyword operator reprlib functools types

    (I didn't dedupe this from the previous list.)

    With all these modules frozen we'd probably run more risk of two things:

    rhettinger commented 2 years ago

    Raymond, do you think we should also freeze the dependencies of runpy (so "python -m \<module>" also starts faster)?

    Yes, please.

    The '-m' load option can be considered core startup functionality. It is often the recommended way to launch command line tools outside of a virtual environment.

    python -m pip install some_package

    vstinner commented 2 years ago

    runpy startup time matters a lot for "python3 -m module" startup time!

    In Python 3.10 (bpo-41006 and bpo-41718), I reduced the number of modules imported by runpy:

    "The runpy module now imports fewer modules. The python3 -m module-name command startup time is 1.4x faster in average. On Linux, python3 -I -m module-name imports 69 modules on Python 3.9, whereas it only imports 51 modules (-18) on Python 3.10. (Contributed by Victor Stinner in bpo-41006 and bpo-41718.)"

    https://docs.python.org/dev/whatsnew/3.10.html#optimizations

    --

    For argparse, maybe it could use a few lazy imports to reduce the number of indirect impots on "import argparse"?

    gvanrossum commented 2 years ago

    I'm still torn about the need for a __file attribute. Assuming it's more likely that people write code that *reads stdlib source for some reason (maybe printing more context for error messages) than *writing it (which would be truly strange), the amount of 3rd party code that might break due to the lack of __file can be expected to be larger than the amount of code (or number of users) that would be confused to having a __file__ pointing to a file that isn't actually read by the interpreter.

    Okay, so I'm no longer torn. We should set __file__ for frozen modules based on the pre-computed location of the stdlib.

    brettcannon commented 2 years ago

    What about if there isn't a pre-computed location for __file__? I could imagine a self-contained CPython build where there is no concept of a file location on disk for anything using this.

    malemburg commented 2 years ago

    On 22.09.2021 20:47, Brett Cannon wrote:

    What about if there isn't a pre-computed location for __file__? I could imagine a self-contained CPython build where there is no concept of a file location on disk for anything using this.

    This does work and is enough to make most code out there happy.

    I use e.g. "\<pyrun>/os.py" in PyRun. There is no os.py file to load, but tracebacks and inspection tools work just fine with this.

    236914ab-5504-4492-add8-6907a1140f5c commented 2 years ago

    I'll throw out that __file is strictly optional, per https://docs.python.org/3/reference/datamodel.html. IMO it is more important to define importlib.abc.InspectLoader than __file (so tracebacks have source context).

    Given that __file is optional, I'll throw out the idea of shipping modules without __file/cached and see what issues/pushback occurs. I anticipate this will likely result in having to restore advertising __file in frozen modules in 3.11 due to popular demand. But we /could/ use this as an opportunity to start attempting to ween people off __file and onto more appropriate APIs, like importlib.resources. (importlib.resources was introduced in 3.7 and I'm not sure that's old enough to force people onto without relying on a shim like https://pypi.org/project/importlib-resources/.)

    Alternatively, defining a path hook referencing the python executable that knows how to service frozen modules could also work. zipimporter does this (file is \<zip_path>/\<virtual_module_path>) and it is surprisingly not as brittle as one would think. It might be a good middle ground.

    ericsnowcurrently commented 2 years ago

    New changeset 7c801e0fa603b155eab3fd19698aa90854ac5a7b by Eric Snow in branch 'main': bpo-45020: Fix some corner cases for frozen module generation. (gh-28538) https://github.com/python/cpython/commit/7c801e0fa603b155eab3fd19698aa90854ac5a7b

    malemburg commented 2 years ago

    Eric, I noticed that you are freezing os.py. Please be aware that this module is often being used as indicator for where the stdlib was installed (the stdlib itself does this in site.py to read the LICENSE and the test suite also uses os.__file__ in a couple of places).

    It may be worth changing the stdlib to pick a different module as landmark for this purpose.

    Also: Unless you have added the .__file__ attribute to frozen modules, much of this landmark checking code will fail... which is the reason I added the attribute to frozen modules in PyRun.

    gvanrossum commented 2 years ago

    The plan is to add __file__ back, based on where the stdlib lives (we won’t do any stat() calls for frozen files).

    vstinner commented 2 years ago

    Marc-Andre: I suppose that you're talking about LANDMARK in Modules/getpath.c and PC/getpathp.c.

    malemburg commented 2 years ago

    On 25.09.2021 18:20, STINNER Victor wrote:

    STINNER Victor \vstinner@python.org\ added the comment:

    Marc-Andre: I suppose that you're talking about LANDMARK in Modules/getpath.c and PC/getpathp.c.

    Now that you mention it: yes, that as well :-) But os.py is used in the Python stdlib code as well, just search for "os.__file__" to see a few such uses.

    If you search for ".__file__" you'll find that there are quite a few cases in the test suite expecting that attribute on other stdlib modules as well. The attribute may officially be optional, but in reality a lot of code expects to find it.

    ericsnowcurrently commented 2 years ago

    New changeset 45ca1c04139300ec0289a32f78c7ac922a4f7b07 by Eric Snow in branch 'main': bpo-45020: Do not freeze \<pkg>/init.py twice. (gh-28635) https://github.com/python/cpython/commit/45ca1c04139300ec0289a32f78c7ac922a4f7b07

    ericsnowcurrently commented 2 years ago

    New changeset 7e5c107541726b90d3f2e6e69ef37180cf58335d by Eric Snow in branch 'main': bpo-45020: Add more test cases for frozen modules. (gh-28664) https://github.com/python/cpython/commit/7e5c107541726b90d3f2e6e69ef37180cf58335d

    gvanrossum commented 2 years ago

    @santhu_reddy12, why did you assign this to the Parser category? IMO this issue is clearly in the Build category. (We haven't met, I assume you have triage permissions?)

    gvanrossum commented 2 years ago

    And it's most definitely 3.11, not 3.9. (Did you mean to change a different issue?)

    ericsnowcurrently commented 2 years ago

    New changeset 08285d563e64c179a56ab2f952345b3dbcdb54f3 by Eric Snow in branch 'main': bpo-45020: Identify which frozen modules are actually aliases. (gh-28655) https://github.com/python/cpython/commit/08285d563e64c179a56ab2f952345b3dbcdb54f3

    gvanrossum commented 2 years ago

    Whoa. os.path is not always an alias for posixpath, is it?

    larryhastings commented 2 years ago

    Nope. On Windows, os.path is "ntpath".

    ericsnowcurrently commented 2 years ago

    On Tue, Oct 5, 2021 at 11:31 AM Guido van Rossum \report@bugs.python.org\ wrote:

    Whoa. os.path is not always an alias for posixpath, is it?

    Steve brought this to my attention a couple weeks ago. Bottom line: the frozen module entry is only there for checks, not for actual import, but should probably be removed regardless. See https://bugs.python.org/issue45272.

    ericsnowcurrently commented 2 years ago

    New changeset b9cdd0fb9c463c2503a4d854bb6529a9db58fe1b by Eric Snow in branch 'main': bpo-45020: Default to using frozen modules unless running from source tree. (gh-28940) https://github.com/python/cpython/commit/b9cdd0fb9c463c2503a4d854bb6529a9db58fe1b

    gpshead commented 2 years ago

    could changes related to this be the cause of https://bugs.python.org/issue45506 ?

    out of tree builds in main usually cannot pass key tests today. they often hang or blow up with strange exceptions.

    gvanrossum commented 2 years ago

    Is python/cpython#73126 only for UNIX?

    I built on Windows with default options (PCbuild\build.bat) and it looks like the frozen modules are used by default even though I am running in the source directory. (I put a printf() call in unmarshal_frozen_code().)

    I also put a printf() in is_dev_env() and found that it returns 0 on this check:

        /* If dirname() is the same for both then it is a dev build. */
        if (len != _Py_find_basename(stdlib)) {
            return 0;
        }

    I assume that's because the binary (in my case at least) is at PCbuild\amd64\python.exe which is not the same as my current directory (which is the repo root).

    ericsnowcurrently commented 2 years ago

    On Mon, Oct 18, 2021 at 7:14 PM Guido van Rossum \report@bugs.python.org\ wrote:

    Is python/cpython#73126 only for UNIX?

    It wasn't meant to be. :(

    I built on Windows with default options (PCbuild\build.bat) and it looks like the frozen modules are used by default even though I am running in the source directory. (I put a printf() call in unmarshal_frozen_code().)

    I'll look into this.

    ericsnowcurrently commented 2 years ago

    New changeset 6afb285ff0790471a6858e44f85d143f07fda70c by Eric Snow in branch 'main': bpo-45020: Add tests for the -X "frozen_modules" option. (gh-28997) https://github.com/python/cpython/commit/6afb285ff0790471a6858e44f85d143f07fda70c

    ericsnowcurrently commented 2 years ago

    On Mon, Oct 18, 2021 at 7:14 PM Guido van Rossum \report@bugs.python.org\ wrote:

    Is python/cpython#73126 only for UNIX? I built on Windows with default options (PCbuild\build.bat) and it looks like the frozen modules are used by default even though I am running in the source directory. (I put a printf() call in unmarshal_frozen_code().)

    FYI, I opened https://bugs.python.org/issue45651 for sorting this out.

    ericsnowcurrently commented 2 years ago

    I consider this done. There is some lingering follow-up work, for which I've created a number of issues:

    gvanrossum commented 2 years ago

    New changeset 1cbaa505d007e11c4a1f0d2073d72b6c02c7147c by Guido van Rossum in branch 'main': bpo-45696: Deep-freeze selected modules (GH-29118) https://github.com/python/cpython/commit/1cbaa505d007e11c4a1f0d2073d72b6c02c7147c

    tiran commented 2 years ago

    New changeset 5c4b19ec49a5fbad65a682225f7cfed8b78f2a2f by Christian Heimes in branch 'main': bpo-45020: Fix strict-prototypes warning (GH-29755) https://github.com/python/cpython/commit/5c4b19ec49a5fbad65a682225f7cfed8b78f2a2f