python / cpython

The Python programming language
https://www.python.org
Other
63.36k stars 30.35k forks source link

update the import machinery to only use __spec__ #65961

Open ericsnowcurrently opened 10 years ago

ericsnowcurrently commented 10 years ago
BPO 21762
Nosy @warsaw, @brettcannon, @ncoghlan, @ericsnowcurrently, @nanjekyejoannah
Dependencies
  • bpo-25791: Raise an ImportWarning when spec.parent/package isn't defined for a relative import
  • bpo-42132: Use specs instead of just loader in C code
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/brettcannon' closed_at = None created_at = labels = ['interpreter-core', 'type-feature'] title = 'update the import machinery to only use __spec__' updated_at = user = 'https://github.com/ericsnowcurrently' ``` bugs.python.org fields: ```python activity = actor = 'brett.cannon' assignee = 'brett.cannon' closed = False closed_date = None closer = None components = ['Interpreter Core'] creation = creator = 'eric.snow' dependencies = ['25791', '42132'] files = [] hgrepos = [] issue_num = 21762 keywords = [] message_count = 13.0 messages = ['220581', '220597', '220606', '220610', '258332', '258357', '258398', '258440', '258443', '258844', '258845', '379490', '412537'] nosy_count = 5.0 nosy_names = ['barry', 'brett.cannon', 'ncoghlan', 'eric.snow', 'nanjekyejoannah'] pr_nums = [] priority = 'low' resolution = None stage = 'needs patch' status = 'open' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue21762' versions = ['Python 3.6'] ```

    Linked PRs

    ericsnowcurrently commented 10 years ago

    With PEP-451, Python 3.4 introduced module specs to encapsulate the module's import-related information, particularly for loading. While __loader, __file, and __cached are no longer used by the import machinery, in a few places it still uses __name, __package, and __path.

    Typically the spec and the module attrs will have the same values, so it would be a non-issue. However, the import-related module attributes are not read-only and the consequences of changing them (i.e. accidentally or to rely on an implementation detail) are not clearly defined. Making the spec strictly authoritative reduces the likelihood accidental changes and gives a better focus point for a module's import behavior (which was kind of the point of PEP-451 in the first place). Furthermore, objects in sys.modules are not required to be modules. By relying strictly on __spec__ we likewise give a more distinct target (of import-related info) for folks that need to use that trick.

    I don't recall the specifics on why we didn't change those 3 attributes for PEP-451 (unintentional or for backward compatibility?). At one point we discussed the idea that a module's spec contains the values that *were* used to load the module. Instead, each spec became the image of how the import system sees and treats the module.

    So unless there's some solid reason, I'd like to see the use of __name, __package, and __path__ by the import machinery eliminated (and accommodated separately if appropriate). Consistent use of specs in the import machinery will help limit future surprises.

    Here are the specific places:

    __name__ --------

    mod.__repr__()
    ExtensionFileLoader.load_module()
    importlib._bootstrap._handle_fromlist()
    importlib._bootstrap._calc___package__()
    importlib._bootstrap.__import__()

    __package__ -----------

    importlib._bootstrap._calc___package__()

    __path__ --------

    importlib._bootstrap._find_and_load_unlocked()
    importlib._bootstrap._handle_fromlist()
    importlib._bootstrap._calc___package__()

    __file__ --------

    mod.__repr__()

    Note that I'm not suggesting the module attributes be eliminated (they are useful for informational reasons). I would just like the import system to stop using them. I suppose they could be turned into read-only properties, but anything like that should be addressed in a separate issue.

    If we do make this change, the language reference, importlib docs, and inspect docs should be updated to clearly reflect the role of the module attributes in the import system.

    I have not taken into account the impact on the standard library. However, I expect that it will be minimal at best. (See issue bpo-21736 for a related discussion).

    ncoghlan commented 10 years ago

    Manipulating name, package and path at runtime is fully supported, and the module level attributes accordingly take precedence over the initial import time spec.

    There may be some test suite gaps and documentation issues around the behaviour, but it's definitely intentional (things like runpy, "pseudo-modules", third party namespace package support and workarounds for running modules inside packages correctly rely on it).

    ericsnowcurrently commented 10 years ago

    Thanks for clarifying. I remembered discussing it but couldn't recall the details. Documenting the exact semantics, use cases, and difference between spec and module attrs would be help. I'll look into updating the language reference when I have some time.

    It would still be worth it to find a way to make __spec__ fully authoritative, but I'll have to come up with a solution to the current use cases before that could go anywhere. :)

    ncoghlan commented 10 years ago

    The spec is authoritative for "how was this imported?". The differences between that and the module attributes then provide a record of any post-import modifications.

    brettcannon commented 8 years ago

    So I am going to disagree with Nick about the module attributes and their usefulness (partially because I just made __spec.parent take precedence over __package in issue bpo-25791). While I get the idea of wanting a history of what did (not) change since import, keeping the duplicate information around is annoying. And I don't know how truly useful it is to know what things were compared to what they became.

    If we shift to preferring specs compared to module attributes we can then begin to clean up __import itself by no longer grabbing the globals() and locals() and instead simply pass in the module's __spec object. It also simplifies the documentation such that we don't have to explain everything twice. If people really want to track what a value was relating to import before mutation they can simply store it themselves instead of making us do the bookkeeping for them. It would also make things such as module_from_spec() or loader.createmodule() simpler since they only have to worry about setting \_spec__ instead of that attribute plus a bunch of other ones.

    ncoghlan commented 8 years ago

    My concern is more about backwards compatibility - at the moment, you can alter the behaviour of import, pickle, and other subsystems by modifying the module level attributes, and if we switch to preferring the __spec attributes, then that kind of code will break (I added an import specific example related to __main module relative imports to the linked issue).

    That's not to say it shouldn't be done - as you say, it would be nice to eventually get to a point where the import system only needs access to the module spec and not to the runtime state, and there are also cases where the __spec information will be more correct (e.g. pickling objects in __main).

    However, it needs to be in such a way that there are appropriate porting notes that explain to people why their state mutations stopped having the desired effect, and what (if anything) they can do instead.

    brettcannon commented 8 years ago

    I totally agree proper notes in the What's New doc need to be in place to explain that people need to update.

    How about I tweak the __package change to continue to prefer __package over __spec.parent, but raise an ImportWarning when they differ? It can also fall back to __spec.parent if __package isn't defined and simply not worry about the lack of __package? Then we can do an ImportWarning for all of the other attributes when we discover a difference so people have time to migrate to updating both locations, and then eventually we can invert the priority and then after that drop the module attributes.

    ncoghlan commented 8 years ago

    That approach sounds good to me.

    The main problem cases I'm aware of are:

    __name__:

    __path__:

    __package__:

    brettcannon commented 8 years ago

    Yeah, which is why it will take a transition to get people to start mucking with spec instead of the module attributes for their legitimate/questionable needs (although the whole __name__ == '__main__' idiom means name will never go away while path, package, loader, file, and cached could all slowly go away).

    brettcannon commented 8 years ago

    __package != __spec.parent now triggers an ImportWarning.

    brettcannon commented 8 years ago

    I think that leaves the following attributes to be updated/checked for dependencies in importlib (and if they are found, raise ImportWarning when they differ):

    1. __path__
    2. __loader__
    3. __file__
    4. __cached__
    brettcannon commented 4 years ago

    It turns out that the import system itself doesn't use __loader__ (it does set it), but various parts of the stdlib do use __loader__. bpo-42133 updates a bunch of stdlib modules to use __spec__.loader as a fallback. bpo-42132 tracks the fact that there's C code which isn't setting or using __spec__ (let alone __spec__.loader).

    brettcannon commented 2 years ago

    Plan (as discussed with @warsaw ):

    1. __package__
      • [x] Make sure all uses of the attribute fall back on __spec__ (done way back when)
      • [x] Add an ImportWarning when the attribute is used but it differs from __spec__ (3.6)
      • [x] Update code to prefer the spec over the attribute, raising ImportWarning when having to fall back to the attribute (3.10)
      • [x] Change ImportWarning to DeprecationWarning when falling back to the attribute (3.12; GH-97879)
      • [x] Document the deprecation
      • [ ] Remove code in importlib that set and used the old attribute (3.15)
    2. __loader__
      • [x] Make sure all Python code uses of the attribute fall back on __spec__ (3.10)
      • [x] Update C code to fall back to using __spec__ (issue)
      • [ ] Add a DeprecationWarning when __loader__ != __spec__.loader or __spec__.loader is not set.
      • [ ] Remove code in importlib that set and used the old attribute (3.15)
    3. __cached__
      • [x] Make sure all uses of the attribute fall back on __spec__ (PR
      • [x] ~Add an DeprecationWarning when the attribute is used but it differs from __spec__ (including not being set on __spec__) or __spec__.cached.
      • [x] Document the deprecation
      • [ ] Remove code in importlib that used the old attribute (3.14)
    warsaw commented 2 years ago

    Update C code to fall back to using spec

    97803 is the PR and #86298 is the ticket. Only _warnings.c needs to be changed.

    ncoghlan commented 1 month ago

    Another note for each section in @brettcannon's TODO list above:

    It's the same globals update call for all three of the old attributes: https://github.com/python/cpython/blob/main/Lib/runpy.py#L81

    The pkg_name parameters on the _run_code and _run_module_code helper functions in runpy will also become redundant (since they're only used to set __package__) and should be removed.

    That change will also eliminate a historical slight behavioural difference between direct script execution and runpy.run_path in the way they set __package__ to indicate a top-level module (default is None, runpy.run_path sets the empty string):

    $ echo "print(__package__)" > package.py
    $ python3 package.py
    None
    $ python3 -c "import runpy; runpy.run_path('package.py')"
    
    $

    (by contrast, there's no discrepancy for __spec__: they both set it to None for direct script execution)