Open ericsnowcurrently opened 10 years ago
With PEP-451, Python 3.4 introduced module specs to encapsulate the module's import-related information, particularly for loading. While __loader, __file, and __cached are no longer used by the import machinery, in a few places it still uses __name, __package, and __path.
Typically the spec and the module attrs will have the same values, so it would be a non-issue. However, the import-related module attributes are not read-only and the consequences of changing them (i.e. accidentally or to rely on an implementation detail) are not clearly defined. Making the spec strictly authoritative reduces the likelihood accidental changes and gives a better focus point for a module's import behavior (which was kind of the point of PEP-451 in the first place). Furthermore, objects in sys.modules are not required to be modules. By relying strictly on __spec__ we likewise give a more distinct target (of import-related info) for folks that need to use that trick.
I don't recall the specifics on why we didn't change those 3 attributes for PEP-451 (unintentional or for backward compatibility?). At one point we discussed the idea that a module's spec contains the values that *were* used to load the module. Instead, each spec became the image of how the import system sees and treats the module.
So unless there's some solid reason, I'd like to see the use of __name, __package, and __path__ by the import machinery eliminated (and accommodated separately if appropriate). Consistent use of specs in the import machinery will help limit future surprises.
Here are the specific places:
__name__ --------
mod.__repr__()
ExtensionFileLoader.load_module()
importlib._bootstrap._handle_fromlist()
importlib._bootstrap._calc___package__()
importlib._bootstrap.__import__()
__package__ -----------
importlib._bootstrap._calc___package__()
__path__ --------
importlib._bootstrap._find_and_load_unlocked()
importlib._bootstrap._handle_fromlist()
importlib._bootstrap._calc___package__()
__file__ --------
mod.__repr__()
Note that I'm not suggesting the module attributes be eliminated (they are useful for informational reasons). I would just like the import system to stop using them. I suppose they could be turned into read-only properties, but anything like that should be addressed in a separate issue.
If we do make this change, the language reference, importlib docs, and inspect docs should be updated to clearly reflect the role of the module attributes in the import system.
I have not taken into account the impact on the standard library. However, I expect that it will be minimal at best. (See issue bpo-21736 for a related discussion).
Manipulating name, package and path at runtime is fully supported, and the module level attributes accordingly take precedence over the initial import time spec.
There may be some test suite gaps and documentation issues around the behaviour, but it's definitely intentional (things like runpy, "pseudo-modules", third party namespace package support and workarounds for running modules inside packages correctly rely on it).
Thanks for clarifying. I remembered discussing it but couldn't recall the details. Documenting the exact semantics, use cases, and difference between spec and module attrs would be help. I'll look into updating the language reference when I have some time.
It would still be worth it to find a way to make __spec__ fully authoritative, but I'll have to come up with a solution to the current use cases before that could go anywhere. :)
The spec is authoritative for "how was this imported?". The differences between that and the module attributes then provide a record of any post-import modifications.
So I am going to disagree with Nick about the module attributes and their usefulness (partially because I just made __spec.parent take precedence over __package in issue bpo-25791). While I get the idea of wanting a history of what did (not) change since import, keeping the duplicate information around is annoying. And I don't know how truly useful it is to know what things were compared to what they became.
If we shift to preferring specs compared to module attributes we can then begin to clean up __import itself by no longer grabbing the globals() and locals() and instead simply pass in the module's __spec object. It also simplifies the documentation such that we don't have to explain everything twice. If people really want to track what a value was relating to import before mutation they can simply store it themselves instead of making us do the bookkeeping for them. It would also make things such as module_from_spec() or loader.createmodule() simpler since they only have to worry about setting \_spec__ instead of that attribute plus a bunch of other ones.
My concern is more about backwards compatibility - at the moment, you can alter the behaviour of import, pickle, and other subsystems by modifying the module level attributes, and if we switch to preferring the __spec attributes, then that kind of code will break (I added an import specific example related to __main module relative imports to the linked issue).
That's not to say it shouldn't be done - as you say, it would be nice to eventually get to a point where the import system only needs access to the module spec and not to the runtime state, and there are also cases where the __spec information will be more correct (e.g. pickling objects in __main).
However, it needs to be in such a way that there are appropriate porting notes that explain to people why their state mutations stopped having the desired effect, and what (if anything) they can do instead.
I totally agree proper notes in the What's New doc need to be in place to explain that people need to update.
How about I tweak the __package change to continue to prefer __package over __spec.parent, but raise an ImportWarning when they differ? It can also fall back to __spec.parent if __package isn't defined and simply not worry about the lack of __package? Then we can do an ImportWarning for all of the other attributes when we discover a difference so people have time to migrate to updating both locations, and then eventually we can invert the priority and then after that drop the module attributes.
That approach sounds good to me.
The main problem cases I'm aware of are:
__name__:
__path__:
__package__:
Yeah, which is why it will take a transition to get people to start mucking with spec instead of the module attributes for their legitimate/questionable needs (although the whole __name__ == '__main__'
idiom means name will never go away while path, package, loader, file, and cached could all slowly go away).
__package != __spec.parent now triggers an ImportWarning.
I think that leaves the following attributes to be updated/checked for dependencies in importlib (and if they are found, raise ImportWarning when they differ):
It turns out that the import system itself doesn't use __loader__
(it does set it), but various parts of the stdlib do use __loader__
. bpo-42133 updates a bunch of stdlib modules to use __spec__.loader
as a fallback. bpo-42132 tracks the fact that there's C code which isn't setting or using __spec__
(let alone __spec__.loader
).
Plan (as discussed with @warsaw ):
__package__
__spec__
(done way back when)ImportWarning
when the attribute is used but it differs from __spec__
(3.6)ImportWarning
when having to fall back to the attribute (3.10)ImportWarning
to DeprecationWarning
when falling back to the attribute (3.12; GH-97879)importlib
that set and used the old attribute (3.15)__loader__
__spec__
(3.10)__spec__
(issue)DeprecationWarning
when __loader__ != __spec__.loader
or __spec__.loader
is not set.importlib
that set and used the old attribute (3.15)__cached__
__spec__
(PRDeprecationWarning
when the attribute is used but it differs from __spec__
(including not being set on __spec__
) or __spec__.cached
.importlib
that used the old attribute (3.14)Update C code to fall back to using spec
_warnings.c
needs to be changed.Another note for each section in @brettcannon's TODO list above:
runpy
that sets the old module attributeIt's the same globals
update call for all three of the old attributes: https://github.com/python/cpython/blob/main/Lib/runpy.py#L81
The pkg_name
parameters on the _run_code
and _run_module_code
helper functions in runpy
will also become redundant (since they're only used to set __package__
) and should be removed.
That change will also eliminate a historical slight behavioural difference between direct script execution and runpy.run_path
in the way they set __package__
to indicate a top-level module (default is None
, runpy.run_path
sets the empty string):
$ echo "print(__package__)" > package.py
$ python3 package.py
None
$ python3 -c "import runpy; runpy.run_path('package.py')"
$
(by contrast, there's no discrepancy for __spec__
: they both set it to None
for direct script execution)
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/brettcannon' closed_at = None created_at =
labels = ['interpreter-core', 'type-feature']
title = 'update the import machinery to only use __spec__'
updated_at =
user = 'https://github.com/ericsnowcurrently'
```
bugs.python.org fields:
```python
activity =
actor = 'brett.cannon'
assignee = 'brett.cannon'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation =
creator = 'eric.snow'
dependencies = ['25791', '42132']
files = []
hgrepos = []
issue_num = 21762
keywords = []
message_count = 13.0
messages = ['220581', '220597', '220606', '220610', '258332', '258357', '258398', '258440', '258443', '258844', '258845', '379490', '412537']
nosy_count = 5.0
nosy_names = ['barry', 'brett.cannon', 'ncoghlan', 'eric.snow', 'nanjekyejoannah']
pr_nums = []
priority = 'low'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue21762'
versions = ['Python 3.6']
```
Linked PRs