Closed ncoghlan closed 10 years ago
In implementing the runpy module updates for PEP-451 I ran into the fact that importlib.find_spec had copied the findloader behaviour where it doesn't handle dotted names automatically - you have to explicitly pass in the \_path__ from the parent module for it to work.
For find_loader, runpy used pkgutil.get_loader instead. The patch on bpo-19700 currently includes a runpy._fixed_find_spec function that handles walking the module name components, loading packages as necessary.
I believe it would be better to move the current thin wrapper around importlib._bootstrap._find_spec to importlib.find_spec_on_path (which will only search for the specified module name directly on the given path without breaking it up into components), with find_spec itself changed to *not* take a "path" argument, and instead always doing the full lookup (as implemented in the bpo-19700 patch _fixed_find_spec function)
Another option would be to add a keyword-only use_parent_path flag to importlib.find_spec() which is True by default. If use_parent is True, 'path' is provided, and there is no parent, then 'path' can act as a fallback.
Which makes me wonder if also adding a keyword-only import_parents keyword to implicitly import any parent packages as necessary would be helpful. It would imply use_parent_path as true. Might also be useful to add to importmodule() to get rid of that annoyance inherited from \_import__.
They're fundamentally different operations though, with one being a single step of the other. I just think "do the full lookup" should have the more obvious name, with the "do a single level lookup" still as a public API, but with the less obvious name.
As a testament to counter-intuitive API, I did not even realize this about either find_loader() or find_spec() until the bug the other day with reloading submodules. So I'm on board!
As to the best approach, I'll argue for keeping just 1 function. I tried out both ways and in the 2 function approach they simply didn't seem different enough. I've attached a patch (w/o tests, etc.) to show what I'm thinking.
Basically, let's make the "path" parameter keyword-only and rename it to "parent_path" (maybe even "precomputed_parent_path"). The patch takes cues from pkgutil.find_loader().
One function (the current one) promises not to import anything. The other (the new one) may have side effects, even if it fails to find the module (any successfully imported parent modules will remain imported).
IIRC, that "no side effects" guarantee was the original reason for the find_loader/get_loader split and it's a distinction I think is worth preserving. I just think the obvious name should be closer to importlib.import_module in semantics, with the "no side effects" version being the more obscure one.
I'm not saying get rid of the ability to guarantee no side-effects. All I'm saying is that the point of the function is to find a spec, whether that includes importing or implicitly using a parent package from sys.modules is just technical detail (and thus flags on the function). Put another way, the return value is no different and the basic arguments are the same, it's just internal details of how that return value is found that shifts. For me that's enough to control with keyword-only arguments instead of uniquely named functions.
And I'm willing to argue that import_module() not importing parent packages is annoying and should be controlled with a flag to make life easier since chances are you will want to import the parent packages anyway.
Look at it from a conceptual point of view, though, there are two quite different operations:
When you call it, you're going to want one behaviour or the other, since the two operations are not slight variants, they have major conceptual differences (with one being a substep of the other).
It looks like there are two concepts at play though:
In the case of #1, providing the path is just a means to an end. In contrast, for #2 providing the path is the whole point and side effects are something to which you may not have given any thought (but you probably aren't expecting them).
In both cases, it still boils down to (module name -> module spec). To me, handling both with a single function and a keyword-only "path" parameter seems sufficient, as long as people don't miss the fact that there are side effects in the default case.
The tricky part is that this is not a high-use API, so I'm trying not to think in terms of common case. (I'm tempted to say things like "in the common case you just want the spec and you don't care about that other stuff".)
Actually, I think you make a good point. importlib.find_spec should use the *import_module* signature (no path parameter, but with support for relative imports). If it is exposed at all, the "one level only" single step version should appear in importlib.machinery, not at the top level.
The current API exposes an implementation detail most users aren't going to care about, but a "let me know if it exists but don't import it yet" API that parallels the full dynamic import API is far more intuitive.
Nick: that sounds good to me. I like the idea of find_spec() being the same as import_module(), minus actually loading the module.
In that spirit, here's a rough patch that accomplishes that. It refactors things a bit to get there. Considering where we are in the release cycle, I'd rather punt on a proper rendition like this this until 3.5. In the meantime we could duplicate some code for the sake of find_spec() in the 3.4 timeframe.
Here's a version that I'd feel more comfortable with for 3.4 (with the intent of refactoring in 3.5).
Adding a dependency on bpo-19700, as I'm about to commit that fix and runpy should be updated to use whatever we come up with here.
I've closed bpo-16492 in favor of this ticket.
Any thoughts on the latest patch?
The "simple" patch actually looks like a good way to end up with find_spec specific bugs because it diverges from the more thoroughly tested main import path (e.g. it looks to me like it doesn't release the import lock properly)
So the _FoundSpec version actually looks better to me, because it keeps find_spec more inline with actual imports.
Good points, Nick. I was trying to limit touches on _bootstrap.py for 3.4, but that "simple" patch is mostly just a hacky band-aid. Here's a patch that hopefully stands on its own and still limits touches. (The patch is the bare-bones changes only.)
There are 2 things in particular with this patch that I'd like to be sure others are comfortable with:
Regarding the "nested" import, I didn't want to import the submodule at the top level due to possible future circular imports (there aren't any now with importlib.util, but still...). At the same time, to me using something out of a submodule in the parent module in this way is a bit of a bad code smell. I'd be more inclined to move the resolvename() implementation into _bootstrap.py to resolve it, but even better may be to move it to \_init__.py as a private name and then import it into importlib.util. As it stands, the patch simply uses the "nested" import.
As to using builtins.__import() when importing the parent module, that seems like it would result in less surprising results in the (uncommon) case that someone is using a custom __import() that would yield a different result than importlib.import_module(). In the case of find_spec() that makes more sense to me.
Actually, why *is* find_spec at package level, rather than in util with resolve_name? I know we said it was at package level in the PEP, but we never really gave the question serious thought.
Also, why use builtins.__import rather than using __import directly? (A comment as to why you're using __import over importmodule would also be good - I assume it's to get the automatic check for the \_path attribute)
find_spec() is at package level because find_module() is and for no other good reason I'm aware of. I'd be just fine with moving it to util. I don't expect it to be used enough to warrant that top-level placement.
Regarding builtins.__import(), I'm using it in the event that someone "installed" their own __import() which gives a different result than import_module(). Thus the parent used by find_spec() will be the expected one. I agree that comment about this would be good.
Brett: What do you think about moving importlib.find_spec() to importlib.util?
Moving to importlib.util is fine by me. I put find_loader() to mirror import_module(), but honestly the former is much less used compared to the latter, so moving it to importlib.util makes sense.
Here's an update to the patch. It includes doc and test changes. It also moves find_spec() into importlib.util.
(I'm removing the other patches since I don't consider them valid approaches any longer).
(...and to reduce any confusion on which patch is under consideration.)
If nothing else comes up, this will be the last PEP-451 functional change for 3.4.
New changeset 665f1ba77b57 by Eric Snow in branch 'default': bpo-19944: Fix importlib.find_spec() so it imports parents as needed. http://hg.python.org/cpython/rev/665f1ba77b57
pyclbr.py has this comment # XXX This will change once bpo-19944 lands. above the line that was changed by the patch applied. Am I correct in thinking that the comment is obsolete and should be removed?
Yeah, I'm pretty sure that TODO is out of date.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at =
created_at =
labels = ['type-feature', 'library']
title = 'Make importlib.find_spec load packages as needed'
updated_at =
user = 'https://github.com/ncoghlan'
```
bugs.python.org fields:
```python
activity =
actor = 'eric.snow'
assignee = 'none'
closed = True
closed_date =
closer = 'eric.snow'
components = ['Library (Lib)']
creation =
creator = 'ncoghlan'
dependencies = ['19700']
files = ['33336']
hgrepos = []
issue_num = 19944
keywords = ['patch']
message_count = 26.0
messages = ['205804', '205826', '205868', '205883', '205903', '205906', '205927', '205937', '205943', '206024', '206026', '206209', '206631', '206986', '206995', '207211', '207212', '207213', '207365', '207402', '207507', '207508', '207511', '209246', '262862', '262866']
nosy_count = 6.0
nosy_names = ['brett.cannon', 'terry.reedy', 'ncoghlan', 'Arfrever', 'python-dev', 'eric.snow']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue19944'
versions = ['Python 3.4']
```