Make importlib.find_spec load packages as needed

ncoghlan commented 10 years ago

BPO	19944
Nosy	@brettcannon, @terryjreedy, @ncoghlan, @ericsnowcurrently
Dependencies	bpo-19700: Update runpy for PEP 451
Files	issue19944-find-spec-mirror-import-module-direct.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields: ```python assignee = None closed_at = created_at = labels = ['type-feature', 'library'] title = 'Make importlib.find_spec load packages as needed' updated_at = user = 'https://github.com/ncoghlan' ``` bugs.python.org fields: ```python activity = actor = 'eric.snow' assignee = 'none' closed = True closed_date = closer = 'eric.snow' components = ['Library (Lib)'] creation = creator = 'ncoghlan' dependencies = ['19700'] files = ['33336'] hgrepos = [] issue_num = 19944 keywords = ['patch'] message_count = 26.0 messages = ['205804', '205826', '205868', '205883', '205903', '205906', '205927', '205937', '205943', '206024', '206026', '206209', '206631', '206986', '206995', '207211', '207212', '207213', '207365', '207402', '207507', '207508', '207511', '209246', '262862', '262866'] nosy_count = 6.0 nosy_names = ['brett.cannon', 'terry.reedy', 'ncoghlan', 'Arfrever', 'python-dev', 'eric.snow'] pr_nums = [] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue19944' versions = ['Python 3.4'] ```

ncoghlan commented 10 years ago

In implementing the runpy module updates for PEP-451 I ran into the fact that importlib.find_spec had copied the findloader behaviour where it doesn't handle dotted names automatically - you have to explicitly pass in the \_path__ from the parent module for it to work.

For find_loader, runpy used pkgutil.get_loader instead. The patch on bpo-19700 currently includes a runpy._fixed_find_spec function that handles walking the module name components, loading packages as necessary.

I believe it would be better to move the current thin wrapper around importlib._bootstrap._find_spec to importlib.find_spec_on_path (which will only search for the specified module name directly on the given path without breaking it up into components), with find_spec itself changed to *not* take a "path" argument, and instead always doing the full lookup (as implemented in the bpo-19700 patch _fixed_find_spec function)

brettcannon commented 10 years ago

Another option would be to add a keyword-only use_parent_path flag to importlib.find_spec() which is True by default. If use_parent is True, 'path' is provided, and there is no parent, then 'path' can act as a fallback.

Which makes me wonder if also adding a keyword-only import_parents keyword to implicitly import any parent packages as necessary would be helpful. It would imply use_parent_path as true. Might also be useful to add to importmodule() to get rid of that annoyance inherited from \_import__.

ncoghlan commented 10 years ago

They're fundamentally different operations though, with one being a single step of the other. I just think "do the full lookup" should have the more obvious name, with the "do a single level lookup" still as a public API, but with the less obvious name.

ericsnowcurrently commented 10 years ago

As a testament to counter-intuitive API, I did not even realize this about either find_loader() or find_spec() until the bug the other day with reloading submodules. So I'm on board!

As to the best approach, I'll argue for keeping just 1 function. I tried out both ways and in the 2 function approach they simply didn't seem different enough. I've attached a patch (w/o tests, etc.) to show what I'm thinking.

Basically, let's make the "path" parameter keyword-only and rename it to "parent_path" (maybe even "precomputed_parent_path"). The patch takes cues from pkgutil.find_loader().

ncoghlan commented 10 years ago

One function (the current one) promises not to import anything. The other (the new one) may have side effects, even if it fails to find the module (any successfully imported parent modules will remain imported).

IIRC, that "no side effects" guarantee was the original reason for the find_loader/get_loader split and it's a distinction I think is worth preserving. I just think the obvious name should be closer to importlib.import_module in semantics, with the "no side effects" version being the more obscure one.

brettcannon commented 10 years ago

I'm not saying get rid of the ability to guarantee no side-effects. All I'm saying is that the point of the function is to find a spec, whether that includes importing or implicitly using a parent package from sys.modules is just technical detail (and thus flags on the function). Put another way, the return value is no different and the basic arguments are the same, it's just internal details of how that return value is found that shifts. For me that's enough to control with keyword-only arguments instead of uniquely named functions.

And I'm willing to argue that import_module() not importing parent packages is annoying and should be controlled with a flag to make life easier since chances are you will want to import the parent packages anyway.

ncoghlan commented 10 years ago

Look at it from a conceptual point of view, though, there are two quite different operations:

find a spec at the current level, using the specified path (which may be empty, implying a top level import)
find a spec according to the full import cycle, including any parent module imports

When you call it, you're going to want one behaviour or the other, since the two operations are not slight variants, they have major conceptual differences (with one being a substep of the other).

ericsnowcurrently commented 10 years ago

It looks like there are two concepts at play though:

side-effect-free vs. may-have-side-effects
just-find-the-spec-dangit vs. find-the-spec-relative-to-submodule-search-locations-I-already-know

In the case of #1, providing the path is just a means to an end. In contrast, for #2 providing the path is the whole point and side effects are something to which you may not have given any thought (but you probably aren't expecting them).

In both cases, it still boils down to (module name -> module spec). To me, handling both with a single function and a keyword-only "path" parameter seems sufficient, as long as people don't miss the fact that there are side effects in the default case.

The tricky part is that this is not a high-use API, so I'm trying not to think in terms of common case. (I'm tempted to say things like "in the common case you just want the spec and you don't care about that other stuff".)

ncoghlan commented 10 years ago

Actually, I think you make a good point. importlib.find_spec should use the *import_module* signature (no path parameter, but with support for relative imports). If it is exposed at all, the "one level only" single step version should appear in importlib.machinery, not at the top level.

The current API exposes an implementation detail most users aren't going to care about, but a "let me know if it exists but don't import it yet" API that parallels the full dynamic import API is far more intuitive.

ericsnowcurrently commented 10 years ago

Nick: that sounds good to me. I like the idea of find_spec() being the same as import_module(), minus actually loading the module.

In that spirit, here's a rough patch that accomplishes that. It refactors things a bit to get there. Considering where we are in the release cycle, I'd rather punt on a proper rendition like this this until 3.5. In the meantime we could duplicate some code for the sake of find_spec() in the 3.4 timeframe.

ericsnowcurrently commented 10 years ago

Here's a version that I'd feel more comfortable with for 3.4 (with the intent of refactoring in 3.5).

ncoghlan commented 10 years ago

Adding a dependency on bpo-19700, as I'm about to commit that fix and runpy should be updated to use whatever we come up with here.

ericsnowcurrently commented 10 years ago

I've closed bpo-16492 in favor of this ticket.

ericsnowcurrently commented 10 years ago

Any thoughts on the latest patch?

ncoghlan commented 10 years ago

The "simple" patch actually looks like a good way to end up with find_spec specific bugs because it diverges from the more thoroughly tested main import path (e.g. it looks to me like it doesn't release the import lock properly)

So the _FoundSpec version actually looks better to me, because it keeps find_spec more inline with actual imports.

ericsnowcurrently commented 10 years ago

Good points, Nick. I was trying to limit touches on _bootstrap.py for 3.4, but that "simple" patch is mostly just a hacky band-aid. Here's a patch that hopefully stands on its own and still limits touches. (The patch is the bare-bones changes only.)

There are 2 things in particular with this patch that I'd like to be sure others are comfortable with:

an import statement (for resolve_name) inside find_spec()
the use of builtins.__import__() instead of importlib.import_module()

Regarding the "nested" import, I didn't want to import the submodule at the top level due to possible future circular imports (there aren't any now with importlib.util, but still...). At the same time, to me using something out of a submodule in the parent module in this way is a bit of a bad code smell. I'd be more inclined to move the resolvename() implementation into _bootstrap.py to resolve it, but even better may be to move it to \_init__.py as a private name and then import it into importlib.util. As it stands, the patch simply uses the "nested" import.

As to using builtins.__import() when importing the parent module, that seems like it would result in less surprising results in the (uncommon) case that someone is using a custom __import() that would yield a different result than importlib.import_module(). In the case of find_spec() that makes more sense to me.

ncoghlan commented 10 years ago

Actually, why *is* find_spec at package level, rather than in util with resolve_name? I know we said it was at package level in the PEP, but we never really gave the question serious thought.

Also, why use builtins.__import rather than using __import directly? (A comment as to why you're using __import over importmodule would also be good - I assume it's to get the automatic check for the \_path attribute)

ericsnowcurrently commented 10 years ago

find_spec() is at package level because find_module() is and for no other good reason I'm aware of. I'd be just fine with moving it to util. I don't expect it to be used enough to warrant that top-level placement.

Regarding builtins.__import(), I'm using it in the event that someone "installed" their own __import() which gives a different result than import_module(). Thus the parent used by find_spec() will be the expected one. I agree that comment about this would be good.

ericsnowcurrently commented 10 years ago

Brett: What do you think about moving importlib.find_spec() to importlib.util?

brettcannon commented 10 years ago

Moving to importlib.util is fine by me. I put find_loader() to mirror import_module(), but honestly the former is much less used compared to the latter, so moving it to importlib.util makes sense.

ericsnowcurrently commented 10 years ago

Here's an update to the patch. It includes doc and test changes. It also moves find_spec() into importlib.util.

(I'm removing the other patches since I don't consider them valid approaches any longer).

ericsnowcurrently commented 10 years ago

(...and to reduce any confusion on which patch is under consideration.)

ericsnowcurrently commented 10 years ago

If nothing else comes up, this will be the last PEP-451 functional change for 3.4.

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 10 years ago

New changeset 665f1ba77b57 by Eric Snow in branch 'default': bpo-19944: Fix importlib.find_spec() so it imports parents as needed. http://hg.python.org/cpython/rev/665f1ba77b57

terryjreedy commented 8 years ago

pyclbr.py has this comment # XXX This will change once bpo-19944 lands. above the line that was changed by the patch applied. Am I correct in thinking that the comment is obsolete and should be removed?

ericsnowcurrently commented 8 years ago

Yeah, I'm pretty sure that TODO is out of date.

python / cpython

Make importlib.find_spec load packages as needed #64143