python / cpython

The Python programming language
https://www.python.org
Other
61.97k stars 29.8k forks source link

Modify IMPORT_FROM to fallback on sys.modules #61836

Closed brettcannon closed 9 years ago

brettcannon commented 11 years ago
BPO 17636
Nosy @warsaw, @brettcannon, @pjeby, @ncoghlan, @pitrou, @kristjanvalur, @methane, @ericsnowcurrently, @serhiy-storchaka, @phmc
PRs
  • python/cpython#18347
  • Files
  • circular_import_tests.diff
  • import_from_tests.diff
  • import_from_tests2.diff
  • import_from_tests2.diff
  • circrelimports.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['interpreter-core', 'type-bug'] title = 'Modify IMPORT_FROM to fallback on sys.modules' updated_at = user = 'https://github.com/brettcannon' ``` bugs.python.org fields: ```python activity = actor = 'dino.viehland' assignee = 'none' closed = True closed_date = closer = 'pitrou' components = ['Interpreter Core'] creation = creator = 'brett.cannon' dependencies = [] files = ['30115', '30132', '36803', '36804', '36805'] hgrepos = [] issue_num = 17636 keywords = ['patch'] message_count = 34.0 messages = ['186059', '186073', '186078', '186079', '186080', '186093', '186095', '186096', '186098', '186102', '186103', '186887', '186922', '188319', '188417', '188441', '190667', '190670', '190671', '190673', '190679', '206434', '206476', '228497', '228501', '228502', '228822', '228833', '228915', '228927', '228928', '229257', '229258', '232601'] nosy_count = 13.0 nosy_names = ['barry', 'brett.cannon', 'pje', 'ncoghlan', 'pitrou', 'kristjan.jonsson', 'methane', 'python-dev', 'eric.snow', 'serhiy.storchaka', 'pconnell', 'isoschiz', 'Pascal.Chambon'] pr_nums = ['18347'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue17636' versions = ['Python 3.5'] ```

    brettcannon commented 11 years ago

    To help with circular imports, IMPORT_FROM should be modified to check sys.modules if a getattr() check fails.

    http://mail.python.org/pipermail/python-dev/2013-April/125123.html

    84401114-8e59-4056-83cb-632106c0b648 commented 11 years ago

    Like I mentioned on python-dev, it worries me a bit that this could be considered "unexpected", i.e. when there is a conflict between a legitimage attribute member of a module, and a submodule of the same name.

    Also, I wonder if this isn't a bigger change to the import mechanism, than simply: | Another would be | to always require an 'as' clause in this case, so that you would have | to write' | import .foo as foo which would possibly only require a change to the syntax.

    brettcannon commented 11 years ago

    It won't conflict with attributes. Only if the attribute does not exist on the module already will it fall back to sys.modules. If the import finished then any attribute created from the import will already be there and thus not be an issue.

    But to make sure this isn't a problem we can make sure there is a test in this instance.

    84401114-8e59-4056-83cb-632106c0b648 commented 11 years ago

    While I'm happy that this is being ackowledged as a problem, I'm not sure changing the "import x from y" semantics is necessarily a good idea. I mean, it is obvious to me that it means "import x, then getattr(x, "y")". I presume that is the meaning most people associate with it. Certainly, "y" can be any old module attribute.

    To change it to actually fall back to a submodule, well. I suppose if you could explain it roughly like "y = getattr(x, 'y', x.y)" then it would be ok.

    But I can think of contrived examples where this could break things:

    a.py:

    from .b import util

    #b.py
    from . import a
    from .util import util

    Because of the circular import order, b.util will not exist as an attribute on b when a does its import. So a.util will end up being util instead of util.util as one might expect.

    I'm basically saying that it is possible that the fallback to submodule will occur, where the successful getattr would occur later and return something different than the submodule. Possible. But perhaps very much an edge case :)

    brettcannon commented 11 years ago

    By declaring what the semantics will be to make the case even possible we are not breaking anything but making something possible. IOW it isn't even an edge case to me since there is no working case to compare against. =)

    afcdb1a8-bc5d-4852-a2a5-71b4ae6361ae commented 11 years ago

    On Fri, Apr 5, 2013 at 9:07 AM, Kristján Valur Jónsson \report@bugs.python.org\ wrote:

    But I can think of contrived examples where this could break things:

    a.py:

    from .b import util

    b.py

    from . import a from .util import util

    Because of the circular import order, b.util will not exist as an attribute on b when a does its import. So a.util will end up being util instead of util.util as one might expect.

    Not quite. It will only do this if the '.b.util' module is *in sys.modules at the time that a is being imported, which also means that .b.util has to be imported somewhere *before .a, in order to end up with a stale value. As written, your example actually works correctly if .a is imported first, and fails with an ImportError if .b is imported first.

    In fact, this example is kind of useful for proving the change *correct*, not broken. At the very least, it shows that you'll have to be more inventive to come up with a breaking case. ;-)

    Consider that for any module x.y, x must be in sys.modules before x.y can. Therefore, any "from x import" taking place before x is fully loaded will either happen before x.y is fully loaded, during the load, or after it, and the following cases apply:

    1. If it happens before, then it fails with an ImportError as is the case today.
    2. If it happens during (i.e. there is a cycle with x.y rather than with just x), then the import returns the module.
    3. If it happens after, then either the x.y attribute is bound to the submodule, or has been rebound to something else, or deleted.
    4. If after and deleted, the import returns the module.
    5. If after and rebound, the import returns the changed attribute (just like today)
    6. If after and normally bound, the import returns the module (just like today)

    The only cases in which the behavior changes from today are cases 2 and 4, which would both fail today with an ImportError. Case 4 also doesn't make much sense, since 'import x.y' would still permit access to the module -- so it'd have to be an odd situation in which you didn't want 'from import' (and *only* from import) to fail.

    So let's consider case 2, which would have to be written something like:

    #a.py
    from .b import util
    
    #b.py
    from .util import util
    
    #b/util.py
    from .. import a
    def util(): pass
    
    #__main__.py
    import b

    So, import b leads to starting the load of b.util, which leads to importing a, which succeeds and sees the b.util module instead of the b.util:util function.

    But, because of the circularity, this will *also happen if you import a first. So 'a' will *always see b.util rather than b.util:util, unless you remove the circularity. If you remove the circularity, then 'a' will always see b.util:util as the value of util.

    So case 2 does not lead to a hard-to-debug ordering dependency, it leads to an immediately discoverable change in behavior the moment you start importing a from b.util.

    The tl;dr version of the above is that you will only receive a module instead of an attribute if the module that's about to be replaced just imported *you* during its initial loading, and if it does that, it'll do it no matter what order the imports occur in, making the problem occur immediately and consistently as soon as you introduce the circularity.

    afcdb1a8-bc5d-4852-a2a5-71b4ae6361ae commented 11 years ago

    Actually, after a little reflection, I can see that there are more complex conditions to analyze, if 'b' doesn't import 'b.util', but some other module imports b and sets b.util. But that's just freaking insane and whoever does that probably deserves whatever they get, especially since a later 'import b.util' would blow away the modified attribute.

    So, this note is just to satisfy myself that the change doesn't introduce any *new* weirdness unless you are in a case where the parent imports and replaces the child, and the child is involved in an import cycle. If the parent doesn't import the child but just assigns an attribute, it's already broken the minute somebody else imports the child, and the same thing applies if something else sets the attribute without importing the child first, and if it imports the child first, then the previous analysis (mostly) applies, but either way that code is broken and will have plenty of other ordering dependencies to debug on its own. ;-)

    (No wonder Nick said nobody wanted to touch this... the analysis alone is a killer. ;-) Luckily, it seems we're good to go unless somebody can find a case I missed.)

    afcdb1a8-bc5d-4852-a2a5-71b4ae6361ae commented 11 years ago

    ...and I thought of *one* more way to trigger the changed behavior, which looks like:

    #b.py
    from .b import util
    import .a
    util = util.util
    
    #b/util.py
    def util(): pass

    (with the other files the same as before).

    I'm including it only for completeness' sake, because my original enumeration of cases ignored the possibility that 'a' could be doing the import *after b.util is loaded and bound, but *before it's rebound. However, it doesn't produce any new or problematic effects: it's essentially the same as if 'a' were imported from 'b.util'. Once again, regardless of the order in which imports happen, 'a' ends up with 'b.util' the moment the circularity is introduced, and it stays that way.

    It's also hard to argue that a case written this way isn't getting exactly what it *says it wants. In fact, it looks like code that was deliberately written to *force a to end up with the original b.util instead of the replaced one. ;-)

    Final (hopefully) conclusion: this change replaces the FAQ of "Don't use 'from-import' for circular imports" with the hopefully-less-FA'd Q of "when from-import is part of an import cycle, it works *exactly* like regular import, so you're going to get a submodule rather than an attribute. If you need the attribute instead, move the import so that it happens after the attribute is set up." (Which is exactly the same advice that would apply in a cycle with any other unitialized attribute, whether you were using from-import or not.)

    84401114-8e59-4056-83cb-632106c0b648 commented 11 years ago

    Wow, Good analysis Phillip. So, we agree that the fallback still is a sensible fallback? Then everything is fine and I'll put my official +1 stamp of approval on this.

    Now... should we consider the current behavious to be a bug? 2.7 sure could do with some fixing :)

    afcdb1a8-bc5d-4852-a2a5-71b4ae6361ae commented 11 years ago

    I don't care much one way or the other about it being considered a bug in 2.x, but it might be worth considering a bug in 3.x.

    Either way, though, the language reference for "from import" should reflect the change, and "alternative implementations" should be informed of the update.

    brettcannon commented 11 years ago

    It is not a bug but a change in semantics to accommodate a use-case so this will only be going into Python 3.4.

    ncoghlan commented 11 years ago

    Thanks for working through that Phillip - falling back to sys.modules when the expected attribute was missing is actually something I suggested as a possibility years ago, and Guido's response at the time was "If it was that easy, someone would have done it already".

    Your analysis is one of the pieces that was missing, along with Brett's insight that the code that needs the fallback is the IMPORT_FROM bytecode rather than the import implementation.

    I'm going to close the original circular import bug as being superseded by this one.

    afcdb1a8-bc5d-4852-a2a5-71b4ae6361ae commented 11 years ago

    On Sun, Apr 14, 2013 at 3:51 AM, Nick Coghlan \report@bugs.python.org\ wrote:

    Your analysis is one of the pieces that was missing,

    Unfortunately, I just noticed it's actually incorrect in a pretty important part In my original example, I said, "because of the circularity, this will *also* happen if you import 'a' first." This statement is actually false. Importing 'a' first in that example will result in a.util == b.util:util, not a.util=b.util. I made the mistake because I was for some reason thinking that 'a' was going to execute its import while being imported from b.util, and in that scenario it will not.

    That means there *is* an ordering dependency, and an ambiguity like this one can lie dormant until long after you've introduced the circularity. :-(

    brettcannon commented 11 years ago

    I have uploaded a patch with failing tests that should work after this is all said and done. Philip, please make sure I covered your tests as expected (I tweaked one because it already was working the way I did it). This way we at least know what we are aiming for in terms of results.

    afcdb1a8-bc5d-4852-a2a5-71b4ae6361ae commented 11 years ago

    It looks like maybe basic2 should be importing basic, not basic2. Otherwise, you might as well just import basic2 directly and be done with it. ;-) Likewise, I think indirect should be importing from indirect2, not from itself? i.e., I'd expect that test to fail even with the change. In addition, even if that is fixed, it's still not testing a cycle involving util; it's really just testing the same thing as "basic" is supposed to be testing.

    It also looks as though like the rebinding test doesn't actually test any rebinding, since nobody ever imports the thing that's rebound. It's failing for the same reason as the subpackage test. The subpackage test looks like a valid test, though - i.e., it's the "basic" case correctly implemented as a parent-child cycle. It's actually the only one of the tests that tests what it says it does.

    brettcannon commented 11 years ago

    Don't be distracted when trying to write tests is the lesson learned. Fixed basic and rebinding and just deleted indirect.

    84401114-8e59-4056-83cb-632106c0b648 commented 11 years ago

    One question. when doing "Programmatic" import, i.e. using the __import__ function, how do we do the same?

    #module foo.B
    A = __import__(".", fromlist=["A"]).A
    
    #module foo.A
    B = __import__(".", fromlist=["B"]).B

    Clearly the above won't work. Can we extend __import__ to allow a full path, including relative? The objection about which name to bind to is no longer valid, since the binding is explicit. So, could we do:

    #module foo.B
    A = __import__(".A")
    
    #module foo.A
    B = __import__(".B")

    ?

    84401114-8e59-4056-83cb-632106c0b648 commented 11 years ago

    sorry, the last example really needs to be:

    module foo.B

    A = __import__(".A", fromlist=["dummy"])

    to invoke the "return final module" behaviour. Hm, maybe this simply works... I didn't test.... Nope, I get ValueError: Empty module name (in 2.7)

    brettcannon commented 11 years ago

    You use importlib.importmodule to do programmatic imports (and FYI dots never go into the first argument of \_import__).

    84401114-8e59-4056-83cb-632106c0b648 commented 11 years ago

    Interesting. And I can report that it works, in 2.7, with code such as b = importlib.importmodule("..b", \_name) and a = importlib.importmodule("..a", \_name)

    Still, it seems odd that a whole "importlib" is requried ot resolve the relative import, and that it doesn"t work with __import, given how "supposedly" the import statement is supposed to translate to a __import call internally. One would think that __import__ had access to the relative path machinery somehow.

    brettcannon commented 11 years ago

    If you look at the importlib source code in 2.7 it's actually really small and simply translates the call into an __import call under the hood. So __import can completely handle relative imports, but you were not using the level argument to __import__ nor passing the necessary globals to make the relative name computation work (see the importlib source if you want to see how it works).

    ncoghlan commented 10 years ago

    With PEP-451 implemented, note that I have reopened bpo-992389 - the patch to set the parent module attribute at the same time as setting the sys.module attribute is actually pretty trivial for the PEP-451 loader case, and that now includes all the standard loaders.

    I also think that approach will have fewer weird edge cases than this version.

    afcdb1a8-bc5d-4852-a2a5-71b4ae6361ae commented 10 years ago

    Unfortunately, this is not quite true: the weird edge cases for that approach are simply *different*, and harder to diagnose. ;-)

    The approach here can only affect execution paths that would currently raise ImportError; that one can break execution paths that are *currently working*.

    As Brett said above, "By declaring what the semantics will be to make the case even possible we are not breaking anything but making something possible." That is not the case for the approach proposed in the other issue.

    pitrou commented 9 years ago

    This is Brett's tests patch updated against default branch.

    pitrou commented 9 years ago

    And this is a full patch.

    pitrou commented 9 years ago

    (full patch is actually the latest upload: circrelimports.patch. Sorry for the glitch)

    pitrou commented 9 years ago

    By the way, is there any documentation that would need to be updated for this?

    ncoghlan commented 9 years ago

    The language reference - at least the section on the import statement, and potentially the section on the overall operation of the import system.

    I don't *think* it would affect anywhere in the tutorial or the importlib docs, but they may be worth skimming to see if anything leaps out.

    pitrou commented 9 years ago

    By the way, my patch is currently using the package's __name attribute to build the fully-qualified name. Should it use the package's __package attribute instead?

    ncoghlan commented 9 years ago

    If I understand your question correctly, then yes, __package__ should be used when converting explicit relative imports to absolute ones.

    To write a test, run a submodule that needs the new fallback feature via the -m switch (it will fail if using __name__)

    pitrou commented 9 years ago

    Le 10/10/2014 00:11, Nick Coghlan a écrit :

    Nick Coghlan added the comment:

    If I understand your question correctly, then yes, __package__ should be used when converting explicit relative imports to absolute ones.

    To write a test, run a submodule that needs the new fallback feature via the -m switch (it will fail if using __name__)

    I'm not sure it would. __package__ is not looked up on the module (since precisely we failed importing it) but on the parent package (which is passed to IMPORT_FROM).

    pitrou commented 9 years ago

    Actually, looking up __package__ would be wrong.

    Say I have: from pack.module import foo

    and "foo" doesn't exist in pack.module but exists in pack. Since pack.module.__package == "pack", using __package would wrongly find the "foo" in pack.

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 9 years ago

    New changeset fded07a2d616 by Antoine Pitrou in branch 'default': Issue bpo-17636: Circular imports involving relative imports are now supported. https://hg.python.org/cpython/rev/fded07a2d616

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 9 years ago

    New changeset 3a35638bce66 by Ned Deily in branch 'default': Issue bpo-17636: Install new test directories. https://hg.python.org/cpython/rev/3a35638bce66