python / typing

Python static typing home. Hosts the documentation and a user help forum.
https://typing.readthedocs.io/
Other
1.6k stars 237 forks source link

`import a.b` - should `a` be considered imported? #1358

Closed bluetech closed 8 months ago

bluetech commented 1 year ago

Runtime says yes, mypy says yes, pyright says no.

It would be nice if pyright and mypy were consistent on this point. The mypy approach makes sense because it matches runtime, and the pyright approach makes sense because it is explicit. So I don't have a strong preference personally. But if the mypy approach is preferred, maybe pyright can adopt it; and if the pyright approach is preferred, maybe mypy can add a strictness flag for it.

Should I go and suggest one or the other? And if so, which?

JelleZijlstra commented 1 year ago

Yes, a should be considered imported, because that's how the language works.

erictraut commented 1 year ago

We've made the decision in pyright not to model implicit side effects of the module loader. This is an example of one such implicit side effect. If we were to reconsider this decision, we'd need to decide what principle to use when deciding which implicit side effects to model. My preference is to remain consistent and say that code must be explicit about what it's importing, not rely on implicit side effects of the loader. If you want to use the symbols in a, then use an import a statement. If you want to use the symbols in a.b, then use import a.b. If you want to access symbols in both, then include both import statements. This practice is consistent with the vast majority of Python code that I've seen, and it's what I enforce in my Python code bases. In most cases where import a is omitted, I've found that it's unintended, and the developer is happy to have a tool point out the omission. As an example, the mypy source base includes import statements for all referenced modules and submodules in all but a handful of cases, and those appear to be omitted as an oversight.

There are many things that occur at runtime (including well-documented behavior) that a static type checker intentionally doesn't model and considers an error. The objective of a static type checker is not to match every aspect of the runtime but to give you at tool that helps you write robust, bug-free, maintainable code. Robust code should not rely on import side effects.

gvanrossum commented 1 year ago

I'm sorry, @erictraut, but that's not how Python's semantics are defined. This is not some accidental side effect of the loader. This is how import works. So IMO pyright ought to change. import a.b.c does two separate things:

hauntsaninja commented 1 year ago

In particular, I'm having a hard time seeing what the potential benefits of not modeling this part of Python's import system is.

(For example, type checkers do not exactly model Python's visibility semantics at runtime — we have the notion of "explicit reexports" — it's quite clear to me how that helps avoid bugs)

carljm commented 1 year ago

IMO there are some side effects of import that should not be relied on and that a type checker should not model. For instance, the fact that import a.b being executed in one module means that a different module that only includes import a can now do a.b.foo without error is a side effect at a distance that is dependent on accidents of import order and shouldn't be relied on. It's useful for a type checker to flag a.b.foo in the second module as an error.

But the fact that import a.b defines a in the same module is a local effect that is fully well defined as part of the semantics of import and fully reliable. I don't see any benefit to a type checker erroring on this.

AlexWaygood commented 1 year ago

If you want to use the symbols in a, then use an import a statement. If you want to use the symbols in a.b, then use import a.b. If you want to access symbols in both, then include both import statements. This practice is consistent with the vast majority of Python code that I've seen, and it's what I enforce in my Python code bases. In most cases where import a is omitted, I've found that it's unintended, and the developer is happy to have a tool point out the omission.

This makes sense to me as a defence of an optional lint to enforce style across a code base. It doesn't make sense to me as a defence of the default behavior of a type checker, which (if I understand correctly) can't be opted out from.

As others have said, this is quite well-defined behaviour in Python.

erictraut commented 1 year ago

It sounds like the consensus is that it is considered OK for Python code to rely on some implicit loader side effects. This seems to me like a bad coding practice, but there are other common idioms and practices in Python that strike me as equally undesirable, so I'll submit to the consensus of the community here.

Most Python code bases I've seen make all such imports explicit, so I didn't think this was a common enough practice to special case, but maybe I haven't been looking at a representative sample of Python code.

One of the reasons I've resisted adding this support is that it seems like a slippery slope. The statement "don't rely on implicit loader side effects" is clear and unambiguous. If we're going to allow some but disallow others, I'd like to come up with some principle that we can use to distinguish between the two categories. Otherwise it seems quite arbitrary.

Let's see if we can get agreement on what is acceptable and what is not.

  1. Guido stated that import a.b.c should imply import a, import a.b and import a.b.c. While I find this sort of implicit import behavior distasteful and would never rely on it in my own code, I agree that's a safe assumption for code within the same execution scope and any inner execution scopes.
    import http.cookies
    x = http.HTTPMethod # Guaranteed to work at runtime
    def func():
    x = http.HTTPMethod # Guaranteed to work at runtime
    func()
  2. What about inner scopes that perform an import? For example, a module that includes import a and function within that same module that includes the statement import a.b. Is it OK for any code anywhere within that module to expect to access a.b? The soundness of this assumption depends on execution order, so I think the answer is no. However, mypy allows this.
    import http
    def foo():
    import http.cookies
    x = http.cookies # Runtime error not caught by mypy but caught by pyright
    foo()
  3. What about statement sequences like import a; from a.b import Foo? Is it now OK to reference a.b.Bar? That seems like a really bad coding practice, but I think it is guaranteed to work as long as the two statements are together and there is no statement between them that attempts to access a.b.<x>. Proving correctness would be difficult in the general case for a static type checker. I'd prefer that this were considered unacceptable.
    import http
    x = http.cookies # Runtime error not caught by mypy but caught by pyright
    from http.cookies import CookieError
  4. What about aliasing? Consider import a as a_prime; import a.b. Is it considered acceptable to now access a_prime.b.Foo? Once again, this depends on whether the access is between the two statements (and whether both statements are unconditionally executed), so it would be difficult to prove correctness in the general case. I'd prefer if this were considered unacceptable.
    import http as h
    h.cookies.BaseCookie # Runtime error not caught by mypy but caught by pyright
    import http.cookies

The above list is not comprehensive, which is why I'd prefer to come up with a general principle that we can apply broadly. Here's an attempt to codify such a principle:

Implicit loader side effects are not modeled by a static type checker except in the case where those side effects are limited to a single atomic import statement, and those side effects are in effect only for the execution scope (module or function) where the import statement is located.

This principle would exclude all but 1 in the list above. It would also exclude the case that Carl mentioned, where one module relies on another module importing a submodule.

Thoughts?

gvanrossum commented 1 year ago

Hm. I feel you've got this backwards. If anything, the "loader side effect" in case 1 is that import a.b imports the submodule b and sets a.b to that. The assignment to a in the importing scope is the primary effect, not a side effect.

All the other cases you mention are indeed loader side effects and should continue to be flagged.

erictraut commented 1 year ago

Makes sense. OK, I'll implement 1 in pyright.

gramster commented 1 year ago

I am struggling to understand the difference between the "loader side effect" in case 1 and case 4 (and furthermore, this is not an "implicit side effect" but well-documented and by-design semantics, as covered in https://docs.python.org/3/reference/import.html#submodules - the "primary effect" as Guido called it above). In both cases importing submodule a.b will set the property "b" in parent module a to point to b. The fact that you get to a via an alias in case 4 seems orthogonal; once you get to a, however you get to it, it has a property b that refers to the submodule b (and that is solely as a consequence of "import a.b", not the preceding aliased import). Or am I missing something that makes case 4 different? @gvanrossum looking specifically to you here as you did say that this case should continue to be flagged.

(I'm also confused as to why this doesn't 'just work' if we support both import aliases and submodule imports; in both cases these seem like they should be setting keys in some hash tables (sys.modules at the first level and module attributes at the second level) and case 4 is just a chained lookup). Update: oh, I get it, because we didn't support implicit imports, we had a hacky solution of creating a module named 'a.b' instead of dereferencing the lookup chain from a to b. And even though case 4 doesn't have an implicit import, it was still subject to this. That said, if we add the implicit imports seems like it would be better to no longer use the 'a.b' name hack and just do it right).

gvanrossum commented 1 year ago

I don't feel strongly, but case 4 requires some alias analysis that case 1 doesn't.

We were just discussing this in a cpython PR where there was some code of the form

import multiprocessing as mp
import multiprocessing.connection
.
.
.
... mp.connection.wait(...)

It doesn't strike me as very clear, but there also doesn't appear to be a better solution -- we could change that line to

... multiprocessing.connection.wait(...)

but that would be inconsistent with the rest of the file, which consistently uses mp..

I'm okay with leaving this case up to the type checker, but not case 1.

erictraut commented 8 months ago

I think it's OK to close this issue now, since it was originally prompted by a request for a change in pyright, and that change has been made.