microsoft / pylance-release

Documentation and issues for Pylance
Creative Commons Attribution 4.0 International

Base class type hints incorrectly override subclass attribute type #2303

Closed · millerdev closed this issue 2 years ago

millerdev commented 2 years ago

Environment data

Expected behaviour

Autocomplete suggestions should be derived from subclass attribute type.

[screenshot]

Actual behaviour

Autocomplete suggestions are incorrectly derived from base class type annotation, despite the fact that the subclass has an attribute with a more specific type.

[screenshot]

Code Snippet / Additional information

class Base:
    good = object()
    bad: object

class Sub(Base):
    good = []
    bad = []

Sub.good.  # suggestions include list methods
Sub.bad.  # all suggestions begin with "__" (no list methods)
erictraut commented 2 years ago

The current behavior is "as designed".

If a class variable or instance variable has a declared type (e.g. bad: object), this type is enforced for the class hierarchy. Subclasses can override it with a narrow type if they'd like, but if they override it with a wider or incompatible type, the reportIncompatibleVariableOverride diagnostic check will emit an error.

class Base:
    a: object
    b: str

class Child(Base):
    a: str # This is allowed
    b: object # Error if reportIncompatibleVariableOverride is enabled

reveal_type(Base().a) # object
reveal_type(Base().b) # str
reveal_type(Child().a) # str
reveal_type(Child().b) # object

If a base class declares the type of a class variable or instance variable, subclasses can assign a default value to it as long as that assignment doesn't violate the type declaration.

class Base:
    a: object
    b: str

class Child(Base):
    a = "hi"
    b = 1 # Error: incompatible type

reveal_type(Base().a) # object
reveal_type(Base().b) # str
reveal_type(Child().a) # object
reveal_type(Child().b) # str

If a base class doesn't declare the type of a class variable or instance variable, subclasses can assign whatever value they want to the value.

class Base:
    a = object()
    b = "hi"

class Child(Base):
    a = "hi"
    b = 1

reveal_type(Base().a)  # object
reveal_type(Base().b)  # str
reveal_type(Child().a)  # str
reveal_type(Child().b)  # int

If your intent is for a subclass's variable to have a narrower type, you need to declare that in the subclass. Simply assigning a value to it does not declare its type.
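
Applied to the snippet from the original report, a minimal sketch of such a declaration might look like this (the list annotation is an assumption about the intended type):

```python
class Base:
    good = object()
    bad: object


class Sub(Base):
    good = []
    # An explicit declaration narrows the type for Sub, so type checkers
    # (and autocomplete) treat Sub.bad as a list rather than an object.
    bad: list = []


# list methods are now known to be available on Sub.bad
Sub.bad.append(1)
```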

millerdev commented 2 years ago

Hi @erictraut, thank you for the prompt response. Thanks also for the tremendous amount of work that has gone into building Pylance.

However, in this case Pylance is imposing a type system on Python that does not align with the way the language works and is conventionally used.

A real-world example is Django's Model.objects, for which there is no type declared in Django. The annotation is in a stub file provided by Pylance:

ms-python.vscode-pylance-2022.1.5/dist/bundled/stubs/django/db/models/base.pyi

class Model(metaclass=ModelBase):
    ...
    objects: BaseManager[Any]
    ...

So Pylance is adding annotations to classes that do not otherwise declare them. Maybe this is a separate issue?

Model.objects is designed to be overridden with a manager that defines custom methods for the model class overriding it. Not providing suggestions for those custom methods severely reduces the utility of autocomplete in this context.

Not to mention BaseManager itself implements a tiny subset of methods implemented by the default Manager, which is derived from QuerySet. Pylance's behavior here is not very useful. When the default manager is overridden, the type annotation stubs are making things worse than if they were not there at all.

I also disagree with the comment in your second example:

class Base:
    a: object
    b: str

class Child(Base):
    a = "hi"
    b = 1  # This is perfectly legal. Python will not raise an error.

In every version of Python with which I am familiar the type of Child().b will be int, and inferring the type from the base class yields an incorrect result. Why one would define a class hierarchy that way is another question, but that doesn't change the way Python interprets the code. If the type checker can infer that a given context has a Child then it should also be able to infer that the type of Child().b is int.

erictraut commented 2 years ago

The main job of a type checker is to detect and report type violations. If someone declares that the type of a variable must be of a certain type, a type checker will tell them when someone assigns an incompatible type. This goes for local variables as well as variables used in class hierarchies.

You're correct that a type violation might not result in a runtime exception, but the whole point of a type checker is to eliminate the possibility that it could. In the example above, if the Base class declares that b must be a str, it probably has good reason to do so. It likely contains code that relies on the fact that it's a str. If a subclass assigns a non-str value to this variable — or redeclares the variable to be a type that is incompatible with str — and then calls one of the methods in Base, it will likely result in a runtime exception. Those are the types of bugs that static type checking is meant to prevent.

class Base:
    b: str

    def print_b(self):
        print("b is " + self.b)

class Child(Base):
    b = 1 # Reported as type violation by static type checker

    def foo(self):
        self.print_b()

Child().foo()  # Crashes at runtime

You certainly don't need to use static type checking in Python if you don't buy into its benefits, but this is the sort of bug that it is meant to prevent.

I don't know enough about django to say whether it is typed correctly. If you think that there is a bug in the django stubs bundled with pylance, please report the issue to the maintainer of those stubs.

millerdev commented 2 years ago

There is some nuance here that I don't want to get lost. I don't have a problem with the type checker reporting an error at the assignment of Child.b here:

class Base:
    b: str

class Child(Base):
    b = 1  # If the type checker wants to report an error here, that's fine.

However, it should not infer the type of Child.b to be str. That would be inconsistent with how Python works. This is also a red herring. The real issue is not about subclasses defining types that are incompatible with the base class definition.

The issue is that when the base class definition is overly broad (something like object from my original example), and a subclass defines a more specific type, the inferred type of the subclass attribute should not be derived from the base class when the type checker knows the subclass is being referenced. I've already provided code examples of this above. But for clarity:

class Base:
    x: object

    def print_x(self):
        print(f"x is {self.x}")  # the type checker only knows self.x is an object here

class Sub(Base):
    x = []  # no error here, a list is an object

    def foo(self):
        self.x.append(1)  # x is a list, this is safe
        self.print_x()  # and it does not conflict with base class implementation

Sub.x  # this is a list, suggesting it's only an object would be overly limiting

It is of course the responsibility of the subclass author to not override x in a way that would break the base class implementation. But that does not preclude the subclass from using the list features of x in contexts that are known to reference the subclass.

erictraut commented 2 years ago

The issue is that when the base class definition is overly broad (something like object from my original example), and a subclass defines a more specific type...

I agree with this statement. However, in your example above, the subclass has not declared a more specific type for x. It has simply assigned a value to the existing variable whose type has been declared by the base class. Don't confuse a value assignment with a type declaration; they're very different. If the child class wants to override the variable type with a narrower type, it can do so by declaring its intent.

class Base:
    x: object

class Sub(Base):
    x: list[int] = [] # Child declares type of 'x' to be a narrower type

reveal_type(Base.x) # object
reveal_type(Sub.x) # list[int]
millerdev commented 2 years ago

Don't confuse a value assignment with a type declaration; they're very different.

I am well aware of the difference. The type checker can and should do type inference rather than requiring a type annotation at every assignment. Having it be so pedantic that it gives an incorrect result without a type annotation will pressure developers toward two extremes:

  1. Forgo type annotations entirely.
  2. Add type annotations everywhere, even where they would otherwise be unnecessary.

I'm more concerned about the second scenario than the first, as the current trend of typing in Python (not to mention TypeScript) seems headed in that direction. Python is not statically typed and we should not be encouraging developers to think about it that way. If you want that, feel free to use a language like Java with a real static type system. I certainly don't.

If the child class wants to override the variable type with a narrower type, it can do so by declaring its intent.

Neither the library I depend on nor the code that I'm writing declares types. Pylance is injecting type declarations via stub files, and this is causing impaired code completion.

Given the type checker's preference for base class type declarations over subclass value assignments, the authors of the Django stubs can do nothing to resolve the issue either (other than remove type annotations from their stubs, which would defeat the purpose) since the desired completions come from my custom subclasses.

Are you suggesting that developers must either use type annotations in places where they are otherwise unnecessary or forgo correct code completion?

erictraut commented 2 years ago

Pyright will infer the type of a variable in the case where a type is not declared. However, if a base class annotates a variable, it doesn't override that declaration by inferring a different type in a child class.

I disagree with your assertion that this behavior will pressure developers either to forgo adding type declarations or to add declarations everywhere. That contradicts all of my experience with typing large Python code bases.

You're describing a relatively unusual special case where a base class declares the type of an instance variable with a very wide type and subclasses are expected to declare a narrower type.

Are you suggesting that developers must either use type annotations in places where they are otherwise unnecessary or forgo correct code completion?

Yes, in this very specific situation, where a subclass intends for an instance variable to have a type that is narrower than the declared type in the base class, you would need to add a type annotation to make its intent clear.

millerdev commented 2 years ago

It is clear that it will be difficult to convince you that there is a problem with the type checker. However, I'm happy to have you acknowledge that you are advocating unnecessary type annotations as a crutch for the tool.

I will start by editing or deleting the Django type stubs, which is inconvenient, especially in the long term. If it continues to be an issue in other places I will likely uninstall Pylance.

FWIW, according to another developer in my organization, apparently PyCharm does not have this issue.

hmc-cs-mdrissi commented 2 years ago

I think the rule you want would lead to enough errors that I'd be fairly unhappy: overriding a class, picking an inconsistent type, and then having it crash later. If you override a class and pick an attribute type different from the original, I think there are three cases:

  1. The original type is too narrow. If the superclass claims it's a string but is happy to allow it to be an int, the declaration should be broader (a union, or a superclass like object).
  2. The original type is correct across all usage of the class, but other types are valid for a subset of usage and you only use that subset. This is outside the scope of any type checker to know.
  3. The original type really should be a generic type. For some classes designed without type checking in mind, the way to really add types is to add a couple of generic type parameters. If they were designed today, they would probably have been designed differently.

The first and third cases are errors in the class's type stubs, not really in the type checker. There are situations where the ideal type stub is difficult to write, but that's an acceptable trade-off given that Python's type system was retrofitted many years after the language was created. I'd guess a couple of language features would have been implemented differently if types had been part of the original design.

In all three cases an explicit override is available. Overriding an attribute's type is roughly the same as overriding a function's type signature; both are basic places that can break type checking. Other languages with types would either forbid an incompatible override or force you into using a generic. A safe type checker needs explicitness from the user if they choose to do things more dynamically.

As for PyCharm: PyCharm uses weak unions for type checking, which allows runtime errors in exchange for usability and is inconsistent with the actual type system documentation. I'd guess they did something similar here. There is a trade-off in safety, and PyCharm is choosing to ignore some type errors.

edit: Checking PyCharm, there are a couple of bug reports on its override rules, like these: 1 and 2. So I don't know whether PyCharm's lack of an error here is intentional or a bug they haven't handled yet.

Also, on your comment about using a different language: that's not really feasible for much of the data science/ML ecosystem. Most ML libraries and tooling are Python-specific, and choosing a different ecosystem would cause me heavy library pains. I chose Python mostly for its libraries. If Python were to become fully statically typed I'd be very happy, although I'm aware the chance of that is negligible. Most language choices I've seen are made with various trade-offs. If you are using an optional feature of Python (types can always be ignored), then you should accept the language moving closer to static type errors.

millerdev commented 2 years ago

A few more examples comparing different kinds of namespaces and how they are handled differently by Pylance and a Python interpreter.

Note: the # Type of "..." comments next to each reveal_type(...) call below were copied from Pylance hover text (tooltips).

class Base:
    b: str
    reveal_type(b)  # Type of "b" is "Unbound"

class Sub(Base):
    b: int = 1
    # Note: assigned value of "b" is preferred over declared type
    reveal_type(b)  # Type of "b" is "Literal[1]"

class Child(Base):
    b = 1
    # Note: assigned value of "b" is ignored
    reveal_type(b)  # Type of "b" is "str"

reveal_type(Sub.b)  # Type of "Sub.b" is "int"
reveal_type(Child.b)  # Type of "Child.b" is "str"

c = Child.b
reveal_type(c)  # Type of "c" is "str"

d: str = ""
reveal_type(d)  # Type of "d" is "Literal['']"

def function_namespace():
    # Note: assigned values override declared types in outer namespace
    c = 1
    d = 1
    reveal_type(c)  # Type of "c" is "Literal[1]"
    reveal_type(d)  # Type of "d" is "Literal[1]"

Notice that Pylance treats the type reassignment in the function namespace differently from the same kind of assignment in a subclass namespace.

Run through an interpreter:

Python 3.9.7 (default, Oct 18 2021, 14:07:40) 
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def reveal_type(value):
...     print(type(value))
... 
>>> class Base:
...     b: str
...     #reveal_type(b)  # Type of "b" is "Unbound"
... 
>>> class Sub(Base):
...     b: int = 1
...     # Note: assigned value of "b" is preferred over declared type
...     reveal_type(b)  # Type of "b" is "Literal[1]"
... 
<class 'int'>

>>> class Child(Base):
...     b = 1
...     # Note: assigned value of "b" is ignored
...     reveal_type(b)  # Type of "b" is "str"
... 
<class 'int'>

>>> reveal_type(Sub.b)  # Type of "Sub.b" is "int"
<class 'int'>

>>> reveal_type(Child.b)  # Type of "Child.b" is "str"
<class 'int'>

>>> c = Child.b
>>> reveal_type(c)  # Type of "c" is "str"
<class 'int'>

>>> d: str = ""
>>> reveal_type(d)  # Type of "d" is "Literal['']"
<class 'str'>

>>> def function_namespace():
...     # Note: assigned values override declared types in outer namespace
...     c = 1
...     d = 1
...     reveal_type(c)  # Type of "c" is "Literal[1]"
...     reveal_type(d)  # Type of "d" is "Literal[1]"
... 
>>> function_namespace()
<class 'int'>
<class 'int'>

A workaround for the Django Model.objects issue.

Edit ms-python.vscode-pylance-2022.1.5/dist/bundled/stubs/django/db/models/base.pyi

class Model(metaclass=ModelBase):

    # replace type declaration with assignment
    #objects: BaseManager[Any]
    objects = BaseManager()

After this edit Model.objects makes better code completion suggestions. Given example code:

from django.db import models
from django.db.models.functions import Coalesce

class Book(models.Model):
    title = models.CharField(max_length=100)
    author = models.CharField(max_length=50)

class PollManager(models.Manager):
    def add_counts(self):
        return self.annotate(
            num_responses=Coalesce(models.Count("response"), 0)
        )

class OpinionPoll(models.Model):
    question = models.CharField(max_length=200)
    objects = PollManager()

Default manager has the expected suggestions. [screenshot]

Overridden manager includes suggestions for the custom manager's methods. [screenshot]

I wonder if this change would break other uses of django-stubs? If not, I retract my earlier statement that "the authors of the Django stubs can do nothing to resolve the issue."

erictraut commented 2 years ago

In your example above, the function function_namespace has local variables c and d that are unrelated to the variables c and d in the outer scope. They happen to share the same name, but they are completely different symbols in different symbol tables and with independent types. The c and d variables in the function do not have type declarations, so their types are inferred based on assignments.

That's very different from instance and class variables in a class hierarchy. When you use the expression self.b from a method in Child and use the same expression in a method in Base, both expressions refer to the same variable. The variable is shared by classes in the hierarchy, so it must have the same type. This is very different from the function example — and it explains why a static type analyzer needs to treat these cases very differently.

Looking more closely at the django stubs, I wonder if the Model class should properly be declared as a generic class. @hmc-cs-mdrissi alludes to this in his post above (see "case 3" in his list).

Here's how that would look:

_TManager = TypeVar("_TManager", bound=BaseManager[Any])

class Model(Generic[_TManager], metaclass=ModelBase):
    ...
    objects: _TManager
    ...

You could then define subclasses of Model like this:

class OpinionPoll(models.Model[PollManager]):
    question = models.CharField(max_length=200)
    objects = PollManager()

Perhaps they didn't do this because BaseManager is already a generic class parameterized by the Model type that it manages, so they felt that it was circular to make Model a generic class parameterized by the BaseManager that manages it?

millerdev commented 2 years ago

I did some research on the PEPs where type hints were introduced, but I could not find any clear specification of how they are to be handled in the context of class inheritance. The closest thing I could find is the somewhat tangential section of PEP-526: Where annotations aren't allowed, which says

type annotations belong in the scope owning the variable.

In context, this is clearly referring to local function scope vs global/module scope, although it could plausibly be extended to class (and subclass) scopes as well. In Python a subclass has a namespace distinct from the namespaces of its bases:

class Base:
    b: str = ""

class Child(Base):
    b = 1

Base.b == ""  # True
Child.b == 1  # True

The variables Base.b and Child.b can be referenced independently and may have different values. They are not the same variable.

Note that I have not brought class instances into this discussion at all, although even there, given the example above, Child().b resolves to the attribute Child.b, which overrides Base.b. There is no type hint for b in the scope of Child, but there is a value from which type information can be extracted.

This is what mypy does:

class Base:
    b: object
    reveal_type(b)  # note: Revealed type is "builtins.object"

class Child(Base):
    b = 1
    reveal_type(b)  # note: Revealed type is "builtins.int"

reveal_type(Child.b)  # note: Revealed type is "builtins.int"
reveal_type(Child().b)  # note: Revealed type is "builtins.int"

(produced with mypy 0.931)

There are several open issues for mypy that discuss inheritance of type hints (example), but as far as I can tell nothing like that has been implemented. https://github.com/python/typing/issues/269 also comes close, but ultimately was closed as out of scope.

erictraut commented 2 years ago

Yes, mypy is rather inconsistent in how it handles type declarations, value assignments, and assignment-based type narrowing. You can see these inconsistencies with local variables as well as instance and class variables. They lead to behaviors that users find confusing and inconvenient, as evidenced by bug reports that appear frequently in the mypy bug tracker.

Pyright's rules are simple and consistent: if a type is declared, the type checker will honor that declaration. If a type isn't declared, its type is inferred based on assigned values. For additional details, refer to this documentation.

In the case of class inheritance, pyright honors a type declaration found anywhere in the class hierarchy. If a child class wants to override the parent's declared type with a narrower type, it's able to do so by introducing its own type declaration. These rules are straightforward, consistent and based on reasoned principles. We're unlikely to change these rules.

As I mentioned above, it looks like the Model class should probably be defined as generic. I'm not sure why the authors of the Django stubs decided not to model it that way. Perhaps you have some insights given that you have more familiarity with this library than I do.

millerdev commented 2 years ago

The referenced document says nothing about inheritance of type hints, which must mean there is more to pyright than the rules outlined there.

In response to another pyright issue you said

Pyright is a standards-based type checker. We generally don't add any type-related functionality to it unless it is either in a standard or in the process of becoming a standard. So, to answer your hypothetical question, the answer is "yes, it would require a PEP".

Can you point me to the PEP that covers type hint inheritance?

erictraut commented 2 years ago

Pyright follows the same consistent rules for class hierarchies as it does for local variables, etc. If a type is declared, the type checker honors that declaration. If a type isn't declared, the type is inferred based on assigned values. The only difference is that with class hierarchies, a child class can optionally choose to redeclare the type of a variable or method as long as the redeclared type is narrower than the original.

What are your thoughts on my suggestion above that Model should be modeled as a generic class? Do you know why it wasn't done that way in the Django stubs?

millerdev commented 2 years ago

Pyright follows the same consistent rules for class hierarchies as it does for local variables

Except it doesn't, because you have already asserted that local variables are unrelated to those defined in outer namespaces. That is, unless you have changed your mind about the example I previously presented, where a name declared in a non-local namespace, which would otherwise be accessible in the local namespace, is "overridden" by a local variable with the same name:


d: str = ""
reveal_type(d)  # Type of "d" is "Literal['']"

def function_namespace():
    # Note: assigned value overrides declared type in outer namespace
    d = 1
    reveal_type(d)  # Type of "d" is "Literal[1]"

What are your thoughts on my suggestion above that Model should be modeled as a generic class? Do you know why it wasn't done that way in the Django stubs?

It might work better for Pyright, but it is not the way Model classes are conventionally defined. Thousands of lines of existing code and documentation would need to be updated to use your proposed generic typing syntax.

from typing import Any, Generic, TypeVar

class BaseManager:
    bad: object
    good = object()

_TManager = TypeVar("_TManager", bound=BaseManager)  # BaseManager is not generic in this simplified example, so no [Any]

class Model(Generic[_TManager]):
    objects: _TManager

class PollManager(BaseManager):
    bad = []
    good = []

class OpinionPoll(Model[PollManager]):
    objects = PollManager()

reveal_type(OpinionPoll.objects)  # Type of "OpinionPoll.objects" is "PollManager"
reveal_type(OpinionPoll.objects.bad)  # Type of "OpinionPoll.objects.bad" is "object"
reveal_type(OpinionPoll.objects.good)  # Type of "OpinionPoll.objects.good" is "list"

On the other hand, the proposed generic typing syntax makes things worse for models defined in the conventional way.

class StandardPoll(Model):
    objects = PollManager()

reveal_type(StandardPoll.objects)  # Type of "StandardPoll.objects" is "Any"
reveal_type(StandardPoll.objects.bad)  # Type of "StandardPoll.objects.bad" is "Any"
reveal_type(StandardPoll.objects.good)  # Type of "StandardPoll.objects.good" is "Any"

I am still interested in an answer to the question in my previous post. Can you point me to the PEP that covers type hint inheritance? I read all of the PEPs I could find that seemed related to that topic, but was unable to find one that addressed it.

erictraut commented 2 years ago

The typing PEPs don't provide much in the way of guidance when it comes to type inference rules. They intentionally leave that up to each type checker.

kaapstorm commented 2 years ago

This has been a very interesting discussion on the virtues of Pyright.

But I think there is an important aspect of the original issue that this discussion has missed.

If we take for granted that a developer has written the following ...

class Base:
    bad: object

class Sub(Base):
    bad = []

Sub.bad.

... then how should Pylance help them?

We can debate whether or not what the developer has written is good. Maybe there's a conversation to be had around how to help them to use type hints more effectively. But we should open a separate Github issue for that conversation. This issue was originally about autocomplete suggestions.

What would be the most helpful autocomplete suggestions to show the developer?

Considering what they have written, it seems most likely that the developer wants the properties and methods of list.

I wouldn't want to frame this as a bug report. Clearly we've got a lot of justification for the current behavior. This is not a problem with typing. I would prefer to express this as a suggestion for how to help developers better. It's an opportunity to improve usability.

Kjir commented 2 years ago

I don't know if this is the same issue, but I have a concrete case where redefining the return type of a method in a subclass is desired. For example:

class RestClient:
    def get(self, url):
        return self.request("GET", url)

    def request(self, method, url):
        response = sync_http_request.execute(method, url)
        return response.json()

class AsyncRestClient(RestClient):
    async def request(self, method, url):
        response = await async_http_request.execute(method, url)
        return await response.json()

The return type inferred for AsyncRestClient().get() is still the one from the parent class, even though the actual return value is wrapped in a Coroutine.
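
Following the advice earlier in the thread, one way to surface the new return type is to redeclare the overridden methods in the subclass. A rough, self-contained sketch; the FakeResponse/FakeAsyncResponse classes are stand-ins for the real request plumbing, and note that an async override of a sync method is an incompatible override that pyright/Pylance will flag unless suppressed:

```python
import asyncio


class FakeResponse:
    """Stand-in for a synchronous HTTP response."""

    def json(self) -> dict:
        return {"ok": True}


class FakeAsyncResponse:
    """Stand-in for an asynchronous HTTP response."""

    async def json(self) -> dict:
        return {"ok": True}


class RestClient:
    def get(self, url: str) -> dict:
        return self.request("GET", url)

    def request(self, method: str, url: str) -> dict:
        return FakeResponse().json()


class AsyncRestClient(RestClient):
    # Redeclaring request *and* get with async signatures makes the
    # Coroutine-wrapped return types visible to the type checker.
    async def request(self, method: str, url: str) -> dict:  # type: ignore[override]
        return await FakeAsyncResponse().json()

    async def get(self, url: str) -> dict:  # type: ignore[override]
        return await self.request("GET", url)


result = asyncio.run(AsyncRestClient().get("/example"))
```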

judej commented 2 years ago

As @erictraut has explained, this behavior is by design. Closing, since there is an easy workaround (provide a type declaration in your subclass).