python / mypy

Optional static typing for Python
https://www.mypy-lang.org/
Other
18.57k stars 2.85k forks source link

Covariance is not inferred for NamedTuple and frozen Dataclasses #17623

Open nielsbuwen opened 3 months ago

nielsbuwen commented 3 months ago

The new experimental feature NewGenericSyntax (#15238) should infer the variance of type variables. For attributes in NamedTuple and frozen dataclass the variance is inferred as "invariant" when it should be "covariant" (A clear and concise description of what the bug is.)

To Reproduce

I defined a generic NamedTuple and frozen dataclass with one member of type T. The classes should be covariant in T. So converting from Container[int] to Container[int |None] should be no problem.

playground link

Expected Behavior

mypy should accept all four variants of the type def

Actual Behavior

For Sequence and tuple[T, ...] it correctly accepts the function definition.

But for NamedTuple and dataclass it says

Incompatible return value type (got "XY[int]", expected "XY[int | None]") [return-value]

It looks like mypy does not think that the NamedTuple and dataclass are not covariant in T.

Your Environment

erictraut commented 3 months ago

It makes sense that variables in a NamedTuple and a frozen dataclass should be considered read-only and should therefore not cause a TypeVar to be inferred as invariant.

Interestingly, Python 3.13's addition of a __replace__ method effectively breaks this. Mypy was recently updated (as was pyright) to synthesize a __replace__ method in NamedTuples and dataclasses. This is done only for Python 3.13 and newer. Building on your playground code sample, the signature of of this synthesized method is:

@dataclass(frozen=True)
class FrozenDataclassBox[T]:
    value: T

    def __replace__(self, *, value: T) -> Self: ...

This __replace__ signature uses T in a contravariant position, which means that the auto-variance algorithm will determine that T is invariant. See this issue for details.

You can see this here in the pyright playground. If you hover over T, you'll see that pyright infers it to be covariant.

image

However, if the python version is then changed to Python 3.13, its variance changes to invariant, and a type error is reported in the upcast function.

image

@max-muoto, I'm interested in your thoughts here since you added the support for __replace__ in mypy. I used the same approach in pyright. This variance problem didn't occur to me previously.

Here are some options for us to consider:

  1. We live with the consequences that the synthesized __replace__ always makes any TypeVars used in a NamedTuple or a dataclass invariant starting in Python 3.13.
  2. We add a special-case exemption for __replace__ in the TypeVar variance algorithm. There are already exemptions for __init__ and __new__, so maybe this is acceptable.
  3. We modify the signature for __replace__. I can't think of any way of doing this without making __replace__ unsafe from a type checking perspective. It would probably need to accept parameters of type object and return an object of type Any.
JelleZijlstra commented 3 months ago

I thought about synthesizing a signature like this:

@dataclass(frozen=True)
class FrozenDataclassBox[T]:
    value: T

    def __replace__[U = T](self, *, value: U = ...) -> Self: ...

(Note the addition of a default for value. Every parameter to the __replace__ method should be optional.)

Now, if (given fcb: FrozenDataclassBox[bool]) you call fcb.__replace__(), U should get solved to the value of T, so you get a FrozenDataclassBox[bool], which is correct. And if you do fcb.__replace__(value=1), U gets solved to int, and you get a FrozenDataclassBox[int]. I believe this also extends to the general case where there may be multiple fields and multiple type parameters. If the original parameter T has bounds or constraints, then the synthesized type parameter U should have the same bounds or constraints.

However, this is invalid because TypeVar defaults may not use TypeVars from an outer scope (https://typing.readthedocs.io/en/latest/spec/generics.html#scoping-rules) and because the meaning of TypeVar defaults with function-scoped type parameters is unspecified (https://typing.readthedocs.io/en/latest/spec/generics.html#function-defaults). I tried to get around that by using a different type variable, but pyright still doesn't allow it (https://pyright-play.net/?pythonVersion=3.13&strict=true&code=GYJw9gtgBAJghgFzgYwDZwM4YKYagSwgAcwQFZEV0sBYAKFEigQE8j8A7AcwONPMw4yAfVZFs9egAF4SNIIAUjAF7YOAXgAqIAK7YAlPXlYoAMXCqOAEUrGMAITAAPANqaAugC56UX1ABucKh6nlCaknR%2BsNjAUMLCINhE6MjY8W4ATAA0YQDMUOphGe4KOKjAoeZgljZy1A7Ome45AFQ5gcHYoZr5hQB0A-pQALQAfGYWarVUgo6uPV5QA30RMDHMuAhKyABGlZPWtvVzLjtgYKju%2Bt6RfoLYImLY2zt98YnJKGnCCvo5VTUjrNGmcLlcfHcsA8EKI2M9gLs3gkkilvgoOnp1ABGP4TapTIFYE6cBDguhAA), and I don't think its behavior is incorrect. Still, perhaps a type checker could synthesize an internal signature following this approach.

max-muoto commented 3 months ago

It makes sense that variables in a NamedTuple and a frozen dataclass should be considered read-only and should therefore not cause a TypeVar to be inferred as invariant.

Interestingly, Python 3.13's addition of a __replace__ method effectively breaks this. Mypy was recently updated (as was pyright) to synthesize a __replace__ method in NamedTuples and dataclasses. This is done only for Python 3.13 and newer. Building on your playground code sample, the signature of of this synthesized method is:

@dataclass(frozen=True)
class FrozenDataclassBox[T]:
    value: T

    def __replace__(self, *, value: T) -> Self: ...

This __replace__ signature uses T in a contravariant position, which means that the auto-variance algorithm will determine that T is invariant. See this issue for details.

You can see this here in the pyright playground. If you hover over T, you'll see that pyright infers it to be covariant.

image

However, if the python version is then changed to Python 3.13, its variance changes to invariant, and a type error is reported in the upcast function.

image

@max-muoto, I'm interested in your thoughts here since you added the support for __replace__ in mypy. I used the same approach in pyright. This variance problem didn't occur to me previously.

Here are some options for us to consider:

  1. We live with the consequences that the synthesized __replace__ always makes any TypeVars used in a NamedTuple or a dataclass invariant starting in Python 3.13.
  2. We add a special-case exemption for __replace__ in the TypeVar variance algorithm. There are already exemptions for __init__ and __new__, so maybe this is acceptable.
  3. We modify the signature for __replace__. I can't think of any way of doing this without making __replace__ unsafe from a type checking perspective. It would probably need to accept parameters of type object and return an object of type Any.

This is definitely something I didn't think of when adding this to dataclasses, but it makes sense that we'd start to hit this. I think special-casing is probably the best path forward here, I obviously don't maintain any type-checkers, but the main concern here that comes to mind is the lack of documentation around this + all of the current special casing for __replace__. MyPy and Pyright will obviously have the correct behavior, and we can create issues for Pyre/PyType, but I would be worried about others implementing type-checkers. I made a cpython issue to document this alongside other dataclass behavior, but not sure that will happen: https://github.com/python/cpython/issues/121371

At least, in my experience, the more I can rely on auto-variance from 3.12 type parameters, the faster I can work with the typing system, so I think making it work correctly by default, even with special casing, is probably the best option for most users.

I think it would be good to define this implicit behavior around synthesizing __replace__ in the typing spec, and also updating the docs for the auto-variance algorithm. Happy to contribute here.

I do like @JelleZijlstra's idea though, I could play around with it in MyPy to see if it might work.

andersk commented 1 month ago

__replace__ breaks more than covariance. It completely breaks type safety in the presence of dataclass subclasses. This passes mypy (playground) and Pyright (playground) but crashes at runtime:

from dataclasses import dataclass

@dataclass(frozen=True)
class A:
    x: int | str
    def get_str(self) -> str:
        return ""

class B(A):
    x: str
    def get_str(self) -> str:
        return self.x

a: A = B(x="")
five: str = a.__replace__(x=5).get_str()  # 5
five.upper()  # AttributeError at runtime

(Of course, B(x=5) would’ve already broken type safety, but that seems like it should be straightforward to fix by adjusting the inferred type of dataclass subclass constructors. The __replace__ problem seems much more difficult, since we can’t tell statically that the subclass is even involved.)

aszs commented 1 month ago

Interestingly, Python 3.13's addition of a replace method effectively breaks this. Mypy was recently updated (as was pyright) to synthesize a replace method in NamedTuples and dataclasses. This is done only for Python 3.13 and newer.

I'm running into this error trying to support Python 3.13 -- how can you suppress these errors? Since it's autogenerated so you can't add a type ignore comment and I checked the PR adding this (#17469) and it seems to add it unconditionally. Defining your own __replace__ doesn't work either -- the Python dataclasses implementation respects a custom __replace__ but the mypy code doesn't.

aszs commented 3 weeks ago

Interestingly, Python 3.13's addition of a replace method effectively breaks this. Mypy was recently updated (as was pyright) to synthesize a replace method in NamedTuples and dataclasses. This is done only for Python 3.13 and newer.

I'm running into this error trying to support Python 3.13 -- how can you suppress these errors? Since it's autogenerated so you can't add a type ignore comment and I checked the PR adding this (#17469) and it seems to add it unconditionally. Defining your own __replace__ doesn't work either -- the Python dataclasses implementation respects a custom __replace__ but the mypy code doesn't.

I see that issue #18115 was opened addressing this... cross-referencing in case that's helpful...