microsoft / pyright

Static Type Checker for Python
Other
13.31k stars 1.45k forks source link

Type hint incorrect on TypedDict.update? #5254

Closed alicederyn closed 1 year ago

alicederyn commented 1 year ago

Describe the bug The update type hint on TypedDicts seems to declare the incoming parameter as a Partial of the same type, which permits illegal operations on subclasses. It should be a @final Partial (ideally with ReadOnly keys to support the new ReadOnly special form)

(I am deducing current behaviour from error messages, as I can't find it documented, so apologies for any mistakes here.)

To Reproduce Run pyright on the code below

Expected behavior The line a1.update(a2) should emit an error.

Screenshots or Code

# pyright: strict

from typing import TypedDict

class A(TypedDict):
    foo: int

class B(A, TypedDict):
    bar: str

class C(A, TypedDict):
    bar: int

def update_a(a1: A, a2: A) -> None:
    a1.update(a2)  # This should be an error

b: B = {"foo": 1, "bar": "baz"}
c: C = {"foo": 2, "bar": 3}
update_a(b, c)  #  b is no longer a B.

VS Code extension or command-line pyright 1.1.313, on the command-line

Additional context I first raised this here — https://discuss.python.org/t/pep-705-typedmapping/24827/24 — but the issue was masked by pyright's ignoring the type declaration on a2; with the introduction of a function wrapping the update call, the error is exhibited.

I believe the correct type hint for the update parameter is a final, non-total, readonly version of the original typeddict, i.e. in this case

from typing import final, TypedDict
from typing_extensions import ReadOnly

@final
class A_Update(TypedDict, total=false):
    foo: ReadOnly[int]

The final here is how we currently indicate that no extra keys can be present on the dictionary. Any ReadOnly keys in the original type should be dropped, as they cannot be modified.

This will likely break a lot of existing code though. At the moment, if I try to use a final-decorated type hint on update_a, I get an error like "X" is incompatible with "Partial[A]" because of a @final mismatch, which means anyone using the (I believe) correct type hinting would currently be getting an error.

erictraut commented 1 year ago

I don't think that final makes sense here. The @final class decorator means "you cannot derive another class from this class". And as we've discussed previously, it has no meaning for a protocol class because protocols are structural type definitions. TypedDict is effectively a protocol class.

I agree that there's a potential type hole here, but I don't think @final is a solution. Nor do I see a good solution here. My take is that this is an acceptable hole in the type system (practicality over purity). TypeScript makes this same tradeoff in this case.

If there's a consensus in the typing community that this typing hole is unacceptable, then one potential solution is to introduce the notion of a TypedDict that defines all allowed keys. (I'm not sure what term to use for such a concept. Perhaps "complete"?) The TypeScript team has discussed adding such a concept to the TypeScript type system in the past, but they've always rejected it in the end as too complicated and of insufficient value.

alicederyn commented 1 year ago

I don't think that final makes sense here. The @final class decorator means "you cannot derive another class from this class". And as we've discussed previously, it has no meaning for a protocol class because protocols are structural type definitions. TypedDict is effectively a protocol class.

But according to a previous comment from you, mypy already treats final TypedDicts this way -- it narrows Unions of TypedDicts when using "key" in v, provided the types are final:

https://github.com/python/mypy/issues/13802#issuecomment-1266074178

I can't find the relevant issue/PR right now unfortunately.

erictraut commented 1 year ago

But according to a previous comment from you, mypy already treats final TypedDicts this way

I don't think mypy does this. Pyright does, and I now question whether that was the right decision. Here's the issue where this was discussed in some detail.

I think we have a few options here:

  1. Live with the typing holes associated with TypedDict and the fact that it's a structural type definition, so there may always be additional keys present in an object that conforms to a TypedDict.
  2. As a variant to option 1, close all of the typing holes. This would make all uses of TypedDict 100% type safe at the expense of making TypedDict much less usable in real-world code.
  3. Introduce a new concept "complete" (or some similar name) that implies that a given TypedDict may not contain any keys that are not explicitly declared. We'd need define all of the associated typing rules (e.g. subtyping, narrowing safety, etc.) based on this concept.
  4. Redefine the meaning of @final in the context of a TypedDict to imply "complete". This is the same as 3 except that it overloads the meaning of @final when it's used in a certain context.

Do you see any other options that I'm missing? Which option do you prefer?

alicederyn commented 1 year ago

I meant pyright, but yes, mypy now does this as well (https://github.com/python/mypy/pull/13838). Community consensus seems to be on the side of option 4.

I feel it makes an intuitive kind of sense. The unknown keys are added by subclassing, so adding @final effectively declares them as guaranteed to be missing.

Even if the update method hinting doesn't change, I think it should be made compatible with final typeddicts. I can raise that as a separate issue if that helps?

erictraut commented 1 year ago

We seem to be making up rules as we go along rather than formalizing some principles and then defining consistent rules based on these principles. Your draft PEP 705 might be a good opportunity for formalizing these principles. In the absence of such principles, I'm hesitant to make ad hoc changes that will cause churn for pyright users.

For example, if we formally establish the principle that a TypedDict decorated with @final is considered "complete", that would imply numerous changes to both mypy and pyright.

from typing import TypedDict, final

@final
class Movie(TypedDict):
    name: str

@final
class Blockbuster(TypedDict):
    name: str
    year: int

class Person(TypedDict):
    name: str

class Employee(Person):
    title: str

m: Movie = {"name": "Jaws"}
b: Blockbuster = {"name": "Star Wars", "year": 1976}
p: Person = {"name": "Steven"}
e: Employee = {"name": "Bob", "title": "Manager"}

# Both of the following assignments would need to generate
# errors because of a @final mismatch. Pyright currently
# emits an error for these lines, but mypy doesn't.
m1: Movie = p
p1: Person = m

# This should produce an error but doesn't currently in
# either pyright or mypy.
m2: Movie = b

# This should produce an error but doesn't currently in
# pyright but does in mypy.
m.update(b)

p2: Person = e

# This should not produce an error. It doesn't currently
# in pyright, but it does in mypy.
p.update(e)
alicederyn commented 1 year ago

Getting consensus seems valuable. I'm not sure whether a PEP is the right venue though? It doesn't affect core Python, only type checkers, and PEPs have historically been quite vague about exactly what the type constraints are meant to be for structural types. I assumed that was deliberate. Perhaps a discussion on the typing mailing list would be enough?

(I'm also not sure how this works with python versioning!)

alicederyn commented 1 year ago

I actually like @final a lot less when combined with the ReadOnly special type. It loses its intuitivity.

erictraut commented 1 year ago

I'm not sure whether a PEP is the right venue though?

I agree that by itself this specific topic is probably not PEP-worthy, but combined with the other issues that PEP 705 is attempting to pin down, it makes sense to do it all in one place. These topics are all related, after all.

I'll also note that when we discuss topics like this in the python/typing forum and seemingly come to consensus, I implement it in pyright and the Meta folks implement it in pyre, but it never gets implemented in mypy. I can point to many examples of this. I'm then forced to explain to pyright users for the next few years why pyright's behavior differs from mypy's. If the rules are formalized in a PEP, they tends to get implemented uniformly across all type checkers, and that's good for the entire community.

alicederyn commented 1 year ago

How would you feel about a new TypedDict parameter that forbids extra keys? I'm thinking TypedDict(other_keys=False), but open to other naming suggestions. Ones I've thought of:

erictraut commented 1 year ago

That's option 3 in my list above. It has the benefit of not abusing the @final decorator to mean something it wasn't originally intended to mean. It has the downsides of requiring a runtime change (and hence no backward compatibility without using typing_extensions). It also adds complexity that I'm not convinced is needed. (See my point above about the TypeScript team repeatedly rejecting this concept.) So, personally I'm lukewarm on the idea, but I think it's worth getting feedback on it from the broader typing community.

erictraut commented 1 year ago

@alicederyn, here's another historical typing-sig thread that you might find interesting.

alicederyn commented 1 year ago

It also adds complexity that I'm not convinced is needed.

I think this gets raised as a bug on the type checkers often enough that it warrants action. Also from a personal perspective, the lack of type narrowing on an if "key" in guard is a significant hurdle to writing code using TypedDict. I see support requests for this on our internal chat that don't end up as bug reports.

See my point above about the TypeScript team repeatedly rejecting this concept.

As I understand it, that's because TypeScript ignores the potential structural subtypes when code does if "key" in. This would be an option 5 that you didn't list, I believe. It doesn't seem like we as a community want to go down that route.

That's option 3 in my list above.

For sure :) I was interested in your opinion on the specific implementation.

I think:

It has the downsides of requiring a runtime change (and hence no backward compatibility without using typing_extensions).

In my draft PEP, I'm actually suggesting keeping option 4 for a while, as an alternative way of writing other_keys=False, except raising an error if the user uses ReadOnly keys. That should resolve backward compatibility?

erictraut commented 1 year ago

I think this gets raised as a bug on the type checkers often enough that it warrants action.

Can you elaborate on what gets raised as a bug? Are you specifically referring to the fact that a union that includes multiple TypedDict types isn't narrowed based on a if "key" in guard?

As I mentioned, TypeScript does not have the notion of an "object that cannot have any additional fields". This admittedly leaves holes in the TypeScript type system, but in my many years of writing reams of TypeScript code, I can't remember even once when I was bitten by one of these holes. Here's a contrived example that demonstrates this:

interface A {
    a: number;
}

interface B {
    b: string;
}

interface AB extends B {
    a: string;
}

function func(x: A | B) {
    if ('a' in x) {
        console.log('Got an A!');
        if (typeof x.a !== 'number') {
            console.log('Oops! Type of a should be a number!');
        }
    } else {
        console.log(`Got an B!: ${x.b}`);
    }
}

const ab: AB = { a: 'foo', b: 'bar' };
func(ab);

I find it somewhat amusing that the Python typing community gets very pedantic about edge cases in the type system when even the strictest typed Python code bases tend to be riddled with unsafe cast calls, # type: ignore error suppressions, and if TYPE_CHECKING "lies". I even find myself falling into this habit at times, being more pedantic than I probably should. In contrast, I think the TypeScript typing community (or rather, the TypeScript team at Microsoft) strikes a better balance in terms of practicality and simplicity.

If we were to apply the same principles from TypeScript to the TypedDict class in Python, we'd simply relax some of the existing rules about TypedDict and live with the potential false negatives. For example, we would allow discriminating type narrowing even if it opens up the potential for a false negative like the one in the TypeScript example above.

alicederyn commented 1 year ago

Can you elaborate on what gets raised as a bug? Are you specifically referring to the fact that a union that includes multiple TypedDict types isn't narrowed based on a if "key" in guard?

Yes. At least, this is the only one that would be resolved by final/closed TypedDict types.

For example, we would allow discriminating type narrowing even if it opens up the potential for a false negative like the one in the TypeScript example above.

I'd be happy to see this as an alternative to pursuing final/closed types. It would need to be agreed on though, which I'm not super optimistic about. Should I put it as a "rejected alternatives" in the PEP draft and then explicitly raise it in the typing discussion afterwards to see which people prefer? One could argue it's allowed by PEP-589:

In some cases potentially unsafe operations may be accepted if the alternative is to generate false positive errors for idiomatic code.

erictraut commented 1 year ago

Closing this for now pending further progress on PEP 705 and other discussions related to TypedDict. I'm happy to revisit it once we (the typing community) has come to more of a consensus on the topics discussed above.

alicederyn commented 1 year ago

The PEP revision has merged; discussion is open here: https://discuss.python.org/t/pep-705-typeddict-read-only-and-other-keys/36457