python / mypy

Optional static typing for Python
https://www.mypy-lang.org/

final TypedDict #7981

Open crusaderky opened 4 years ago

crusaderky commented 4 years ago

Reopened from #7845 as requested

mypy seems to ignore the @final decorator when it is applied to a TypedDict. This issue becomes material when one invokes the items() or values() methods, or passes a generic string to get(); then the output will be incorrectly upcast to object since mypy thinks it may be dealing with a subclass.

from typing_extensions import final, TypedDict

@final
class Point(TypedDict):
    x: float
    y: float
    z: float

p: Point
x: str
reveal_type(p.values())
reveal_type(p.items())
reveal_type(p.get(x, 0.0))

Output:

12: note: Revealed type is 'typing.ValuesView[builtins.object*]'
13: note: Revealed type is 'typing.AbstractSet[Tuple[builtins.str*, builtins.object*]]'
14: note: Revealed type is 'builtins.object*'

Expected output:

12: note: Revealed type is 'typing.ValuesView[builtins.float*]'
13: note: Revealed type is 'typing.ItemsView[builtins.str*, builtins.float*]'
14: note: Revealed type is 'builtins.float*'

crusaderky commented 4 years ago

Related: #7849, #7865

JukkaL commented 4 years ago

The proposed semantics seem to be that a final TypedDict object doesn't have any extra keys beyond those included in the definition. Also, a final TypedDict can't be used as a base to define derived TypedDicts.

I haven't thought about this carefully, but it may be possible to define this in a sound fashion. A final TypedDict would only be compatible with another final TypedDict, and they must have the same keys and required keys, and the key types must be compatible.

Since this seems to work at runtime, this could be implemented without changes to typing. However, I'm not sure how useful this would be. I'd be interested in hearing if anybody has real-world use cases where this would be helpful.

ilevkivskyi commented 4 years ago

> A final TypedDict would only be compatible with another final TypedDict, and they must have the same keys and required keys, and the key types must be compatible.

Exactly, it looks like this is the only way to make this sound. This, however, may not be very useful, as one would need to have the same typed dict all the way down the call stack. On the other hand, this may be OK for new codebases.

crusaderky commented 4 years ago

The same problem applies to the keys. .keys() and .__iter__() yield str according to mypy, but the final decorator could let mypy narrow that to Literal.

Real life use case:

from typing_extensions import TypedDict, final

@final
class Counters(TypedDict):
    counter_1: int
    blah_blah: int
    something_else: int
    and_another: int
    one_more: int

def reset_counters(c: Counters) -> None:
    for k in c:
        c[k] = 0  # error: TypedDict key must be a string literal; expected one of ('counter_1', 'blah_blah', 'something_else', 'and_another', 'one_more')
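
Until something like this is supported, one possible workaround is to spell out the key set once as a Literal type and iterate over that instead of over the dict. This is only a sketch: CounterKey and COUNTER_KEYS are hypothetical names introduced here, and it assumes the checker accepts indexing a TypedDict with a union of string literals.

```python
from typing import Literal, TypedDict, get_args

# Hypothetical alias: the key set written out once as a Literal union.
CounterKey = Literal[
    "counter_1", "blah_blah", "something_else", "and_another", "one_more"
]
# get_args() recovers the literal strings at runtime.
COUNTER_KEYS: tuple[CounterKey, ...] = get_args(CounterKey)


class Counters(TypedDict):
    counter_1: int
    blah_blah: int
    something_else: int
    and_another: int
    one_more: int


def reset_counters(c: Counters) -> None:
    # k is typed as CounterKey rather than str, so c[k] can be checked.
    for k in COUNTER_KEYS:
        c[k] = 0
```

The cost is that the key names are duplicated between the Literal and the class body, which is the repetition a final (or closed) TypedDict would avoid.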

intgr commented 4 years ago

I would find the implementation proposed by @JukkaL quite useful. I have a TypedDict whose values are all List[something] and I would like to iterate over the dict's items() in a generic manner.

JosiahKane commented 4 years ago

I would also find this useful in annotating real world code, as indicated in the linked closed issue.

My actual usecase is a Computer Vision type problem where we're using TypedDict to package together corresponding images. For example, there might be an RGB image and a binary mask indicating the foreground of that image.

from typing_extensions import TypedDict, final

class Image:
    def save(self, filename: str) -> None:
        ...

class ColourImage(Image):
    ...

class BitmaskImage(Image):
    ...

@final
class ImageAnnotation(TypedDict):
    img: ColourImage
    foreground: BitmaskImage
    # Although these are all Images, we don't want to use a general Dict[str, Image] because we'd lose the specialisations.

def save_all(img_pack: ImageAnnotation, prefix: str) -> None:
    for k, v in img_pack.items():
        # Currently this fails because save is not defined for object. 
        v.save(f"{prefix}_{k}.png")
        # It would instead be necessary to say 
        # for k, v in cast(Mapping[str, Image], img_pack).items():

A particularly elegant construction that might be enabled by marking a TypedDict closed for extension is the ability to create a new one of the same kind with a comprehension. For example

from typing import TypeVar

ImageT = TypeVar("ImageT", bound=Image)

def resize(img: ImageT, scale: float) -> ImageT:
    ...

def resize_all(img_pack: ImageAnnotation, scale: float) -> ImageAnnotation:
    return {k: resize(v, scale) for k, v in img_pack.items()}
    # This wouldn't work even with a cast
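
For comparison, a version that type-checks today has to name every key explicitly, so that each value keeps its own type. The following is a rough sketch only: the Image stand-in with width and height attributes and the body of resize are assumptions made here so the example runs, not part of the original code.

```python
from typing import TypedDict, TypeVar


class Image:
    """Stand-in for the Image class above; width/height are assumed fields."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height


class ColourImage(Image):
    pass


class BitmaskImage(Image):
    pass


class ImageAnnotation(TypedDict):
    img: ColourImage
    foreground: BitmaskImage


ImageT = TypeVar("ImageT", bound=Image)


def resize(img: ImageT, scale: float) -> ImageT:
    # Hypothetical implementation: rebuild the same subclass at the new size.
    return type(img)(round(img.width * scale), round(img.height * scale))


def resize_all(img_pack: ImageAnnotation, scale: float) -> ImageAnnotation:
    # Each key is a string literal, so the per-key value type is preserved.
    return {
        "img": resize(img_pack["img"], scale),
        "foreground": resize(img_pack["foreground"], scale),
    }
```

This grows by one line per key, which is the boilerplate the comprehension form would remove.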

wmdrichards commented 4 years ago

I've also just run into this issue -- use case is I have a number of TypedDicts that are all Mapping[str, T], but for each TypedDict I'd like to restrict the keys to a known set. Besides declaring @final, one other option that comes to mind is to make TypedDict a generic class, so that the following syntax would work:

class Point(TypedDict[float]):
    x: float
    y: float
    z: float

This would, I think, solve the use cases in this thread without disallowing extension.

richardxia commented 3 years ago

> However, I'm not sure how useful this would be. I'd be interested in hearing if anybody has real-world use cases where this would be helpful.

I just hit a case where it would be useful to have a final TypedDict. I have a utility function that accepts a list[dict[str, str]] and writes it out to a CSV file using csv.DictWriter. I'd like to be able to pass arbitrary TypedDicts to the utility function where I know the type of all the values in the TypedDict is str. Currently, it fails because the TypedDicts don't conform to dict[str, str], presumably because non-final TypedDicts don't preclude other keys with values of different types. Since I am constructing the TypedDicts all within Python, I can guarantee that there are no other keys present, but there's no way for me to communicate that to the type system.

I suppose I could use a dataclass, but one thing that is useful about using TypedDicts this way is that I can use keys that are not valid Python identifiers (e.g. containing spaces), which I can directly serialize as a header row of the CSV, making it easier for non-technical users to read the CSVs.
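
One way to get such a helper to accept arbitrary TypedDicts today is to loosen its parameter to Mapping[str, object], which PEP 589 says every TypedDict is consistent with. This is a minimal sketch under that assumption; write_rows, Person and demo are hypothetical names, and the helper gives up the guarantee that every value is a str.

```python
import csv
import io
from typing import Mapping, Sequence, TextIO, TypedDict

# Functional syntax allows keys that are not valid Python identifiers.
Person = TypedDict("Person", {"Full Name": str, "Home City": str})


def write_rows(rows: Sequence[Mapping[str, object]], out: TextIO) -> None:
    # Any TypedDict is accepted here, at the cost of losing the value type.
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


def demo() -> str:
    buffer = io.StringIO()
    people: list[Person] = [{"Full Name": "Ada Lovelace", "Home City": "London"}]
    write_rows(people, buffer)
    return buffer.getvalue()
```

Sequence is used rather than list because list is invariant, so a list[Person] would not be accepted where list[Mapping[str, object]] is expected.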

intgr commented 3 years ago

I think most use cases mentioned here would be better served by intersection types: https://github.com/python/typing/issues/213

So if you want to express that all values of a dict are float and it also conforms to a TypedDict, you would write SomeTypedDict & dict[str, float] or Intersection[SomeTypedDict, dict[str, float]]. And you wouldn't need to artificially restrict subtyping of SomeTypedDict.

smurzin commented 2 years ago

This now also makes mypy diverge from pylance: https://github.com/microsoft/pyright/issues/1899. They've decided to allow final for TypedDict in order to be able to provide type narrowing in Union[TypedDict] case.

This is problematic for projects that use pylance for IDE/IntelliSense and mypy in CI.

97littleleaf11 commented 2 years ago

Jukka's comment about this: https://github.com/python/mypy/issues/12266#issue-1155535982

godlygeek commented 2 years ago

> pylance ... decided to allow final for TypedDict in order to be able to provide type narrowing in Union[TypedDict] case.

I find myself in exactly this case. Imagine I'm hitting an API that returns a JSON response. If my request succeeded, the service responds with something like

{"transactionDetails": {"transactionId": "1234-56-7890"}}

but if my request failed the service responds with something like

{"error": {"code": 1234, "description": "Oops!"}}

I can annotate each of those possibilities as a TypedDict:

from typing import TypedDict

class TransactionDetails(TypedDict):
    transactionId: str

class TransactionDetailsResponse(TypedDict):
    transactionDetails: TransactionDetails

class Error(TypedDict):
    code: int
    description: str

class ErrorResponse(TypedDict):
    error: Error

Response = ErrorResponse | TransactionDetailsResponse

At this point, what I'd like mypy to understand is that something typed as Response either has a "transactionDetails" key (in which case its type should narrow to TransactionDetailsResponse) or it has an "error" key (in which case its type should narrow to ErrorResponse).

Unfortunately, I'm instead stuck with casts:

from typing import cast

def handle_response(response: Response) -> None:
    if "transactionDetails" in response:
        print(f"Transaction {cast(TransactionDetailsResponse, response)['transactionDetails']['transactionId']} submitted!")
    else:
        print(f"Got error {cast(ErrorResponse, response)['error']['code']}: {cast(ErrorResponse, response)['error']['description']}")

I really don't want that cast. I want to say that "transactionDetails" in response is a TypeGuard[TransactionDetailsResponse], and that seems to be exactly what pyright implemented. It's not onerous to me to mark these TypedDicts as @final, since they are required to have distinct keys in order for me to distinguish which type of response I got. That's a part of the service's API contract, and it doesn't make sense for there to be subtypes for that reason.

Hnasar commented 5 months ago

> the output will be incorrectly upcast to object since mypy thinks it may be dealing with a subclass.

I ran into this limitation and realized mypy's behavior (assuming object) is consistent with PEP 589 - TypedDict and the TypedDict Typing Spec.

> A TypedDict with all int values is not consistent with Mapping[str, int], since there may be additional non-int values not visible through the type, due to structural subtyping. These can be accessed using the values() and items() methods in Mapping, for example.
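
The quoted rule can be seen at runtime in a small sketch. LabelledPoint and non_float_values are hypothetical names introduced here for illustration, and a two-field Point is used for brevity.

```python
from typing import TypedDict


class Point(TypedDict):
    x: float
    y: float


class LabelledPoint(Point):
    # A structural subtype of Point that adds a non-float value.
    label: str


def non_float_values(p: Point) -> list[object]:
    # values() is typed as object because p may carry keys Point doesn't list.
    return [v for v in p.values() if not isinstance(v, (int, float))]


labelled: LabelledPoint = {"x": 1.0, "y": 2.0, "label": "origin"}
# Passing a LabelledPoint where a Point is expected is allowed.
extras = non_float_values(labelled)
```

Here extras ends up holding the label string, which is the value a ValuesView[float] annotation would have wrongly ruled out.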

Currently the above Typing Spec page, and the one for @final don't clarify the interaction between these two features.

erictraut commented 5 months ago

> Currently the above Typing Spec page, and the one for @final don't clarify the interaction between these two features.

The @final class decorator indicates that a class cannot be subclassed. This makes sense for classes that define nominal types. However, TypedDict is a structural type, similar to a Protocol. That means two TypedDict classes with different names but the same field definitions are equivalent types. Their names and hierarchies don't matter for determining type consistency. For that reason, @final has no impact on a TypedDict's type consistency rules, nor should it change the behavior of items or values.

What you're looking for is a new concept referred to as a "closed" TypedDict. This concept is introduced in draft PEP 728. It allows one to specify that a TypedDict cannot have any extra fields beyond the ones that are defined. (Alternatively, it allows for additional extra fields that are constrained to a particular type.)