python / mypy

Optional static typing for Python
https://www.mypy-lang.org/

Slow typechecking on nested TypedDict with union members #17231

Open julienp opened 4 months ago

julienp commented 4 months ago

Bug Report

For Pulumi, we are looking into generating types using TypedDict to model cloud APIs. For example, for Kubernetes we have a type representing a Deployment:

class DeploymentArgsDict(TypedDict):
  api_version: NotRequired[Input[str]]
  kind: NotRequired[Input[str]]
  metadata: NotRequired[Input['ObjectMetaArgsDict']]
  ...

Pulumi has a notion of inputs and outputs, and the Input type used in the above example looks like this:

from typing import Awaitable, Generic, TypeVar, Union

T = TypeVar("T")

class Output(Generic[T]):
    pass

Input = Union[T, Awaitable[T], Output[T]]

Output does a lot of things, but for the purposes of this repro all that matters is that it's a generic type.
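To make the three union members concrete, here is a minimal runtime sketch of which values an Input[str] admits. The Output class is only a stand-in that stores a value (Pulumi's real class does much more), and kind_of is a hypothetical helper introduced purely for illustration.

```python
import inspect
from typing import Awaitable, Generic, TypeVar, Union

T = TypeVar("T")


class Output(Generic[T]):
    """Stand-in for Pulumi's Output; it only wraps a value here."""

    def __init__(self, value: T) -> None:
        self.value = value


Input = Union[T, Awaitable[T], Output[T]]


def kind_of(value: Input[str]) -> str:
    """Report which member of the Input union a value belongs to."""
    if isinstance(value, Output):
        return "output"
    if inspect.isawaitable(value):
        return "awaitable"
    return "plain"
```

Every field typed as Input[...] therefore gives the type checker three alternatives to consider for each nested value.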

The K8S types can nest pretty deeply, and I suspect a combination of having nested literals along with the Union via the Input type is causing slowness here.

Example:

d: DeploymentArgsDict = {
    "metadata": {
        "name": "nginx",
    },
    "spec": {
        "selector": {
            "match_labels": {}
        },
        "replicas": 1,
        "template": {
            "metadata": {
                "labels": {}
            },
            "spec": {
                "containers": [{
                    "name": "nginx",
                    "image": "nginx"
                }]
            }
        }
    }
}

If I drop Awaitable[T] from the union to reduce it to two members, typechecking completes in 2 seconds. With it present, it takes 40 seconds.
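For readers who want something self-contained to point mypy at, the following is a condensed sketch of the shape described above. Only DeploymentArgsDict, ObjectMetaArgsDict, Input, and Output come from the report. The other class names and all field types are guesses made for illustration, and total=False stands in for per-field NotRequired. Whether this cut-down version shows the same slowdown is untested; the linked repro remains the authoritative one.

```python
from typing import Awaitable, Dict, Generic, List, TypedDict, TypeVar, Union

T = TypeVar("T")


class Output(Generic[T]):
    """Stand-in for Pulumi's Output; only its generic-ness matters here."""


# Three-member union from the report. Removing Awaitable[T] gives the
# two-member variant that the report says checks in about 2 seconds.
Input = Union[T, Awaitable[T], Output[T]]


# Hypothetical nested types (assumed names and fields, not Pulumi's real ones).
class ObjectMetaArgsDict(TypedDict, total=False):
    name: Input[str]
    labels: Input[Dict[str, Input[str]]]


class LabelSelectorArgsDict(TypedDict, total=False):
    match_labels: Input[Dict[str, Input[str]]]


class ContainerArgsDict(TypedDict, total=False):
    name: Input[str]
    image: Input[str]


class PodSpecArgsDict(TypedDict, total=False):
    containers: Input[List[Input[ContainerArgsDict]]]


class PodTemplateSpecArgsDict(TypedDict, total=False):
    metadata: Input[ObjectMetaArgsDict]
    spec: Input[PodSpecArgsDict]


class DeploymentSpecArgsDict(TypedDict, total=False):
    selector: Input[LabelSelectorArgsDict]
    replicas: Input[int]
    template: Input[PodTemplateSpecArgsDict]


class DeploymentArgsDict(TypedDict, total=False):
    api_version: Input[str]
    kind: Input[str]
    metadata: Input[ObjectMetaArgsDict]
    spec: Input[DeploymentSpecArgsDict]


# The nested literal from the report; every level is checked against a union.
d: DeploymentArgsDict = {
    "metadata": {"name": "nginx"},
    "spec": {
        "selector": {"match_labels": {}},
        "replicas": 1,
        "template": {
            "metadata": {"labels": {}},
            "spec": {"containers": [{"name": "nginx", "image": "nginx"}]},
        },
    },
}
```

Switching the Input alias between the two-member and three-member forms and rerunning mypy on the file is one way to compare the two cases side by side.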

This is a simplified example, and the actual code has another union layered on top. In that case we run out of memory.

To Reproduce

I have created a repro here https://github.com/julienp/typeddict-performance

Expected Behavior

It takes a second or two to typecheck.

Actual Behavior

It takes ~40 seconds on my machine.

Your Environment

julienp commented 4 months ago

I ran some more tests with a larger set of types, and it looks like the issue might be memory related. I am seeing Python max out the memory on my system, causing heavy swapping, while the process sits at 100% CPU, probably GCing constantly.
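One way to put numbers on both the wall time and the memory growth is a small harness like the sketch below. It assumes a Unix-like system, since the standard resource module is not available on Windows, and run_and_measure is a hypothetical helper name, not part of mypy.

```python
import resource
import subprocess
import sys
import time
from typing import List, Tuple


def run_and_measure(cmd: List[str]) -> Tuple[float, int, int]:
    """Run cmd to completion; return (wall seconds, peak child RSS, exit code)."""
    start = time.perf_counter()
    completed = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    elapsed = time.perf_counter() - start
    # ru_maxrss is the largest resident set size among all child processes
    # waited for so far, so measure one command per interpreter run.
    # Units differ by platform: kilobytes on Linux, bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak, completed.returncode
```

For example, run_and_measure([sys.executable, "-m", "mypy", "repro.py"]) would time a mypy run, where repro.py is a placeholder for the file under test.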

justinvp commented 3 months ago

Any idea what the root cause could be, how we could work around it, or even how we could help contribute a fix?

We'd like to improve Pulumi's Python SDKs by supporting TypedDict, but this performance issue means we'd have to work around it for mypy users, likely by conditionally typing these as untyped dictionaries for mypy, which is rather unfortunate.

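# mypy treats the name MYPY as always true while type checking;
# at runtime it is simply False.
MYPY = False

if not MYPY:
    # mypy skips this branch; at runtime the precise TypedDict is defined.
    class DeploymentArgsDict(TypedDict):
        api_version: NotRequired[Input[str]]
        kind: NotRequired[Input[str]]
        metadata: NotRequired[Input['ObjectMetaArgsDict']]
        ...
else:
    # mypy checks only this branch and sees a loosely typed mapping.
    DeploymentArgsDict: TypeAlias = Mapping[str, Any]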