julienp commented 6 months ago

Bug Report

For Pulumi, we are looking into generating types using TypedDict to model cloud APIs. For example, for Kubernetes we have a type representing a Deployment.

class DeploymentArgsDict(TypedDict):
  api_version: NotRequired[Input[str]]
  kind: NotRequired[Input[str]]
  metadata: NotRequired[Input['ObjectMetaArgsDict']]
  ...

Pulumi has a notion of inputs and outputs, and the Input type used in the above example looks like this:

from typing import Awaitable, Generic, TypeVar, Union

T = TypeVar("T")

class Output(Generic[T]):
    pass

Input = Union[T, Awaitable[T], Output[T]]

Output does a lot of things, but for the purposes of this repro all that matters is that it's a generic type.

The K8S types can nest pretty deeply, and I suspect a combination of having nested literals along with the Union via the Input type is causing slowness here.

Example:

d: DeploymentArgsDict = {
    "metadata": {
        "name": "nginx",
    },
    "spec": {
        "selector":{
            "match_labels": {}
        },
        "replicas": 1,
        "template": {
            "metadata": {
                "labels": {}
            },
            "spec": {
                "containers": [{
                    "name": "nginx",
                    "image": "nginx"
                }]
            }
        }
    }
}
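
For readers who want something to paste into a file and run mypy against, here is a reduced, self-contained sketch of the same shape. It is not the generated Pulumi code: the field lists are a small assumed subset, DeploymentSpecArgsDict is cut down to a single field, and total=False stands in for wrapping every field in NotRequired[...]. It will not reproduce the 40-second timing by itself; the linked repro has the full set of types.

```python
from typing import Awaitable, Dict, Generic, TypedDict, TypeVar, Union

T = TypeVar("T")


class Output(Generic[T]):
    """Stand-in for Pulumi's Output; only its genericity matters here."""


# Three-member union, as in the report.
Input = Union[T, Awaitable[T], Output[T]]


# total=False makes every key optional, standing in for NotRequired[...]
# on each field. The field lists below are a reduced, assumed subset.
class ObjectMetaArgsDict(TypedDict, total=False):
    name: Input[str]
    labels: Input[Dict[str, Input[str]]]


class DeploymentSpecArgsDict(TypedDict, total=False):
    replicas: Input[int]


class DeploymentArgsDict(TypedDict, total=False):
    api_version: Input[str]
    kind: Input[str]
    metadata: Input[ObjectMetaArgsDict]
    spec: Input[DeploymentSpecArgsDict]


# Nested literal: each level is checked against a union that contains
# another TypedDict, which is the pattern suspected of being slow.
d: DeploymentArgsDict = {
    "metadata": {"name": "nginx", "labels": {}},
    "spec": {"replicas": 1},
}
```

Every nested field is wrapped in Input, so each level of the literal has to be matched against a three-member union before the checker can descend further.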

If I drop Awaitable[T] from the union to reduce it to two members, typechecking completes in 2 seconds. With it present, it takes 40 seconds.

This is a simplified example, and the actual code has another union layered on top. In that case we run out of memory.

To Reproduce

I have created a repro here https://github.com/julienp/typeddict-performance

Expected Behavior

It takes a second or two to typecheck.

Actual Behavior

It takes ~40 seconds on my machine

Your Environment

Mypy version used: 1.10
Mypy command-line flags: none
Mypy configuration options from mypy.ini (and other config files): none
Python version used: 3.12.2

julienp commented 6 months ago

Ran some more tests with a larger set of types, and it looks like the issue might be memory related. I am seeing Python max out the memory on my system, causing heavy swapping, while the process sits at 100% CPU, probably GCing constantly.

justinvp commented 6 months ago

Any idea what the root cause could be, how we could work around it, or how we could help contribute a fix?

We'd like to improve Pulumi's Python SDKs by supporting TypedDict, but this performance issue means we'd have to work around it for Mypy users, likely by conditionally typing these as untyped dictionaries for Mypy, which is rather unfortunate.

if not MYPY:
    class DeploymentArgsDict(TypedDict):
        api_version: NotRequired[Input[str]]
        kind: NotRequired[Input[str]]
        metadata: NotRequired[Input['ObjectMetaArgsDict']]
        ...
else:
    DeploymentArgsDict: TypeAlias = Mapping[str, Any]

east825 commented 2 months ago

Because of the number of Input unions used in type annotations there and the huge number of TypedDicts involved, some of the resulting fully expanded TypedDicts are humongous. For instance, DeploymentArgsDict depends on the declarations of all the following TypedDicts, and the resulting complete type contains 27M+ (!) internal types. It has no self-references or recursively defined TypedDicts, though, as far as I can tell. PyCharm inference also suffers from this. I'm wondering how Pyright approaches such TypedDict trees.

DeploymentArgsDict
 ObjectMetaArgsDict
  ManagedFieldsEntryArgsDict
  OwnerReferenceArgsDict
 DeploymentSpecArgsDict
  LabelSelectorArgsDict
   LabelSelectorRequirementArgsDict
  PodTemplateSpecArgsDict
   ObjectMetaArgsDict
    ManagedFieldsEntryArgsDict
    OwnerReferenceArgsDict
   PodSpecArgsDict
    ContainerArgsDict
     EnvVarArgsDict
      EnvVarSourceArgsDict
       ConfigMapKeySelectorArgsDict
       ObjectFieldSelectorArgsDict
       ResourceFieldSelectorArgsDict
       SecretKeySelectorArgsDict
     EnvFromSourceArgsDict
      ConfigMapEnvSourceArgsDict
      SecretEnvSourceArgsDict
     LifecycleArgsDict
      LifecycleHandlerArgsDict
       ExecActionArgsDict
       HTTPGetActionArgsDict
        HTTPHeaderArgsDict
       SleepActionArgsDict
       TCPSocketActionArgsDict
      LifecycleHandlerArgsDict
       ExecActionArgsDict
       HTTPGetActionArgsDict
        HTTPHeaderArgsDict
       SleepActionArgsDict
       TCPSocketActionArgsDict
     ProbeArgsDict
      ExecActionArgsDict
      GRPCActionArgsDict
      HTTPGetActionArgsDict
       HTTPHeaderArgsDict
      TCPSocketActionArgsDict
     ContainerPortArgsDict
     ProbeArgsDict
      ExecActionArgsDict
      GRPCActionArgsDict
      HTTPGetActionArgsDict
       HTTPHeaderArgsDict
      TCPSocketActionArgsDict
     ContainerResizePolicyArgsDict
     ResourceRequirementsArgsDict
      ResourceClaimArgsDict
     SecurityContextArgsDict
      AppArmorProfileArgsDict
      CapabilitiesArgsDict
      SELinuxOptionsArgsDict
      SeccompProfileArgsDict
      WindowsSecurityContextOptionsArgsDict
     ProbeArgsDict
      ExecActionArgsDict
      GRPCActionArgsDict
      HTTPGetActionArgsDict
       HTTPHeaderArgsDict
      TCPSocketActionArgsDict
     VolumeDeviceArgsDict
     VolumeMountArgsDict
    AffinityArgsDict
     NodeAffinityArgsDict
      PreferredSchedulingTermArgsDict
       NodeSelectorTermArgsDict
        NodeSelectorRequirementArgsDict
        NodeSelectorRequirementArgsDict
      NodeSelectorArgsDict
       NodeSelectorTermArgsDict
        NodeSelectorRequirementArgsDict
        NodeSelectorRequirementArgsDict
     PodAffinityArgsDict
      WeightedPodAffinityTermArgsDict
       PodAffinityTermArgsDict
        LabelSelectorArgsDict
         LabelSelectorRequirementArgsDict
        LabelSelectorArgsDict
         LabelSelectorRequirementArgsDict
      PodAffinityTermArgsDict
       LabelSelectorArgsDict
        LabelSelectorRequirementArgsDict
       LabelSelectorArgsDict
        LabelSelectorRequirementArgsDict
     PodAntiAffinityArgsDict
      WeightedPodAffinityTermArgsDict
       PodAffinityTermArgsDict
        LabelSelectorArgsDict
         LabelSelectorRequirementArgsDict
        LabelSelectorArgsDict
         LabelSelectorRequirementArgsDict
      PodAffinityTermArgsDict
       LabelSelectorArgsDict
        LabelSelectorRequirementArgsDict
       LabelSelectorArgsDict
        LabelSelectorRequirementArgsDict
    PodDNSConfigArgsDict
     PodDNSConfigOptionArgsDict
    EphemeralContainerArgsDict
     EnvVarArgsDict
      EnvVarSourceArgsDict
       ConfigMapKeySelectorArgsDict
       ObjectFieldSelectorArgsDict
       ResourceFieldSelectorArgsDict
       SecretKeySelectorArgsDict
     EnvFromSourceArgsDict
      ConfigMapEnvSourceArgsDict
      SecretEnvSourceArgsDict
     LifecycleArgsDict
      LifecycleHandlerArgsDict
       ExecActionArgsDict
       HTTPGetActionArgsDict
        HTTPHeaderArgsDict
       SleepActionArgsDict
       TCPSocketActionArgsDict
      LifecycleHandlerArgsDict
       ExecActionArgsDict
       HTTPGetActionArgsDict
        HTTPHeaderArgsDict
       SleepActionArgsDict
       TCPSocketActionArgsDict
     ProbeArgsDict
      ExecActionArgsDict
      GRPCActionArgsDict
      HTTPGetActionArgsDict
       HTTPHeaderArgsDict
      TCPSocketActionArgsDict
     ContainerPortArgsDict
     ProbeArgsDict
      ExecActionArgsDict
      GRPCActionArgsDict
      HTTPGetActionArgsDict
       HTTPHeaderArgsDict
      TCPSocketActionArgsDict
     ContainerResizePolicyArgsDict
     ResourceRequirementsArgsDict
      ResourceClaimArgsDict
     SecurityContextArgsDict
      AppArmorProfileArgsDict
      CapabilitiesArgsDict
      SELinuxOptionsArgsDict
      SeccompProfileArgsDict
      WindowsSecurityContextOptionsArgsDict
     ProbeArgsDict
      ExecActionArgsDict
      GRPCActionArgsDict
      HTTPGetActionArgsDict
       HTTPHeaderArgsDict
      TCPSocketActionArgsDict
     VolumeDeviceArgsDict
     VolumeMountArgsDict
    HostAliasArgsDict
  DeploymentStrategyArgsDict
   RollingUpdateDeploymentArgsDict
 DeploymentStatusArgsDict
  DeploymentConditionArgsDict

erictraut commented 2 months ago

I'm wondering how Pyright approaches such TypedDict trees.

Does mypy internally expand all of these TypedDict definitions? If so, I'm curious why. Pyright internally builds one object for each class. There are only 438 of them in the code sample, which isn't that many. Each internal object refers to the other objects as needed. It doesn't do any expansion.

east825 commented 2 months ago

To be clear here, I didn't check how Mypy internally represents such types. In PyCharm, we represent TypedDicts as dict[str, UnionOfTypesOfAllFields] for some type checks, and constructing this union of all field types (recursively), while simultaneously expanding type aliases, leads to this kind of combinatorial explosion. But since there are memory problems, I guess the root cause might be somewhat similar.
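
To make the difference between the two representations concrete, here is a small counting sketch. It is not how mypy, PyCharm, or Pyright are implemented; it only compares the size of a fully expanded tree with the number of distinct definitions when nodes are shared by reference. The DEPS graph is a hand-copied slice of the PodAffinityArgsDict subtree listed earlier, and both function names are made up for this example.

```python
from typing import Dict, List, Set

# Slice of the dependency tree above: each TypedDict maps to the
# TypedDicts referenced by its fields (duplicates kept on purpose).
DEPS: Dict[str, List[str]] = {
    "PodAffinityArgsDict": [
        "WeightedPodAffinityTermArgsDict",
        "PodAffinityTermArgsDict",
    ],
    "WeightedPodAffinityTermArgsDict": ["PodAffinityTermArgsDict"],
    "PodAffinityTermArgsDict": [
        "LabelSelectorArgsDict",
        "LabelSelectorArgsDict",
    ],
    "LabelSelectorArgsDict": ["LabelSelectorRequirementArgsDict"],
    "LabelSelectorRequirementArgsDict": [],
}


def expanded_size(name: str) -> int:
    """Node count when every reference is expanded into its own copy."""
    return 1 + sum(expanded_size(child) for child in DEPS[name])


def shared_size(root: str) -> int:
    """Node count when each definition is stored once and referenced."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(DEPS[name])
    return len(seen)


print(expanded_size("PodAffinityArgsDict"))  # 12
print(shared_size("PodAffinityArgsDict"))    # 5
```

Even in this five-definition slice the expanded form is more than twice as large, and the gap compounds with every additional level of nesting and every union member that repeats a reference.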

JukkaL commented 2 months ago

There seem to be some easy improvements we can make to speed up the handling of nested TypedDicts. I don't think there's any deep reason why they'd have to be this slow. I'll look into this -- if it's easy enough, the next mypy release (to be out in a week or two) could include some optimizations.

JukkaL commented 2 months ago

#17842 fixes some bottlenecks.

python / mypy

Slow typechecking on nested TypedDict with union members #17231