Open julienp opened 6 months ago
Ran some more tests with a larger set of types, and it looks like the issue might be memory related. I am seeing Python max out memory on my system, causing heavy swapping, while the process sits at 100% CPU, probably GCing constantly.
Any idea what the root cause could be, how we could work around it, or how we could help contribute a fix?
We'd like to improve Pulumi's Python SDKs by supporting TypedDict, but this performance issue means we'd have to work around it for Mypy users, likely by conditionally typing these as untyped dictionaries for Mypy, which is rather unfortunate.
if not MYPY:
    class DeploymentArgsDict(TypedDict):
        api_version: NotRequired[Input[str]]
        kind: NotRequired[Input[str]]
        metadata: NotRequired[Input['ObjectMetaArgsDict']]
        ...
else:
    DeploymentArgsDict: TypeAlias = Mapping[str, Any]
Because of the number of Input unions used in type annotations there and the huge number of TypedDicts involved, some of the resulting fully expanded TypedDicts are humongous. For instance, DeploymentArgsDict depends on the declarations of all the following TypedDicts, and the resulting complete type contains 27M+ (!) internal types. It has no self-references or recursively defined TypedDicts, though, as far as I can tell. PyCharm inference also suffers from this. I'm wondering how Pyright approaches such TypedDict trees.
DeploymentArgsDict
  ObjectMetaArgsDict
    ManagedFieldsEntryArgsDict
    OwnerReferenceArgsDict
  DeploymentSpecArgsDict
    LabelSelectorArgsDict
      LabelSelectorRequirementArgsDict
    PodTemplateSpecArgsDict
      ObjectMetaArgsDict
        ManagedFieldsEntryArgsDict
        OwnerReferenceArgsDict
      PodSpecArgsDict
        ContainerArgsDict
          EnvVarArgsDict
            EnvVarSourceArgsDict
              ConfigMapKeySelectorArgsDict
              ObjectFieldSelectorArgsDict
              ResourceFieldSelectorArgsDict
              SecretKeySelectorArgsDict
          EnvFromSourceArgsDict
            ConfigMapEnvSourceArgsDict
            SecretEnvSourceArgsDict
          LifecycleArgsDict
            LifecycleHandlerArgsDict
              ExecActionArgsDict
              HTTPGetActionArgsDict
                HTTPHeaderArgsDict
              SleepActionArgsDict
              TCPSocketActionArgsDict
            LifecycleHandlerArgsDict
              ExecActionArgsDict
              HTTPGetActionArgsDict
                HTTPHeaderArgsDict
              SleepActionArgsDict
              TCPSocketActionArgsDict
          ProbeArgsDict
            ExecActionArgsDict
            GRPCActionArgsDict
            HTTPGetActionArgsDict
              HTTPHeaderArgsDict
            TCPSocketActionArgsDict
          ContainerPortArgsDict
          ProbeArgsDict
            ExecActionArgsDict
            GRPCActionArgsDict
            HTTPGetActionArgsDict
              HTTPHeaderArgsDict
            TCPSocketActionArgsDict
          ContainerResizePolicyArgsDict
          ResourceRequirementsArgsDict
            ResourceClaimArgsDict
          SecurityContextArgsDict
            AppArmorProfileArgsDict
            CapabilitiesArgsDict
            SELinuxOptionsArgsDict
            SeccompProfileArgsDict
            WindowsSecurityContextOptionsArgsDict
          ProbeArgsDict
            ExecActionArgsDict
            GRPCActionArgsDict
            HTTPGetActionArgsDict
              HTTPHeaderArgsDict
            TCPSocketActionArgsDict
          VolumeDeviceArgsDict
          VolumeMountArgsDict
        AffinityArgsDict
          NodeAffinityArgsDict
            PreferredSchedulingTermArgsDict
              NodeSelectorTermArgsDict
                NodeSelectorRequirementArgsDict
                NodeSelectorRequirementArgsDict
            NodeSelectorArgsDict
              NodeSelectorTermArgsDict
                NodeSelectorRequirementArgsDict
                NodeSelectorRequirementArgsDict
          PodAffinityArgsDict
            WeightedPodAffinityTermArgsDict
              PodAffinityTermArgsDict
                LabelSelectorArgsDict
                  LabelSelectorRequirementArgsDict
                LabelSelectorArgsDict
                  LabelSelectorRequirementArgsDict
            PodAffinityTermArgsDict
              LabelSelectorArgsDict
                LabelSelectorRequirementArgsDict
              LabelSelectorArgsDict
                LabelSelectorRequirementArgsDict
          PodAntiAffinityArgsDict
            WeightedPodAffinityTermArgsDict
              PodAffinityTermArgsDict
                LabelSelectorArgsDict
                  LabelSelectorRequirementArgsDict
                LabelSelectorArgsDict
                  LabelSelectorRequirementArgsDict
            PodAffinityTermArgsDict
              LabelSelectorArgsDict
                LabelSelectorRequirementArgsDict
              LabelSelectorArgsDict
                LabelSelectorRequirementArgsDict
        PodDNSConfigArgsDict
          PodDNSConfigOptionArgsDict
        EphemeralContainerArgsDict
          EnvVarArgsDict
            EnvVarSourceArgsDict
              ConfigMapKeySelectorArgsDict
              ObjectFieldSelectorArgsDict
              ResourceFieldSelectorArgsDict
              SecretKeySelectorArgsDict
          EnvFromSourceArgsDict
            ConfigMapEnvSourceArgsDict
            SecretEnvSourceArgsDict
          LifecycleArgsDict
            LifecycleHandlerArgsDict
              ExecActionArgsDict
              HTTPGetActionArgsDict
                HTTPHeaderArgsDict
              SleepActionArgsDict
              TCPSocketActionArgsDict
            LifecycleHandlerArgsDict
              ExecActionArgsDict
              HTTPGetActionArgsDict
                HTTPHeaderArgsDict
              SleepActionArgsDict
              TCPSocketActionArgsDict
          ProbeArgsDict
            ExecActionArgsDict
            GRPCActionArgsDict
            HTTPGetActionArgsDict
              HTTPHeaderArgsDict
            TCPSocketActionArgsDict
          ContainerPortArgsDict
          ProbeArgsDict
            ExecActionArgsDict
            GRPCActionArgsDict
            HTTPGetActionArgsDict
              HTTPHeaderArgsDict
            TCPSocketActionArgsDict
          ContainerResizePolicyArgsDict
          ResourceRequirementsArgsDict
            ResourceClaimArgsDict
          SecurityContextArgsDict
            AppArmorProfileArgsDict
            CapabilitiesArgsDict
            SELinuxOptionsArgsDict
            SeccompProfileArgsDict
            WindowsSecurityContextOptionsArgsDict
          ProbeArgsDict
            ExecActionArgsDict
            GRPCActionArgsDict
            HTTPGetActionArgsDict
              HTTPHeaderArgsDict
            TCPSocketActionArgsDict
          VolumeDeviceArgsDict
          VolumeMountArgsDict
        HostAliasArgsDict
    DeploymentStrategyArgsDict
      RollingUpdateDeploymentArgsDict
  DeploymentStatusArgsDict
    DeploymentConditionArgsDict
I'm wondering how Pyright approaches such TypedDict trees.
Does mypy internally expand all of these TypedDict definitions? If so, I'm curious why. Pyright internally builds one object for each class. There are only 438 of them in the code sample, which isn't that many. Each internal object refers to the other objects as needed. It doesn't do any expansion.
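To make the distinction concrete, here is a small sketch of a "one object per class" representation, where fields refer to other classes by name instead of embedding copies. The CLASSES registry and both helper functions are hypothetical illustrations built from a few names in the listing; they are not Pyright's or mypy's actual data structures.

```python
from typing import Dict, List, Set

# Hypothetical registry: one entry per TypedDict class. Each entry lists the
# classes its fields refer to, by name, rather than embedding their bodies.
CLASSES: Dict[str, List[str]] = {
    "DeploymentArgsDict": ["ObjectMetaArgsDict", "DeploymentSpecArgsDict"],
    "DeploymentSpecArgsDict": ["LabelSelectorArgsDict", "PodTemplateSpecArgsDict"],
    "PodTemplateSpecArgsDict": ["ObjectMetaArgsDict", "PodSpecArgsDict"],
    "PodSpecArgsDict": [],
    "ObjectMetaArgsDict": ["OwnerReferenceArgsDict"],
    "LabelSelectorArgsDict": [],
    "OwnerReferenceArgsDict": [],
}


def reachable(root: str) -> Set[str]:
    """Collect every class reachable from root, visiting each class once."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(CLASSES[name])
    return seen


def expanded_size(root: str) -> int:
    """Count nodes if every reference were replaced by a full copy."""
    return 1 + sum(expanded_size(child) for child in CLASSES[root])


print(len(reachable("DeploymentArgsDict")), expanded_size("DeploymentArgsDict"))
```

Even in this tiny graph the expanded count exceeds the number of distinct classes, because ObjectMetaArgsDict is shared by two parents. The gap widens quickly as sharing and depth increase.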
To be clear here, I didn't check how Mypy internally represents such types. In PyCharm, we represent TypedDicts as dict[str, UnionOfTypesOfAllFields] for some type checks, and constructing this union of all field types (recursively), while simultaneously expanding type aliases, leads to such a combinatorial explosion. But since there are memory problems, I guess the root cause might be somewhat similar.
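A rough feel for how recursive expansion blows up can be had from a toy model. The function below is purely illustrative: it assumes every TypedDict has the same number of fields, every field is a union of the same width, and every union member carries its own copy of the nested type. It is not how PyCharm or mypy actually compute anything.

```python
def flattened_union_size(depth: int, fields: int, union_width: int) -> int:
    """Node count of a fully expanded type in the toy model.

    Each level contributes one node for the TypedDict itself, plus
    fields * union_width copies of the level below it.
    """
    if depth == 0:
        return 1
    below = flattened_union_size(depth - 1, fields, union_width)
    return 1 + fields * union_width * below


# Compare a two-member union against a three-member union at several depths.
for depth in (2, 4, 6):
    two = flattened_union_size(depth, fields=3, union_width=2)
    three = flattened_union_size(depth, fields=3, union_width=3)
    print(depth, two, three)
```

In this model the count grows as (fields × union_width) raised to the depth, so adding one member to the union multiplies the total by a factor that itself grows with nesting depth.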
There seem to be some easy improvements we can make to speed up the handling of nested TypedDicts. I don't think there's any deep reason why they'd have to be this slow. I'll look into this -- if it's easy enough, the next mypy release (to be out in a week or two) could include some optimizations.
Bug Report
For Pulumi we are looking into generating types using TypedDict to model cloud APIs. For example, for Kubernetes we have something representing a Deployment.
Pulumi has a notion of inputs and outputs, and the Input type used in the above example looks like this:
Output does a lot of things, but for the purposes of this repro all that matters is that it's a generic type.
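As a hedged reconstruction of the alias described here: the report mentions a union that includes Awaitable[T] and shrinks to two members when that is dropped, so the sketch below assumes the three members are T, Awaitable[T], and Output[T]. The Output class is a bare stand-in, not Pulumi's real implementation.

```python
from typing import Awaitable, Generic, TypeVar, Union, get_args

T = TypeVar("T")


class Output(Generic[T]):
    """Stand-in for Pulumi's Output; only its genericity matters here."""


# Assumed shape of the alias: a plain value, an awaitable, or an Output.
Input = Union[T, Awaitable[T], Output[T]]

# Subscripting the alias substitutes T in every member of the union.
members = get_args(Input[str])
print(members)
```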
The K8S types can nest pretty deeply, and I suspect a combination of having nested literals along with the Union via the Input type is causing slowness here.
Example:
If I drop Awaitable[T] from the union to reduce it to two members, typechecking completes in 2 seconds. With it present, it takes 40 seconds.
This is a simplified example, and the actual code has another union layered on top. In that case we run out of memory.
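The shape of code being described might look roughly like the sketch below: nested TypedDicts whose fields are all wrapped in Input, assigned from a nested dict literal. This is not the code from the linked repro. The class and field names are simplified, hypothetical stand-ins, the nesting is far shallower than the real Kubernetes types, and total=False is used in place of NotRequired for brevity.

```python
from typing import Awaitable, Generic, List, TypedDict, TypeVar, Union

T = TypeVar("T")


class Output(Generic[T]):
    """Hypothetical stand-in for Pulumi's Output."""


# Three-member union; the faster variant described above would be
# Union[T, Output[T]].
Input = Union[T, Awaitable[T], Output[T]]


class ContainerArgsDict(TypedDict, total=False):
    name: Input[str]
    image: Input[str]


class PodSpecArgsDict(TypedDict, total=False):
    containers: Input[List[Input[ContainerArgsDict]]]


class DeploymentSpecArgsDict(TypedDict, total=False):
    replicas: Input[int]
    template: Input[PodSpecArgsDict]


class DeploymentArgsDict(TypedDict, total=False):
    kind: Input[str]
    spec: Input[DeploymentSpecArgsDict]


# A nested literal: the checker has to match every level of this dict
# against each member of the corresponding union.
deployment: DeploymentArgsDict = {
    "kind": "Deployment",
    "spec": {
        "replicas": 2,
        "template": {"containers": [{"name": "web", "image": "nginx"}]},
    },
}
```

Switching the Input alias between the two-member and three-member forms is the only change needed to compare checker timings on a file like this.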
To Reproduce
I have created a repro here https://github.com/julienp/typeddict-performance
Expected Behavior
It takes a second or two to typecheck.
Actual Behavior
It takes ~40 seconds on my machine.
Your Environment
mypy.ini (and other config files): none