hashicorp / terraform-cdk

Define infrastructure resources using programming constructs and provision them using HashiCorp Terraform
https://www.terraform.io/cdktf
Mozilla Public License 2.0

Providers of Large Size: Import is very slow in Python #3753

Open brent-at-aam opened 3 weeks ago

brent-at-aam commented 3 weeks ago

Expected Behavior

When using Python, the time taken for a synth is only correlated to the volume of resources within a stack.

Actual Behavior

When using Python, running any operations with larger providers takes a long time due to lengthy module import load times. You could have a single resource and it would still take ~30s to even get started with the synth.

Steps to Reproduce

  1. Use Python
  2. Add the cdktf-cdktf-provider-aws library to your project
  3. Make a small stack that uses any resource from the aws provider module
  4. Run a synth

Versions

Running with the latest versions

Providers

No response

Gist

No response

Possible Solutions

This is really just a general issue for all providers, but only becomes a big problem for the large ones like AWS. Looking through the issues, this is related to #2792, which was ostensibly fixed by #3030.

Perhaps this is a regression caused by newer behavior in upstream packages, but importing AWS is back to taking around 30 seconds and has been for quite a while. I've switched a few projects from Python to TypeScript to get away from it, but it would be nice not to have to do that.

The bulk of the time is spent loading the submodules (thanks @giner), which is why I wonder whether this is a regression of the changes made in #3030.
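One way to check where import time goes is CPython's `-X importtime` option, which prints per-module timings to stderr. The following is a minimal sketch, not something taken from this issue: the helper name `import_times` is hypothetical, and `json` stands in for the provider package so the snippet runs anywhere. Substituting the provider's module name should show how much of the total comes from its submodules.

```python
import subprocess
import sys


def import_times(module: str) -> list[tuple[int, str]]:
    """Return (cumulative_microseconds, imported_name) pairs, slowest first.

    Hypothetical helper: runs a fresh interpreter with -X importtime and
    parses the "import time: self | cumulative | name" lines from stderr.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        try:
            cumulative = int(parts[1])
        except ValueError:
            continue  # skip the column header row
        rows.append((cumulative, parts[2].strip()))
    return sorted(rows, reverse=True)


# "json" is only a stand-in; replace it with the package under investigation.
timings = import_times("json")
for cumulative_us, name in timings[:5]:
    print(f"{cumulative_us:>10} us  {name}")
```

The cumulative column includes time spent in nested imports, so a root package that pulls in every submodule will show up at the top with most of the total attributed to it.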

Another source of slowness is the large gzipped assembly in _jsii.

The root of the module loads it:

/__init__.py

from ._jsii import *

And worth noting that resource modules also load it:

/foo/__init__.py

from .._jsii import *

I know much of this is really just behavior of other libraries, and thus this might not be something you can control here. I also realize this is something you have probably already considered, but there is no discussion about it in issues I could find. Is it possible to instruct the package generation to build separate assemblies for each of the submodules of a provider package?

You would then remove "from ._jsii import *" from the root, and each resource module would import only its own jsii assembly.

Workarounds

None that are feasible

Anything Else?

No response

DanielMSchmidt commented 2 weeks ago

I think the changes we made in 0.18 might help here already: https://developer.hashicorp.com/terraform/cdktf/release/upgrade-guide-v0-18#python-performance-improvements-disable-root-level-provider-imports

I think any other improvements would need to be made on the JSII side, so I would suggest checking if there is a similar issue here: https://github.com/aws/jsii

brent-at-aam commented 2 weeks ago

@DanielMSchmidt Yes I referenced those changes above actually. It seems to have zero effect since the jsii assembly is imported at the root level, which is what slows down the runtime.

My question here really is if the package you are building could be structured differently so that it doesn't generate one large JSII assembly but many.

giner commented 2 weeks ago

Provider libraries always import all submodules from the root __init__.py, which causes every submodule to be loaded every time. This in turn causes a very slow start time. Here is an example (we import cloudwatch_log_group and check whether alb_target_group gets loaded as well):

time python3 -c "import cdktf_cdktf_provider_aws.cloudwatch_log_group; print(type(cdktf_cdktf_provider_aws.alb_target_group))"
<class 'module'>

real    0m24.241s
user    0m24.020s
sys     0m2.288s

This looks as if it was a mistake. Fixing it will significantly improve experience with CDKTF for Python users.
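The effect can be reproduced without installing the provider. The sketch below builds a throwaway package (the name `demo_eager_provider` and its submodules are made up for illustration) whose root __init__.py imports every submodule, then imports a single submodule and lists what ended up in `sys.modules`. It only mimics the pattern described above and is not the actual generated provider code.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path, name: str, submodules: list[str]) -> None:
    """Create a package whose root __init__.py eagerly imports every submodule."""
    pkg = root / name
    pkg.mkdir()
    # The root __init__.py pulls in all submodules, as in the pattern above.
    (pkg / "__init__.py").write_text(
        "".join(f"from . import {sub}\n" for sub in submodules)
    )
    for sub in submodules:
        (pkg / f"{sub}.py").write_text(f"NAME = {sub!r}\n")


def loaded_submodules(name: str) -> list[str]:
    """Return the submodules of `name` that are currently in sys.modules."""
    prefix = name + "."
    return sorted(m[len(prefix):] for m in sys.modules if m.startswith(prefix))


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(
        Path(tmp),
        "demo_eager_provider",
        ["alb_target_group", "cloudwatch_log_group", "s3_bucket"],
    )
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Only one submodule is requested...
        importlib.import_module("demo_eager_provider.cloudwatch_log_group")
        # ...but the parent package is imported first, which loads all of them.
        eager_loaded = loaded_submodules("demo_eager_provider")
    finally:
        sys.path.remove(tmp)

print(eager_loaded)
```

Importing a submodule always executes the parent package's __init__.py first, so with three tiny files the cost is invisible, but the same mechanism applies when the root lists every resource of a large provider.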

giner commented 2 weeks ago

Here is a similar issue reported on jsii https://github.com/aws/jsii/issues/3389

brent-at-aam commented 1 week ago

Yeah, I think we could side-step this entirely if the provider package for Python was built similarly to something like boto3-stubs, where each subject area is an extra package. I would love to have an interface like:

pip install cdktf-cdktf-provider-aws[s3,iam,lambda]

So this would require that instead of one giant AWS provider package, we would have multiple python packages.

I know these providers are all generated so special tweaks per language like this might be hard to do, but it would seem like it's possible to work within the limitations of jsii, while still delivering a better experience.

giner commented 1 week ago

The problem is not that this is a single package. The issue is that __init__.py imports all submodules for (likely) no good reason.
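For illustration, one general Python technique for avoiding eager submodule imports is a module-level `__getattr__` (PEP 562) in the root __init__.py. The sketch below is an assumption about what a lazier layout could look like, not what jsii or CDKTF generates today; the package name `demo_lazy_provider` is hypothetical, and the sketch does not address whether the jsii runtime depends on the eager imports.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical root __init__.py: submodules are imported on first access only.
LAZY_INIT = '''\
import importlib

_SUBMODULES = {"alb_target_group", "cloudwatch_log_group"}


def __getattr__(name):
    # PEP 562: called only when normal attribute lookup on the module fails.
    if name in _SUBMODULES:
        return importlib.import_module(f"{__name__}.{name}")
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_lazy_provider"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(LAZY_INIT)
    for sub in ("alb_target_group", "cloudwatch_log_group"):
        (pkg_dir / f"{sub}.py").write_text(f"NAME = {sub!r}\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Importing one submodule no longer drags in its siblings.
        importlib.import_module("demo_lazy_provider.cloudwatch_log_group")
        pkg = sys.modules["demo_lazy_provider"]
        loaded_before = "demo_lazy_provider.alb_target_group" in sys.modules
        # Attribute access on the package triggers the deferred import.
        lazy_name = pkg.alb_target_group.NAME
        loaded_after = "demo_lazy_provider.alb_target_group" in sys.modules
    finally:
        sys.path.remove(tmp)

print(loaded_before, lazy_name, loaded_after)
```

With this layout, `import pkg.some_resource` pays only for that resource, while `pkg.other_resource` still works through attribute access, so existing call sites would not necessarily need to change.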

brent-at-aam commented 1 week ago

For sure, the bulk of the time is the module imports. The jsii assembly load has a minimal impact by comparison, but it still slows things down.

I updated the issue description to point to the actual source of the slowdown.