t0yv0 opened 3 months ago
Some initial sizing data. Most of the size is already present in the upstream provider build:
```
du -sh terraform-provider-aws ~/code/terraform-provider-aws
752M    terraform-provider-aws
```
From a https://github.com/t0yv0/gobuildsize report on the upstream provider, the major contributing packages (sizes in bytes) are:
```
github.com/aws/aws-sdk-go-v2/service/ec2 137603662
github.com/hashicorp/terraform-provider-aws/internal/service/lexv2models 90502114
github.com/hashicorp/terraform-provider-aws/internal/service/batch 65336638
github.com/hashicorp/terraform-provider-aws/internal/service/ec2 60966940
github.com/hashicorp/terraform-provider-aws/internal/service/bedrockagent 59476792
github.com/aws/aws-sdk-go/service/sagemaker 56084368
github.com/aws/aws-sdk-go/service/quicksight 49658874
github.com/aws/aws-sdk-go-v2/service/iot 49391552
github.com/aws/aws-sdk-go-v2/service/glue 48909028
github.com/hashicorp/terraform-provider-aws/internal/service/securitylake 44860032
github.com/hashicorp/terraform-provider-aws/internal/service/cognitoidp 44200606
github.com/hashicorp/terraform-provider-aws/internal/service/securityhub 43308668
github.com/aws/aws-sdk-go-v2/service/rds 42986552
github.com/hashicorp/terraform-provider-aws/internal/service/verifiedpermissions 42005986
github.com/hashicorp/terraform-provider-aws/internal/service/appfabric 40250200
github.com/hashicorp/terraform-provider-aws/internal/service/rekognition 39744438
github.com/hashicorp/terraform-provider-aws/internal/service/ssmcontacts 39644536
github.com/hashicorp/terraform-provider-aws/internal/service/bedrock 38030686
github.com/hashicorp/terraform-provider-aws/internal/service/networkfirewall 37901482
github.com/hashicorp/terraform-provider-aws/internal/service/lakeformation 37305190
github.com/hashicorp/terraform-provider-aws/internal/service/medialive 37158322
github.com/hashicorp/terraform-provider-aws/internal/service/devopsguru 36337252
github.com/aws/aws-sdk-go-v2/service/chime 36253706
github.com/hashicorp/terraform-provider-aws/internal/service/cloudfront 35393416
github.com/hashicorp/terraform-provider-aws/internal/service/rds 34928184
github.com/aws/aws-sdk-go/service/connect 34686812
github.com/hashicorp/terraform-provider-aws/internal/service/elasticache 34474934
github.com/hashicorp/terraform-provider-aws/internal/service/s3 33639018
github.com/hashicorp/terraform-provider-aws/internal/service/bcmdataexports 33586502
github.com/aws/aws-sdk-go-v2/service/ssm 33467788
github.com/hashicorp/terraform-provider-aws/internal/service/timestreamwrite 33138974
github.com/aws/aws-sdk-go-v2/service/redshift 32826864
github.com/hashicorp/terraform-provider-aws/internal/service/appstream 32248636
github.com/hashicorp/terraform-provider-aws/internal/service/guardduty 32197636
github.com/hashicorp/terraform-provider-aws/internal/service/m2 31757684
github.com/aws/aws-sdk-go-v2/service/securityhub 31605124
github.com/hashicorp/terraform-provider-aws/internal/service/resourceexplorer2 31479248
github.com/hashicorp/terraform-provider-aws/internal/service/osis 31141590
github.com/hashicorp/terraform-provider-aws/internal/service/redshift 31092528
github.com/hashicorp/terraform-provider-aws/internal/service/s3control 30801928
github.com/hashicorp/terraform-provider-aws/internal/service/amp 30617456
github.com/aws/aws-sdk-go-v2/service/iam 30602038
```
@t0yv0 additional feedback from the customer about the impact of the plugin binary file size growth:
It is mainly two things:
- download times
- space on disk; had to delete 50GB of plugins from disk yesterday
This makes sense Ringo, thanks for that detail. With 50GB of plugins, I am wondering if something can be done at the plugin cache level, some form of scheduled eviction, as it appears multiple copies of provider(s) are involved there.
A bit of context there: this grew over the course of roughly half a year while maintaining 50+ Pulumi stacks, each with its own repo and its own dependencies, which are regularly updated in a semi-automated way.
I think there's a feature request in pulumi/pulumi that could be helpful in a situation like this: https://github.com/pulumi/pulumi/issues/7505
I will cross-link and add some ideas there.
We would love to reduce the binary size of pulumi-aws, but since it appears to be dominated by the terraform-provider-aws binary size, that is a very difficult undertaking that is unlikely to get prioritized in the short term. Hence we need to consider broadly what else we can do to alleviate the end-user problem here.
Exploring a few more options here:
The provider has now reached ~930MB with release v6.59.1. It keeps steadily increasing in size with each release.
I'm testing some optimizations like compiling with the `-s -w` ldflags. This shaves off ~300MB and produces a 636MB binary.
With the v6.60.0 of the aws provider we reduced the size of the provider binary by 32% (from 932MB to 637MB).
The size of the compressed archive that's downloaded by pulumi was reduced by 58% (from 327MB to 137MB).
Comparing the AWS provider with the upstream Terraform provider: 582MB (terraform 5.75.1) vs 637MB (pulumi 6.60.0).
The Pulumi provider is ~50MB bigger; roughly 40MB of that should be the schema. Looking into storing the schema in compressed form.
Storing the schema as a gzip shaves off another 25MB, resulting in a 612MB provider binary. This is the prototype branch for compressing the schema: https://github.com/pulumi/pulumi-aws/tree/flostadler/compress-schema
A benchmark test reveals that this would add ~50ms on provider startup (on an M3 Macbook): https://github.com/pulumi/pulumi-aws/blob/9e6cad3485f0629528e83e1b7f691de5dc1dbfed/provider/cmd/pulumi-resource-aws/main_test.go#L9-L27
```
goos: darwin
goarch: arm64
pkg: github.com/pulumi/pulumi-aws/provider/v6/cmd/pulumi-resource-aws
cpu: Apple M3 Pro
BenchmarkDecompressSchema-12    20          50446285 ns/op
BenchmarkSchema-12              1000000000  0.3025 ns/op
```
I'll see if I can check the impact on less powerful hardware like CI runners.
Consider looking at options to reduce provider on-disk size.
Per a customer comment: it grew from 5.16 (~400MB) to 6.49 (~800MB) unpacked on disk.
Benefits of a leaner on-disk provider:
Possible culprits here:
- embedded schema.json including more resources and examples for said resources, in more languages such as Java; could it be feasible to strip descriptions or at least examples from the schema distributed in the binary?
- more embedded provider metadata; is there any compression that can be applied?
- more Go dependencies statically linked in; is there anything that can be pruned?