pulumi / pulumi-aws

An Amazon Web Services (AWS) Pulumi resource package, providing multi-language access to AWS
Apache License 2.0

Reduce provider binary on-disk size #4383

Open t0yv0 opened 3 months ago

t0yv0 commented 3 months ago

Consider looking at options to reduce provider on-disk size.

Per a customer comment, it grew from ~400MB (5.16) to ~800MB (6.49) unpacked on disk.

Benefits of a leaner on-disk provider:

Possible culprits here:

t0yv0 commented 3 months ago

Some information. Most of the size is present in the upstream provider build:

du -sh terraform-provider-aws
752M    terraform-provider-aws

t0yv0 commented 3 months ago

From the https://github.com/t0yv0/gobuildsize report on the upstream provider, the major contributing packages are:

github.com/aws/aws-sdk-go-v2/service/ec2 137603662
github.com/hashicorp/terraform-provider-aws/internal/service/lexv2models 90502114
github.com/hashicorp/terraform-provider-aws/internal/service/batch 65336638
github.com/hashicorp/terraform-provider-aws/internal/service/ec2 60966940
github.com/hashicorp/terraform-provider-aws/internal/service/bedrockagent 59476792
github.com/aws/aws-sdk-go/service/sagemaker 56084368
github.com/aws/aws-sdk-go/service/quicksight 49658874
github.com/aws/aws-sdk-go-v2/service/iot 49391552
github.com/aws/aws-sdk-go-v2/service/glue 48909028
github.com/hashicorp/terraform-provider-aws/internal/service/securitylake 44860032
github.com/hashicorp/terraform-provider-aws/internal/service/cognitoidp 44200606
github.com/hashicorp/terraform-provider-aws/internal/service/securityhub 43308668
github.com/aws/aws-sdk-go-v2/service/rds 42986552
github.com/hashicorp/terraform-provider-aws/internal/service/verifiedpermissions 42005986
github.com/hashicorp/terraform-provider-aws/internal/service/appfabric 40250200
github.com/hashicorp/terraform-provider-aws/internal/service/rekognition 39744438
github.com/hashicorp/terraform-provider-aws/internal/service/ssmcontacts 39644536
github.com/hashicorp/terraform-provider-aws/internal/service/bedrock 38030686
github.com/hashicorp/terraform-provider-aws/internal/service/networkfirewall 37901482
github.com/hashicorp/terraform-provider-aws/internal/service/lakeformation 37305190
github.com/hashicorp/terraform-provider-aws/internal/service/medialive 37158322
github.com/hashicorp/terraform-provider-aws/internal/service/devopsguru 36337252
github.com/aws/aws-sdk-go-v2/service/chime 36253706
github.com/hashicorp/terraform-provider-aws/internal/service/cloudfront 35393416
github.com/hashicorp/terraform-provider-aws/internal/service/rds 34928184
github.com/aws/aws-sdk-go/service/connect 34686812
github.com/hashicorp/terraform-provider-aws/internal/service/elasticache 34474934
github.com/hashicorp/terraform-provider-aws/internal/service/s3 33639018
github.com/hashicorp/terraform-provider-aws/internal/service/bcmdataexports 33586502
github.com/aws/aws-sdk-go-v2/service/ssm 33467788
github.com/hashicorp/terraform-provider-aws/internal/service/timestreamwrite 33138974
github.com/aws/aws-sdk-go-v2/service/redshift 32826864
github.com/hashicorp/terraform-provider-aws/internal/service/appstream 32248636
github.com/hashicorp/terraform-provider-aws/internal/service/guardduty 32197636
github.com/hashicorp/terraform-provider-aws/internal/service/m2 31757684
github.com/aws/aws-sdk-go-v2/service/securityhub 31605124
github.com/hashicorp/terraform-provider-aws/internal/service/resourceexplorer2 31479248
github.com/hashicorp/terraform-provider-aws/internal/service/osis 31141590
github.com/hashicorp/terraform-provider-aws/internal/service/redshift 31092528
github.com/hashicorp/terraform-provider-aws/internal/service/s3control 30801928
github.com/hashicorp/terraform-provider-aws/internal/service/amp 30617456
github.com/aws/aws-sdk-go-v2/service/iam 30602038

ringods commented 3 months ago

@t0yv0 additional feedback from the customer about the impact of the plugin binary file size growth:

It is mainly two things:

  • download times
  • space on disk, had to delete 50GB of plugins from disk yesterday

t0yv0 commented 3 months ago

This makes sense Ringo, thanks for that detail. With 50GB of plugins, I am wondering if something can be done at the plugin cache level, some form of scheduled eviction, as it appears multiple copies of provider(s) are involved there.

tobiashenkel commented 3 months ago

> This makes sense Ringo, thanks for that detail. With 50GB of plugins, I am wondering if something can be done at the plugin cache level, some form of scheduled eviction, as it appears multiple copies of provider(s) are involved there.

A bit of context there: this grew over the course of roughly half a year while maintaining 50+ Pulumi stacks, each with its own repo and its own dependencies, which are regularly updated in a semi-automated way.

t0yv0 commented 3 months ago

I think there's a feature request in pulumi/pulumi that could be helpful in a situation like this: https://github.com/pulumi/pulumi/issues/7505

I will cross-link and add some ideas there.

We would love to reduce the binary size of pulumi-aws, but since it appears to be dominated by the terraform-provider-aws binary size, doing so looks like a very difficult undertaking that is unlikely to get prioritized in the short term. Hence we need to consider more broadly what else we can do to alleviate the end-user problem here.

t0yv0 commented 1 month ago

Exploring a few more options here:

flostadler commented 1 week ago

The provider has now reached ~930MB with release v6.59.1.

It keeps steadily increasing in size with each release: [image]

flostadler commented 1 week ago

I'm testing some optimizations, like compiling with the "-s -w" ldflags. This shaves off ~300MB and produces a 636MB binary.

flostadler commented 1 week ago

With v6.60.0 of the AWS provider, we reduced the size of the provider binary by 32% (from 932MB to 637MB).

The size of the compressed archive that's downloaded by Pulumi was reduced by 58% (from 327MB to 137MB).
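
As a sanity check on the percentages above, the arithmetic can be reproduced with a few lines of Go. `reductionPercent` is a hypothetical helper written for this illustration; the sizes are the ones quoted in this comment.

```go
package main

import "fmt"

// reductionPercent returns how much smaller after is than before,
// expressed as a percentage of before.
func reductionPercent(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	fmt.Printf("binary:  %.0f%%\n", reductionPercent(932, 637)) // binary:  32%
	fmt.Printf("archive: %.0f%%\n", reductionPercent(327, 137)) // archive: 58%
}
```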

flostadler commented 1 week ago

Comparing the AWS provider with the upstream Terraform provider: 582MB (Terraform 5.75.1) vs 637MB (Pulumi 6.60.0).

The Pulumi provider is ~50MB bigger; roughly 40MB of that should be the schema. I'm looking into storing the schema in compressed form.

flostadler commented 1 week ago

Storing the schema as a gzip shaves off another 25MB, resulting in a 612MB provider. This is the prototype branch for compressing the schema: https://github.com/pulumi/pulumi-aws/tree/flostadler/compress-schema

A benchmark test reveals that this would add ~50ms to provider startup (on an M3 MacBook): https://github.com/pulumi/pulumi-aws/blob/9e6cad3485f0629528e83e1b7f691de5dc1dbfed/provider/cmd/pulumi-resource-aws/main_test.go#L9-L27

goos: darwin
goarch: arm64
pkg: github.com/pulumi/pulumi-aws/provider/v6/cmd/pulumi-resource-aws
cpu: Apple M3 Pro
BenchmarkDecompressSchema-12              20      50446285 ns/op
BenchmarkSchema-12                  1000000000           0.3025 ns/op

I'll see if I can check the impact on less powerful hardware like CI runners.