hashicorp / terraform-plugin-framework

A next-generation framework for building Terraform providers.
https://developer.hashicorp.com/terraform/plugin/framework
Mozilla Public License 2.0
304 stars 93 forks source link

Performance issue when run Terraform Plan with many Block Sets or NestedAttribute Sets #775

Open jeremmfr opened 1 year ago

jeremmfr commented 1 year ago

Module version

v1.2.0 & v1.3.1

Description

After migrating resources of my provider jeremmfr/junos from SDK plugin to this framework plugin, the time to run a terraform plan with a config that have many block sets has been increased very significantly.

For 1000 items of block set, a terraform plan (with empty state) with provider including module sdk/v2 take ~1.3 seconds and take ~ 70 seconds with provider including plugin-framework. A user of my provider has a time of 19 minutes to run terraform plan with the new version of the provider including plugin-framework instead of 52 seconds with the previous version of the provider including module sdk/v2

Tests

I tested to remove all PlanModifiers and Validators fields on each attribute, and ValidateConfig function for resources, using the latest version of this module (v1.3.1) but there's no change. The problem disappears if replace schema.SetNestedBlock with schema.ListNestedBlock: 70 seconds to 1 second.

I reproduce the problem with provider hashicups and hashicups_order resources. I tested 1000 blocksitems on a resource hashicups_order with different modifications of items block.

hashicorp/terraform-provider-hashicups schema.TypeSet : 3.8 - 4 seconds hashicorp/terraform-provider-hashicups schema.TypeList : 0.5 - 0.8 seconds hashicorp/terraform-provider-hashicups-pf schema.SetNestedBlock : 49.1 - 49.7 seconds hashicorp/terraform-provider-hashicups-pf schema.ListNestedBlock : 0.8 - 1.1 seconds hashicorp/terraform-provider-hashicups-pf schema.SetNestedAttribute : 49.0 - 49.4 seconds hashicorp/terraform-provider-hashicups-pf schema.ListNestedAttribute : 0.7 - 0.9 seconds

Relevant provider source code

To test with block sets, clone hashicorp/terraform-provider-hashicups-pf and apply this patch :

diff --git a/internal/provider/order_resource.go b/internal/provider/order_resource.go
index 4fe16cc..67203cd 100644
--- a/internal/provider/order_resource.go
+++ b/internal/provider/order_resource.go
@@ -76,10 +76,12 @@ func (r *orderResource) Schema(_ context.Context, _ resource.SchemaRequest, resp
                Description: "Timestamp of the last Terraform update of the order.",
                Computed:    true,
            },
-           "items": schema.ListNestedAttribute{
+       },
+       Blocks: map[string]schema.Block{
+           "items": schema.SetNestedBlock{
                Description: "List of items in the order.",
-               Required:    true,
-               NestedObject: schema.NestedAttributeObject{
+               //  Required:    true,
+               NestedObject: schema.NestedBlockObject{
                    Attributes: map[string]schema.Attribute{
                        "quantity": schema.Int64Attribute{
                            Description: "Count of this item in the order.",
@@ -87,7 +89,7 @@ func (r *orderResource) Schema(_ context.Context, _ resource.SchemaRequest, resp
                        },
                        "coffee": schema.SingleNestedAttribute{
                            Description: "Coffee item in the order.",
-                           Required:    true,
+                           Optional:    true,
                            Attributes: map[string]schema.Attribute{
                                "id": schema.Int64Attribute{
                                    Description: "Numeric identifier of the coffee.",

& disable hashicups.NewClient code in provider configure function

To test with nested atttribute sets, clone hashicorp/terraform-provider-hashicups-pf and apply this patch :

diff --git a/internal/provider/order_resource.go b/internal/provider/order_resource.go
index 4fe16cc..ea229a7 100644
--- a/internal/provider/order_resource.go
+++ b/internal/provider/order_resource.go
@@ -76,7 +76,7 @@ func (r *orderResource) Schema(_ context.Context, _ resource.SchemaRequest, resp
                Description: "Timestamp of the last Terraform update of the order.",
                Computed:    true,
            },
-           "items": schema.ListNestedAttribute{
+           "items": schema.SetNestedAttribute{
                Description: "List of items in the order.",
                Required:    true,
                NestedObject: schema.NestedAttributeObject{
@@ -87,7 +87,7 @@ func (r *orderResource) Schema(_ context.Context, _ resource.SchemaRequest, resp
                        },
                        "coffee": schema.SingleNestedAttribute{
                            Description: "Coffee item in the order.",
-                           Required:    true,
+                           Optional:    true,
                            Attributes: map[string]schema.Attribute{
                                "id": schema.Int64Attribute{
                                    Description: "Numeric identifier of the coffee.",

& disable hashicups.NewClient code in provider configure function

Terraform Configuration Files

To test with block sets:

resource "hashicups_order" "example" {
  dynamic "items" {
    for_each = range(1, 1000)

    content {
      quantity = items.value
    }
  }
}

To test with nested attribute sets:

resource "hashicups_order" "example2" {
  items = [
    for e in range(1, 1000) : {
      quantity = e
    }
  ]
}

Expected Behavior

The terraform plan command takes a reasonable time to run when there are many block sets.

Actual Behavior

The terraform plan command takes a very long time to run when there are many block sets.

Steps to Reproduce

  1. build hashicups provider
  2. terraform init
  3. terraform plan

References

jeremmfr/terraform-provider-junos#498

bflad commented 1 year ago

Hi @jeremmfr 👋 Thank you for reporting this issue and sorry for the trouble here. I just wanted to acknowledge that the maintainers have noticed this bug report and do plan on taking a further look into this, but that effort will probably happen early next week. In the meantime, if you have any familiarity with pprof and flamegraph profiling, that might be a good methodology for starting to narrow down where the unexpected extra time is being spent.

jeremmfr commented 1 year ago

Hi,

I've run profiling with pprof, the provider hashicups, attribute items with type schema.SetNestedBlock, and a resource hashicups_order with 1000 elements of dynamic items.

I've also updated all the dependencies for the provider hashicups :

 require (
    github.com/hashicorp-demoapp/hashicups-client-go v0.1.0
-   github.com/hashicorp/terraform-plugin-docs v0.14.1
-   github.com/hashicorp/terraform-plugin-framework v1.2.0
-   github.com/hashicorp/terraform-plugin-go v0.15.0
-   github.com/hashicorp/terraform-plugin-log v0.8.0
-   github.com/hashicorp/terraform-plugin-testing v1.2.0
+   github.com/hashicorp/terraform-plugin-docs v0.15.0
+   github.com/hashicorp/terraform-plugin-framework v1.3.1
+   github.com/hashicorp/terraform-plugin-go v0.16.0
+   github.com/hashicorp/terraform-plugin-log v0.9.0
+   github.com/hashicorp/terraform-plugin-testing v1.3.0
 )

I attach a png and a flamegrah (flamegraph in gist cpu.svg)

cpu

jeremmfr commented 1 year ago

A profiling result for a complete call of PlanResourceChange and ValidateResourceConfig instead of 30 seconds dump. (flamegraph in gist cpu2.svg)

cpu2

jeremmfr commented 1 year ago

A profiling result for a complete call of ValidateResourceConfig (~7 seconds) (flamegraph in gist cpu3.svg)

cpu3

bflad commented 1 year ago

Hi @jeremmfr 👋 Would you be able to try out https://github.com/hashicorp/terraform-plugin-go/pull/308? From my local testing with the hashicups-pf example, it reduced the plan time by over half. I'm curious what level of real world benefit you might see as well.

jeremmfr commented 1 year ago

Hi @bflad :wave:

We launch multiple scenarios to compare potential benefit.

The PR reduce time but there is always a gap between SDKv2 and framework version of provider.

The result:

in seconds SDKv2 Framework Framework with
terraform-plugin-go#308
hashicups_order
with 1000 items block
~ 3.8 ~ 48 ~ 14
junos_security_address_book
with 1000 network_address block
~ 1.3 ~ 78 ~ 17
junos_security_address_book
with 1000 address_set block
~ 3.2 ~ 198 ~ 45

The difference between network_address block and address_set block is that network_address block has only primitive attributes (String) and address_set block has primitive attributes (String) and primitive set attributes (String sets)

Terraform code - `hashicups_order` with `items` ```hcl resource "hashicups_order" "example" { dynamic "items" { for_each = range(1, 1000) content { quantity = items.value } } } ``` - `junos_security_address_book` with `network_address` ```hcl resource "junos_security_address_book" "global" { name = "global" dynamic "network_address" { for_each = range(1, 1000) content { name = "test${network_address.value}" value = "1.1.1.1/32" } } } ``` - `junos_security_address_book` with `address_set` ```hcl resource "junos_security_address_book" "global" { name = "global" dynamic "address_set" { for_each = range(1, 1000) content { name = "test${address_set.value}" address = ["baz"] } } } ```
Go replace in go.mod to use PR terraform-plugin-go#308 ```go replace github.com/hashicorp/terraform-plugin-go => github.com/hashicorp/terraform-plugin-go v0.17.1-0.20230630191346-2155f0b1810f ```

I run new profiling with pprof in last scenario (junos_security_address_book with 1000 address_set block and framework with terraform-plugin-go#308)

The result in png and flamegraph (in gist)

cpu4

bflad commented 1 year ago

Thanks for the confirmation and the additional context, @jeremmfr. 😄 Given the overall very positive effects, I've pulled in the upstream changes and will cut a release of those shortly.

I'm guessing there are two threads to pull on next with the framework side of things:

One other reality is that the framework went GA with its data handling half in terraform-plugin-go and half in terraform-plugin-framework. There is a lot of passing back and forth between the type systems in the PlanResourceChange logic. Ideally, everything would have been done in terraform-plugin-framework and only using terraform-plugin-go's type system for the request/response handling. It's unfortunately very difficult to change that until a major version of the framework to remove the directly exposed terraform-plugin-go types in certain places.

bflad commented 1 year ago

@jeremmfr if you have a moment, I'm curious if https://github.com/hashicorp/terraform-plugin-framework/commit/9fe2ca5dcf0da75f85d08930c2480d82b49ed1ae (which should include terraform-plugin-go#308) offers some slight benefits with your provider. I'm guessing it'll be fairly minimal, but it was low hanging fruit to prevent unnecessary memory allocation and cleanup work for the runtime.

jeremmfr commented 1 year ago

Hi @bflad , I ran the same previous scenarios with the PR #791, but the difference is not visible on the Plan time.

alexhung commented 1 year ago

@bflad Any update on this issue? Or any suggestion on workaround?