hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Performance issues with blocktoattr.ExpandedVariables and Block.DecoderSpec #25889

Closed: jbardin closed this issue 4 years ago

jbardin commented 4 years ago

Resources with large, deeply nested schemas can cause blocktoattr.ExpandedVariables to take a relatively large amount of time to execute.

For example, the aws_wafv2_web_acl resource, which currently has a combined total of 2878 blocks in its schema, takes many seconds to process when gathering the potential references from the configuration.

There are two pieces that may need to be optimized here: blocktoattr.ExpandedVariables itself, and Block.DecoderSpec.
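
For illustration, here is a minimal benchmark sketch of those two hot spots. The import paths assume the era of Terraform current when this issue was filed (before these packages moved under internal/), and `bigSchema` / `body` are placeholders for the real aws_wafv2_web_acl schema and a parsed configuration body, not real fixtures:

```go
package blocktoattr_test

import (
	"testing"

	"github.com/hashicorp/hcl/v2"
	"github.com/hashicorp/terraform/configs/configschema"
	"github.com/hashicorp/terraform/lang/blocktoattr"
)

// Placeholders: in a real benchmark these would be the aws_wafv2_web_acl
// schema and a parsed configuration body for one of its resources.
var (
	bigSchema *configschema.Block
	body      hcl.Body
)

func BenchmarkExpandedVariables(b *testing.B) {
	for i := 0; i < b.N; i++ {
		blocktoattr.ExpandedVariables(body, bigSchema)
	}
}

func BenchmarkDecoderSpec(b *testing.B) {
	for i := 0; i < b.N; i++ {
		bigSchema.DecoderSpec()
	}
}
```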

apparentlymart commented 4 years ago

I did some initial investigation of time taken with some different permutations of the blocktoattr fixup process:

| Fixup actually needed? | Checked if fixup was needed? | Applied fixup? | Ops | Time |
|---|---|---|---|---|
| no | no | no | 325,104 | 32,390 ns/op |
| no | no | yes | 324,705 | 35,129 ns/op |
| no | yes | no | 305,821 | 36,179 ns/op |
| yes | no | no | (failure) | (failure) |
| yes | no | yes | 734,187 | 15,884 ns/op |
| yes | yes | yes | 614,545 | 17,376 ns/op |

To get these I generated two schemas representing nested data structures with four nesting levels and a fan-out of two nested objects per level. The schema that didn't need fixup was built from nested blocks, while the schema that needs fixup was built with nested attributes of a list-of-objects type.
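
For reference, a rough sketch of the kind of fixture generator described above; the helper name and field values here are my own, not the code from the commit linked below, which should be consulted for the real fixtures. It builds the block-based (no fixup needed) variant with the given nesting depth and fan-out; the "needs fixup" variant would instead declare one attribute per level whose type is a list of objects:

```go
package main

import (
	"fmt"

	"github.com/hashicorp/terraform/configs/configschema"
	"github.com/zclconf/go-cty/cty"
)

// deepBlockSchema builds a schema made entirely of nested blocks, with the
// given nesting depth and a fan-out of nested blocks per level.
func deepBlockSchema(depth, fanOut int) *configschema.Block {
	ret := &configschema.Block{
		Attributes: map[string]*configschema.Attribute{
			"name": {Type: cty.String, Optional: true},
		},
	}
	if depth == 0 {
		return ret
	}
	ret.BlockTypes = make(map[string]*configschema.NestedBlock, fanOut)
	for i := 0; i < fanOut; i++ {
		ret.BlockTypes[fmt.Sprintf("child%d", i)] = &configschema.NestedBlock{
			Nesting: configschema.NestingList,
			Block:   *deepBlockSchema(depth-1, fanOut),
		}
	}
	return ret
}

func main() {
	schema := deepBlockSchema(4, 2) // four nesting levels, fan-out of two
	fmt.Printf("top-level block types: %d\n", len(schema.BlockTypes))
}
```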

"Checked if Fixup was needed?" represents a new pre-check function I added, to see if it might be worthwhile to do a little checking work up front for the potential of skipping the "real work" if we can prove it unnecessary:

// hasAnyAmbiguousAttrs reports whether the schema, or any block nested
// within it, declares an attribute whose type could also be interpreted as
// a nested block, i.e. whether the blocktoattr fixup could possibly be
// needed for this schema at all.
func hasAnyAmbiguousAttrs(schema *configschema.Block) bool {
    if schema == nil {
        return false
    }
    for _, attrS := range schema.Attributes {
        aty := attrS.Type
        if isAmbiguousAttrType(aty) {
            return true
        }
    }
    // Recurse into nested block types, since an ambiguous attribute may
    // appear at any depth.
    for _, blockS := range schema.BlockTypes {
        if hasAnyAmbiguousAttrs(&blockS.Block) {
            return true
        }
    }
    return false
}

I also tested a situation where I just directly decoded a body into the schema without checking whether the fixup was needed or even applying the fixup. Of course, that caused a failure in the case where a fixup was needed, so that scenario is marked as "(failure)" in the table above.

My initial conclusion from the above is that it's not productive to use the new function above to skip the fixup decoding. Perhaps there are different formulations of that approach that would work better, but I'm not sure what other variants to try that wouldn't require some quite significant refactoring.

I also thought it was quite interesting to see how much slower it is to decode nested blocks than nested lists of objects in an attribute, even in cases where we don't apply the fix-up logic. Skipping fixup altogether (without the conditional check) did seem to make a good difference compared to the other cases, but that isn't practical because we need to handle the ambiguous case too. It might be interesting to understand why nested block decoding is so much more expensive than decoding an attribute with an equivalent nested data structure. (This might be related to the slow schema.DecoderSpec calls @jbardin saw, because DecoderSpec takes a different codepath for nested blocks than it does for attributes and has to do quite a bit more work; attribute types are used as-is, but block decoding specs need to be constructed piecemeal.)
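
To make that difference concrete, here is a heavily simplified sketch of the two codepaths. This is not the real configs/configschema/decoder_spec.go (which also handles the other nesting modes and dynamically-typed blocks); it only illustrates that an attribute becomes a single AttrSpec carrying its cty type as-is, while every nested block type forces a recursive, piecemeal spec construction:

```go
package example

import (
	"github.com/hashicorp/hcl/v2/hcldec"
	"github.com/hashicorp/terraform/configs/configschema"
)

// simplifiedDecoderSpec is an illustration only; the real DecoderSpec also
// handles NestingSingle/NestingSet/NestingMap and dynamically-typed blocks.
func simplifiedDecoderSpec(b *configschema.Block) hcldec.Spec {
	ret := hcldec.ObjectSpec{}
	for name, attrS := range b.Attributes {
		// An attribute is cheap: its cty type constraint is used as-is.
		ret[name] = &hcldec.AttrSpec{Name: name, Type: attrS.Type, Required: attrS.Required}
	}
	for name, blockS := range b.BlockTypes {
		// A nested block is not: a fresh spec tree must be constructed,
		// recursively, for every level of nesting on every call.
		ret[name] = &hcldec.BlockListSpec{
			TypeName: name,
			Nested:   simplifiedDecoderSpec(&blockS.Block),
			MinItems: blockS.MinItems,
			MaxItems: blockS.MaxItems,
		}
	}
	return ret
}
```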

I've run out of time to investigate this further today. I think the next step, as @jbardin suggested, is to see if there are opportunities to address this by improving the performance of schema.DecoderSpec. In case it's useful for ongoing investigation or for verifying my results above, I've pushed up the changes I used to gather the data in 83445022c2774a748c412ac591a3b96cff7e79a1.

apparentlymart commented 4 years ago

I think while we're here it's also worth considering whether a resource type schema with 2878 nested blocks is actually reasonable. I see in this case it's an attempt to "fake" an infinite level of nesting by generating a huge finite schema, but that is clearly a workaround for a mismatch between Terraform's model of the world and the remote API's model of the world. It may therefore be better to address the root problem by finding a better way to translate that underlying API to Terraform concepts, which may or may not require adding some new concepts to Terraform.

Looking a bit deeper into the underlying AWS API that motivated this (WAF ACLs) I see something that seems conceptually pretty similar to IAM policy documents, which we model in the AWS provider as just strings containing either JSON or YAML, which the provider then just passes on verbatim to the underlying API. However, the WAF API doesn't seem to be built that way: it expects the ACL data structure to be specified directly in the Rules argument of CreateWebACL, and so there isn't any standard text-based serialization of the format like there is for IAM policies, and thus we've effectively had to define our own Terraform-flavored "language" for defining WAF ACLs, using nested blocks.

This seems like a good motivating example for the ongoing discussion of how best to represent complex data structures in a Terraform provider. In principle the provider could expose this as a single attribute called rules whose type constraint is any, and then do all of the processing (including any recursive translation) in the provider codebase. However, that's a pretty significant change from how that resource type is currently specified and, I believe, isn't technically possible in the current official plugin SDK. (The kubernetes-alpha provider is currently achieving that by bypassing the SDK altogether and implementing directly against the provider protocol, but that's not feasible for the AWS provider.)
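
As a purely hypothetical sketch of what that alternative shape would look like in configschema terms (not how aws_wafv2_web_acl is defined today, and not something the current official plugin SDK can express), the whole rules structure would be a single dynamically-typed attribute rather than thousands of generated nested blocks:

```go
package example

import (
	"github.com/hashicorp/terraform/configs/configschema"
	"github.com/zclconf/go-cty/cty"
)

// Hypothetical only: a single attribute whose type constraint is "any".
var rulesAsSingleAttr = &configschema.Block{
	Attributes: map[string]*configschema.Attribute{
		// An "any" type constraint is represented as cty.DynamicPseudoType.
		"rules": {Type: cty.DynamicPseudoType, Optional: true},
	},
}
```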

apparentlymart commented 4 years ago

I had a little more time this morning so I tried a second small experiment: I made Block.DecoderSpec memoize its result.

However, that didn't produce any significant improvement on the previous benchmarks, which is not surprising in retrospect because part of the work the fixup does is to construct a synthetic configschema.Block representing a block definition of similar shape to the underlying attribute, and that block is used once and then immediately discarded, giving no opportunity to memoize.
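
To illustrate why a pointer-keyed cache can't see that work, here is a conceptual stand-in for the fixup behaviour described above (not the actual blocktoattr helper): the attributes of an object type become attributes of a freshly allocated Block, and because that Block only lives for a single call, a cache keyed on long-lived schema values never gets a hit for it.

```go
package example

import (
	"github.com/hashicorp/terraform/configs/configschema"
	"github.com/zclconf/go-cty/cty"
)

// syntheticBlockForObjectType is a conceptual sketch only. It assumes ty is
// an object type and builds a throwaway Block mirroring its attributes.
func syntheticBlockForObjectType(ty cty.Type) *configschema.Block {
	ret := &configschema.Block{
		Attributes: map[string]*configschema.Attribute{},
	}
	for name, aty := range ty.AttributeTypes() {
		ret.Attributes[name] = &configschema.Attribute{Type: aty, Optional: true}
	}
	return ret
}
```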

I again ran out of time so I haven't investigated any further, but I thought I'd mention this here for the benefit of future me or anyone else who takes a closer look at this later.

philnichol commented 4 years ago

Apologies in advance if I've missed something obvious, and I'm aware this definitely isn't a very elegant solution (I'm very new to Go & Terraform). But could we not have something like the following to prevent checking all the nested blocks twice, as @jbardin mentioned? Testing locally, this reduced the time taken to plan/apply/destroy WAFv2 resources by 30-40%. I've only tested WAFv2, so I'm not sure whether this would break anything else.

// configs/configschema/decoder_spec.go
func (b *Block) DecoderSpecNoRecurse() hcldec.Spec {
    ret := hcldec.ObjectSpec{}
    if b == nil {
        return ret
    }

    for name, attrS := range b.Attributes {
        ret[name] = attrS.decoderSpec(name)
    }

    // Note that block types are deliberately not added to the spec here,
    // so nested blocks are never recursed into.
    for name := range b.BlockTypes {
        if _, exists := ret[name]; exists {
            // This indicates an invalid schema, since it's not valid to
            // define both an attribute and a block type of the same name.
            // However, we don't raise this here since it's checked by
            // InternalValidate.
            continue
        }
    }

    return ret
}
// lang/blocktoattr/variables.go
func walkVariables(node dynblock.WalkVariablesNode, body hcl.Body, schema *configschema.Block) []hcl.Traversal {
    givenRawSchema := hcldec.ImpliedSchema(schema.DecoderSpecNoRecurse())
...
jbardin commented 4 years ago

Thanks @philnichol,

That's not a bad idea, but I do think we need to take the blocks into account when generating the spec.

Since the blocktoattr fixup code itself represents an edge case meant to work around inadequacies in the legacy SDK, and the code is not heavily used by the internal codebase, I'd rather not risk changing the overall structure of what is going on here.

What I do have are some promising results with simple memoization (not sure what was different from @apparentlymart's attempt earlier, but a full plan with aws_wafv2_web_acl appears to run 98% faster now). I'll follow up with a PR if this proves successful.
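
The PR will show the actual approach; purely as a sketch of one possible way to memoize, the cache could live in configs/configschema and be keyed on the *Block value, which is only safe if schemas are never mutated after construction. The method name and cache variable below are illustrative, not the eventual implementation, and the snippet assumes "sync" and hcldec are imported in that package:

```go
// Sketch only; not necessarily what the follow-up PR does.
var decoderSpecCache sync.Map // map[*Block]hcldec.Spec

// DecoderSpecCached is a hypothetical memoizing wrapper around the existing
// (uncached) DecoderSpec implementation.
func (b *Block) DecoderSpecCached() hcldec.Spec {
	if b == nil {
		return hcldec.ObjectSpec{}
	}
	if cached, ok := decoderSpecCache.Load(b); ok {
		return cached.(hcldec.Spec)
	}
	spec := b.DecoderSpec()
	decoderSpecCache.Store(b, spec)
	return spec
}
```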

ghost commented 3 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.