Allow provider maintainers to define identifying attributes which should always be shown in plan diffs

alisdair commented 2 years ago

Current Terraform Version

Terraform v1.1.7
on darwin_amd64

Use-cases

When Terraform is displaying a concise diff, unchanged attributes other than id, name, and tags are suppressed. In some cases, this can make the output difficult or impossible to understand, as the identifying data is unchanged and hidden.

Two examples from the "verbose plan" enhancement request:

The google_iam_policy data source binding blocks suppress the unique role attribute:

              ~ {
                  ~ members = [
                        "group:<group-name>",
                      - "serviceAccount:<service-account-name>",
                    ]
                    # (1 unchanged element hidden)
                },

The only unchanged element here is role, which is exactly what is needed to identify the meaning of this diff. In this case, binding is a nested block backed by a set.

aws_cloudfront_distribution changes are hard to understand as identifying attributes for each rule are suppressed:
```
[ ... ]
  ~ ordered_cache_behavior {
        # (12 unchanged attributes hidden)

      ~ forwarded_values {
            # (3 unchanged attributes hidden)

          ~ cookies {
              ~ forward           = "none" -> "all"
                # (1 unchanged attribute hidden)
            }
        }
    }
[ ... ]
```
In this case, ordered_cache_behavior is a nested block backed by a list, so Terraform is able to determine which elements have changed. It's not immediately clear which attributes ought to be displayed here, as I believe none are truly uniquely identifying, but my guess would be path_pattern, allowed_methods, and cached_methods.

References

27547 has a number of reports of the downstream effect of this issue, namely confusing concice plan diffs

apparentlymart commented 2 years ago

It's interesting to observe here that in practice the very attributes that tend to be useful for identification are also the ones that will rarely be changed in-place, because changing them could be considered as deleting an object and creating a new one, rather than an in-place edit.

Although our existing heuristic of just recognizing certain common names as identifying across all providers does get us most of the way there (as you mentioned), it does seem like providers need to be the decider here in order for the result to be useful in all cases.

It may well end up being a separate concern, but I think it would also be good to think about other ways Terraform could use the identifying attributes information other than just deciding which attributes to show when unchanged. For example, perhaps it would make sense to make the code that presents diffs for sets use explicitly-declared identifying attributes to infer when a particular change ought to be represented as an in-place update of an existing object, rather than removing one object and adding another in its place. I mention this here not because I want addressing that to be a blocker for resolving this issue, but just in case we might design the provider protocol features a little differently in the event we might extend it for this further improvement later.

crw commented 2 years ago

Closed https://github.com/hashicorp/terraform/issues/30811 as a dupe of this issue. The referenced issue contains a specific use case that would benefit from the feature described here.

bflad commented 2 years ago

Off-hand question: I'm wondering if something like this could help with terraform-plugin-sdk's set handling issues like https://github.com/hashicorp/terraform/issues/28281, which my (potentially incorrect) understanding is likely due to its internal set value hashes changing the ordering of things. I don't think that particular problem should occur in terraform-plugin-framework though since it implements set types as slices through and through.

EDIT: Okay, yeah, I'm just reiterating what Martin is mentioning above. 😄

apparentlymart commented 2 years ago

I think #28281 is also running into old SDK quirks where the protocol 4 shims are lossy and e.g. cause null values to get replaced by empty strings and other such quirkiness. We currently filter out that sort of noise at the leaves of the plan renderer (individual attribute values), but the higher-level logic which decides if e.g. an entire block is equal to its predecessor still gets tripped up by it.

Offering some way for a provider to hint that the plan renderer should use a particular set of attributes as a correlation key would allow Terraform to present those changes as ~ in-place edits rather than as separate - and + changes, but I don't think it would eliminate them entirely. The only thorough solution to that is for providers to adopt the new framework which doesn't have to go through the lossy shim layer and can thus provide new value whose only differences are really differences, not the effects of the lossiness.

So all of that is to say: I think there's a bit of both in #28281, but I wouldn't expect correlation hints to fully solve it. (We might at that point still choose to close #28281 anyway, since the new framework exists as our canonical full answer, albeit one that requires an incredibly heavy lift to adopt for existing providers.)

bflad commented 1 year ago

We are running into situations over in terraform-plugin-framework where the framework logic could use this information for its underlying planning and data handling logic. https://github.com/hashicorp/terraform-plugin-framework/issues/720 is a fairly trivial proposal (in terms of how its defined by provider developers anyways) that I think may be applicable for here is well. I'm wondering if there is any particular pitfalls or downsides to just naming the identifying attributes, or if additional information about these identifying attributes might be required for the Terraform side of this type of enhancement. There's still other design work required should this warrant being exposed across the protocol, but going down that road might be a welcome enhancement for practitioner experience since it could partially resolve at least some use cases in https://github.com/hashicorp/terraform/issues/27547 once provider developers opt into it.

EDIT: Some additional context about potential protocol updates. Theoretically, the framework enhancement could be exposed to Terraform via:

message Schema {
  message Block {
    # ... other fields ...
    repeated Attribute attributes = 2;
    repeated string identifying_attributes = 7;
  }

  message Object {
    # ... other fields ...
    repeated Attribute attributes = 1;
    repeated string identifying_attributes = 6;
  }

  # ... other messages/fields ... 
}

Then theoretically, Terraform could interrogate this information while rendering the plan. Block covers the case of the "top level" attributes in a schema, should it be important to always show other attributes, such as arn for AWS resources.

Otherwise, Attribute could have a bool identifying = 11; field added too, if walking all attributes in an "object" is better or desired from a design perspective.

apparentlymart commented 1 year ago

I think the main question that was left unanswered in earlier discussion was whether we want to hang any additional semantic meaning on the idea of "identifying attributes" beyond just that they are useful to show in the plan UI even when they haven't changed.

The linked issue https://github.com/hashicorp/terraform-plugin-framework/issues/720 proposes that indeed it would have the additional meaning of representing a custom unique identifier for a collection of objects, which I think is a reasonable goal but raises some new design questions we'd need to answer:

Are the identifying attributes required to be unique across all objects in the collection? Is it okay to treat violation of that rule in provider responses as a provider bug so we can safely rely on it being unique elsewhere in Terraform?
How will this interact with our mechanism for representing that two values that are non-equal by the definition of == are still equivalent from the API's perspective? Currently, PlanResourceChange can return the value from prior state instead of the value from the current configuration. Does that model still work and make sense if the attribute in question is an identifying attribute? In particular, what ought to happen if the configuration has two different values for different objects but they end up being treated as equivalent by the provider's planning logic?
It feels okay to do something like this for sets of objects -- for diffing purposes it behaves like a special map with compound keys made from the identifying attribute values -- but it feels less obviously okay for other collection types. For example, the ordering of elements in a list of objects is observable elsewhere in the language so a provider cannot just assert that a list of objects has identifying attributes and assume that makes it effectively unordered.
We'll need to define what's supposed to happen when the identifying attributes are unknown. So far we've largely avoided dealing with this because it's impossible for a list to have an unknown index/length and for a map to have an unknown key. Is it reasonable to say that if any identifying attribute contains an unknown value then Terraform will just reduce the entire collection to being unknown? (That is how we handle unknown list lengths and unknown map keys today).

I don't mean the above points as an argument against trying to solve this. Instead I'm just trying to think through what implications that change is likely to have for existing design assumptions that we might need to change to support a deeper meaning of "identifying attribute" than just the original request to show certain attributes even if they haven't changed.

bflad commented 1 year ago

Thank you for the deeper consideration, @apparentlymart, this is exactly the type of design questions I would hope would come out of this. At the end of the day, maybe its overly optimistic that a single field can capture the nuances here. Some initial, personal thoughts below --

Are the identifying attributes required to be unique across all objects in the collection? Is it okay to treat violation of that rule in provider responses as a provider bug so we can safely rely on it being unique elsewhere in Terraform?

There are certainly (at least) two choices here.

In the case that the answer is no, then I believe this information is more about providing "hints" to various logic (whether it be the framework trying to align prior state data or Terraform's plan renderer) about what attributes might be more "important" to consider. It would therefore be okay to take this information into account if it is known, but otherwise fallback to existing logic. Maybe "identifying" is the wrong terminology in this case.

In the case that the answer is yes, then I believe a lot more design consideration comes into play on both sides of the protocol. The further questions are certainly applicable for this choice. On the provider side, it should enforce this rule and provide specific recommendations to developers should this rule trigger a runtime implementation/validation error.

How will this interact with out mechanism for representing that two values that are non-equal by the definition of == are still equivalent from the API's perspective? Currently, PlanResourceChange can return the value from prior state instead of the value from the current configuration. Does that model still work and make sense if the attribute in question is an identifying attribute? In particular, what ought to happen if the configuration has two different values for different objects but they end up being treated as equivalent by the provider's planning logic?

Presumably, I think anyways, "identifying" attributes by nature would not be able to support the notion of "semantic" equality. It would be treated the same as changing the element key of a map. Some anti-patterns to this notion would then be cases where identifier-like values may take relative or absolute forms or where they may be case insensitive. Not supporting semantic equality there does solve some thorny problems, so it may be worth the explicit design choice.

It feels okay to do something like this for sets of objects -- for diffing purposes it behaves like a special map with compound keys made from the identifying attribute values -- but it feels less obviously okay for other collection types. For example, the ordering of elements in a list of objects is observable elsewhere in the language so a provider cannot just assert that a list of objects has identifying attributes and assume that makes it effectively unordered.

I think this touches on what is trying to be solved on the provider side though, while a list may be ordered, a complex key could be used to align prior state data with its new index. It is not about changing the semantics of how a list type operates, but more about aligning computed data during planning.

We'll need to define what's supposed to happen when the identifying attributes are unknown. So far we've largely avoided dealing with this because it's impossible for a list to have an unknown index/length and for a map to have an unknown key. Is it reasonable to say that if any identifying attribute contains an unknown value then Terraform will just reduce the entire collection to being unknown? (That is how we handle unknown list lengths and unknown map keys today).

This is a key (🥁 ) discussion point, going back to your first consideration question. If this information is just "hints" and may contain computed values, then the heuristic would mainly be for the purposes of plan rendering always showing certain information for objects. Otherwise, I think it must only support configurable and known planned values to avoid this situation. Terraform would have to defer validation until apply-time in certain cases to support this though.

Having gone though this response, it does feel a little less than ideal to overload multiple concepts here into a single field as they may require different values/nuances. If the answer here should be that there is explicit "always show these attributes as part of a plan" information in the protocol, then I think that's a separate implementation from what we are trying to accomplish on the provider side of realigning computed data for list and set elements where they are potentially rearranged.

atsalolikhin-spokeo commented 5 months ago

Is there any workaround in the meantime? I mean, how can I expose the hidden resources?

Roberdvs commented 1 month ago

Also here in relation to changes to aws_cloudfront_distribution being impossible to decipher:

hashicorp / terraform