hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/
Other
42.4k stars 9.5k forks source link

Consider Always Including Diagnostic Path Information in UI #31575

Open bflad opened 2 years ago

bflad commented 2 years ago

Current Terraform Version

Terraform v1.2.6
on darwin_arm64

Use-cases

The plugin protocol currently has the following definition for diagnostics:

message Diagnostic {
    enum Severity {
        INVALID = 0;
        ERROR = 1;
        WARNING = 2;
    }
    Severity severity = 1;
    string summary = 2;
    string detail = 3;
    AttributePath attribute = 4;
}

Providers developed with the both terraform-plugin-sdk and terraform-plugin-framework will generally populate the attribute field with a valid schema path relating to the cause of a diagnostic. These diagnostics may relate to configuration validation issues, provider data handling issues, or other provider-specific causes.

When the diagnostic relates to a schema path where the practitioner has happened to configure a value, the Terraform user interface will show the configuration source file and line, where that value is configured.

In this example, the provider developer using terraform-plugin-framework has indicated they would want a string value between 1 and 3 characters using what we call a validator (similar to terraform-plugin-sdk ValidateFunc):

"required_attribute": {
    Required: true,
    Type:     types.StringType,
    Validators: []tfsdk.AttributeValidator{
        stringvalidator.LengthBetween(1, 3),
    },
},

Given this configuration:

resource "diagpath_example" "example" {
  required_attribute = "too long"
}

It causes this Terraform UI output (while not shown after copying here, the terminal UI also nicely underlines the "too long" value):

╷
│ Error: Invalid Attribute Value Length
│ 
│   with diagpath_example.example,
│   on resource.tf line 11, in resource "diagpath_example" "example":
│   11:   required_attribute = "too long"
│ 
│ String length must be between 1 and 3, got: 8

This is awesome! Without much provider-side (SDK or provider logic) effort, this error diagnostic is fairly clear about the problem and where it occurred. Diagnostics are currently less helpful if there is no configuration though.

In this example, the provider developer using terraform-plugin-framework has indicated they want a string value to be required in a configuration:

"required_attribute": {
    Required: true,
    Type:     types.StringType,
},

Given this configuration:

resource "diagpath_example" "example" {}

It causes this Terraform UI output:

│ Error: Missing required argument
│ 
│   on resource.tf line 9, in resource "diagpath_example" "example":
│    9: resource "diagpath_example" "example" {
│ 
│ The argument "required_attribute" is required, but no definition was found.

The diagnostic UI containing the configuration source does not ever mention the schema path that was passed over the protocol. The diagnostic details including the path happen to be an implementation detail caught by the terraform-plugin-framework developers when developing the diagnostic itself, which is error prone. We can see this in action with a slightly different validation situation where that detail was missed.

In this example, the provider developer using terraform-plugin-framework has indicated they want at least 1 configuration block (refer also to https://github.com/hashicorp/terraform-plugin-framework/issues/437, which discusses the potential removal of block MaxItems/MinItems):

"required_block": {
    // other fields omitted for brevity
    NestingMode: tfsdk.BlockNestingModeList,
    Validators: []tfsdk.AttributeValidator{
        listvalidator.SizeAtLeast(1),
    },
},

Given this configuration:

resource "diagpath_example" "example" {}

It causes this Terraform UI output:

│ Error: Invalid Attribute Value
│ 
│   with diagpath_example.example,
│   on resource.tf line 9, in resource "diagpath_example" "example":
│    9: resource "diagpath_example" "example" {
│ 
│ List must contain at least 1 elements, got: 0

So in this case it would result in a confused practitioner creating a provider bug, which provider or SDK developers must be cognizant of this particular diagnostic UI quirk to know that the fix is to add some form of path details to the diagnostic. The same problem happens outside configurable attributes as well.

In this example, the provider developer using terraform-plugin-framework has setup a data source with a computed id attribute, but just always returns a path-based diagnostic for that value:

func (d exampleDataSource) Read(ctx context.Context, req tfsdk.ReadDataSourceRequest, resp *tfsdk.ReadDataSourceResponse) {
    resp.Diagnostics.AddAttributeError(
        path.Root("id"),
        "Invalid Computed Attribute Value",
        "Example details.",
    )
}

Given this configuration:

data "diagpath_example" "example" {}

It causes this Terraform UI output:

data.diagpath_example.example: Reading...
╷
│ Error: Invalid Computed Attribute Value
│ 
│   with data.diagpath_example.example,
│   on data-source.tf line 9, in data "diagpath_example" "example":
│    9: data "diagpath_example" "example" {}
│ 
│ Example details.

There are other potentially more complex use cases, such as DynamicPseudoType based schemas not showing proper configuration source details (a provider developer more familiar with that particular issue can chime in), but this hopefully gives the general idea.

Attempted Solutions

Manually including path information in diagnostic details. Documenting to all provider and SDK developers that the path information should be included in the details, or it may confuse practitioners even when the path information is sent via the protocol field.

Proposal

Theoretically, the configuration source bits could just always add a line in the output:

│ SEVERITY: SUMMARY
│ 
│   with ADDRESS,
|   and path PATH, <---------- this (or something like it)
│   on FILE line LINE, in BLOCK:
│    BLOCK_LINE: BLOCK
│ 
│ DETAILS

The line naming/verbiage is arbitrarily picked here; the important detail is that this information is shown in some fashion. It could however be fancier and only show the path information when there is no configuration value.

References

jbardin commented 2 years ago

Thanks @bflad!

I think we already to collect all the information in the diagnostic, so the issue here may just be a rendering problem. Whenever the provider diagnostic contains an attribute path, we create a special attributeDiagnostic which can correlate that path within a given configuration. I think it makes sense to try and render the path in some way if there is no corresponding config location.

apparentlymart commented 2 years ago

I think there's an interesting tension here which we'll need to navigate.

In the design so far we leaned towards making the prose messages be the primary source of information so as to give us the maximum flexibility in evolving the messages over time and tailoring them to specific situations, without having to change the wire protocol or Terraform Core's internal architecture for diagnostics.

However, we have gradually been piling on more and more "automatic" bits of context into the messages. At the current time we can potentially include:

Based on my experiences helping folks in various community forums it's seeming to me like it isn't clearly better to include more information in a message. If we include any information that is either redundant or not immediately relevant to the problem at hand, it seems that some readers will only skim the message and latch on to whatever they notice first that seems relevant, even if there's something that describes the problem and solution even better elsewhere in the messages. This concern was what motivated my recent change in #31299, for example.

I don't mean this to say that including the path information is necessarily redundant or irrelevant, but I think we should consider it carefully and not just do it because in principle we could do it. If we do decide to do it, we should also think carefully about how to include it so that it hopefully complements the existing presentation well, rather than adding further clutter to it. (I think it's already on the cusp of being a little too cluttered.)

With all of that said, I can certainly see how this would be valuable in helping folks make sense of lower-quality provider-generated messages that miss key information from the description. I would suggest that we also consider the possibility of making the SDK generate additional context when it's relevant, since the SDK in principle has more information to work with than Terraform Core does to decide relevance.

If we do decide to move ahead with it, my proposal for a low-clutter way to include it in the error message would be to combine it into the existing "with" clause so that we would be describing both the resource instance address and the path together in the same "field" in the UI:

│ Error: Invalid Attribute Value
│ 
│   with diagpath_example.example[1].required_attribute,
│   on resource.tf line 9, in resource "diagpath_example" "example":
│    9: resource "diagpath_example" "example" {
│ 
│ List must contain at least 1 elements, got: 0

This is admittedly trickier for situations where the invalid attribute is in something other than a resource instance where we might not have an expression-shaped "with" address to concatenate it onto. But problems in resource instances feel like by far the most common case to me, so I'd prefer to optimize for that being compact even if we do end up having a less-concise variant for situations where this wouldn't work so well. (I believe right now this concern is theoretical because resource instance addresses are the only case where we capture the "with" address, but I may be misremembering.)

jbardin commented 2 years ago

Thanks @apparentlymart! I actually prototyped the change and came up with the exact same output as a compromise. Trying to organize even more conditional information into that interface impacts a lot of code and makes supporting it more difficult, so using the with ... field for this looks like a good solution.

The other option here, which may have been the original intent within the protocol, is that the provider would add enough context to the error for the user to locate the problem, something like

│ Error: Invalid Attribute Value: "nested.object.required_attribute"
│ 
│   with diagpath_example.example[1],
│   on resource.tf line 9, in resource "diagpath_example" "example":
│    9: resource "diagpath_example" "example" {
│ 
│ List must contain at least 1 elements, got: 0

I see the case however that the SDK is going through the trouble to add an attribute path to the diagnostic, so it seems reasonable that information not be thrown out of even if there is no matching configuration.

apparentlymart commented 2 years ago

Indeed, while my original intention had been that the path information was there to substitute only for the source location information (since the provider can't "see" the source code in order to return source location information directly), I do quite like how this seems to neatly solve the edge case where we're talking about a thing which doesn't exist in the source code at all and therefore has no suitable source location information to return.

HCL itself has a similar problem with the messages it generates as part of its early decoding work. Although we certainly don't need to block on solving that, I'd suggest (if you haven't already) thinking about what it might look like for HCL to emit some comparable metadata from e.g. hcldec and hcl.Body.Content when it returns the "Missing required argument" error at the HCL layer. (I think the "Missing required argument" message included in the question is an example of one of these HCL-source errors, rather than something generated inside Terraform itself.)