hashicorp / terraform-plugin-framework

A next-generation framework for building Terraform providers.
https://developer.hashicorp.com/terraform/plugin/framework
Mozilla Public License 2.0
296 stars 91 forks source link

Missing NestingMode in GetSchema makes Terraform crash #326

Closed abergmeier closed 1 year ago

abergmeier commented 2 years ago

Module version

github.com/hashicorp/terraform-plugin-framework v0.8.0

Relevant provider source code


var (
    outputDockerAttributes = map[string]tfsdk.Attribute{}
    outputBlocks                             = map[string]tfsdk.Block{
        "docker": {
            Attributes: outputDockerAttributes,
            MaxItems:   1,
            PlanModifiers: tfsdk.AttributePlanModifiers{
                tfsdk.RequiresReplace(),
            },
        },
    }
)

func (t builtResourceType) GetSchema(ctx context.Context) (tfsdk.Schema, diag.Diagnostics) {
    return tfsdk.Schema{
        Blocks: map[string]tfsdk.Block{
            "output": {
                MinItems:            1,
                MaxItems:            1,
                Blocks:              outputBlocks,
                Description:         "Output destination. For entries of type gha, GitHub Action credentials are automatically added to attrs.",
                MarkdownDescription: "Output destination. For entries of `type` `gha`, GitHub Action credentials are automatically added to attrs.",
                PlanModifiers: tfsdk.AttributePlanModifiers{
                    tfsdk.RequiresReplace(),
                },
            },
        },
    }
}

Debug Output

See https://gist.github.com/abergmeier/fe55ad498453c30c85dde63aca2aa54b

Expected Behavior

framework should have validation in place to not allow "invalid" Schemas.

Actual Behavior

Make Terraform crash.

Steps to Reproduce

terraform apply

References

abergmeier commented 2 years ago

Introducing NestingMode: seems to fix the problem - so maybe validate whether NestingMode is unknown?

bflad commented 2 years ago

Hi @abergmeier 👋 Thank you for raising this and sorry you ran into trouble here.

Glad we have a lead on what caused the issue. We should also ensure there is a bug report raised in the Terraform CLI issue tracker, because they should likely raise a proper error instead of crashing. I'm guessing this is on the latest Terraform CLI version?

In terms of resolution here, we likely cannot raise a compiler error without making the schema code more complicated, however we do have plans in #113 to add provider unit testing functionality, similar to how you could get unit testing failures with terraform-plugin-sdk.

Do you also think the documentation could be improved anywhere to help avoid this? Your perspective would be super helpful. Thanks!

abergmeier commented 2 years ago

In terms of resolution here, we likely cannot raise a compiler error without making the schema code more complicated

Well the next best thing could be for the function calling GetSchema to recursively validate NestingMode.

Do you also think the documentation could be improved anywhere to help avoid this?

NestingMode has next to no docstring to begin with - so maybe would be good to describe the rules for when NestingMode is necessary.

Also maybe a better design would be for 0 to actually mean "Auto" and by default fallback to List, when NestingMode is needed?

In the end IMO it comes down how accessible you want the framework to be.

bflad commented 2 years ago

In the code that converts from the framework's schema to the protocol-level schema, there is already an error/validation check for NestedMode not being list or set:

https://github.com/hashicorp/terraform-plugin-framework/blob/e27fe65549ca6397d416ccda23ff65ad54f0c326/internal/toproto6/block.go#L35-L43

This should bubble up the error to here:

https://github.com/hashicorp/terraform-plugin-framework/blob/d020d18a7fd51ba1373ce9162ba7a16567cbfbed/internal/toproto6/schema.go#L36-L40

Which eventually winds up as an error diagnostic in the GetProviderSchema RPC response:

https://github.com/hashicorp/terraform-plugin-framework/blob/d020d18a7fd51ba1373ce9162ba7a16567cbfbed/internal/toproto6/getproviderschema.go#L64-L74

It doesn't appear that Terraform CLI is properly checking for error diagnostics to return early when it receives an error diagnostic:

https://github.com/hashicorp/terraform/blob/26ead07b68c106356b8d772fbeb36efd27949862/internal/plugin6/grpc_provider.go#L139-L145

So outside that, since Terraform CLI isn't properly handling nil on its side when it tries to convert the "incomplete" schemas without any nil checks here:

https://github.com/hashicorp/terraform/blob/26ead07b68c106356b8d772fbeb36efd27949862/internal/plugin6/convert/schema.go#L104

Then it seems we will need to always send back an "empty" block when there is an error, to prevent the crash for Terraform CLI versions without the error diagnostic and/or nil checks. It'd also be good to update the framework here to report back these diagnostics in the logs, so it is easier to troubleshoot that it at least tried to do the right thing by returning an error diagnostic.

bflad commented 2 years ago

Terraform CLI v1.2.3 released today will at least now correctly report the framework's error diagnostic, rather than panicking:

        Error: failed to read schema for blox_example.test in registry.terraform.io/hashicorp/blox: failed to retrieve schema from provider "registry.terraform.io/hashicorp/blox": Error converting resource schema: The schema for the resource "blox_example" couldn't be converted into a usable type. This is always a problem with the provider. Please report the following to the provider developer:

        AttributeName("yikes"): unrecognized nesting mode 0

maybe would be good to describe the rules for when NestingMode is necessary.

NestingMode is required for a valid block definition; Terraform supports an enumeration of them (list and set being the only two we want to support today). I will double check the Go documentation for the field and the top level Block type to ensure that it very explicitly states that setting the field is a requirement in the current implementation.

maybe a better design would be for 0 to actually mean "Auto" and by default fallback to List

We've intentionally shied away from automatic behaviors in terraform-plugin-framework where possible as they have tended to cause long term issues in the years of terraform-plugin-sdk maintenance and feature development. Given the above, we'd likely want to lean towards requiring provider developers to be explicit if there is an implementation choice to be made.

Another option rather than direct Block{} struct creation and expecting the NestingMode field to always be defined would be to instead require list/set block helper functions similar to nested attributes.

e.g. for nested attributes today they must be defined via:

Attributes: map[string]tfsdk.Attribute{
    "test_attr": {
        Attributes: tfsdk.ListNestedAttributes(map[string]tfsdk.Attribute{
            "nested_attr": {
                Type:     types.StringType,
                Required: true,
            },
        }),
        Required: true,
    },
},

A similar implementation for blocks could then be something like:

// Design sketch; this is not valid provider code today
Blocks: map[string]tfsdk.Block{
    "test_block": tfsdk.ListBlock(tfsdk.NestedBlock{
        Attributes: map[string]tfsdk.Attribute{
            "nested_attr": {
                Type:     types.StringType,
                Required: true,
            },
        },
        // No NestedMode field to set
    }),
},

Where the only valid Block implementations are created via ListBlock() and SetBlock() functions, which under the hood set NestingMode appropriately. We have some plans to rework some of the schema types, etc. in the future, so this is something we could consider as part that effort.

What do you think?

abergmeier commented 2 years ago

Thanks for taking the time to explain. I can agree that explicit is sometimes better than implicit.

What do you think?

Maybe I am just dumb but ListNestedAttributes always has me confused what the List is and what the Attribute and on which level. Especially given your example - it just confuses the living hell out of me. Since I personally strive for code that even is understandable when you have a bad day I would at least advice on renaming exported symbols like ListNestedAttributes and/or document the living hell out of them so it is very clear what you really meant.

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.