hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io

Consider Exposing Attribute Deprecation Messages in Machine-Readable Output #34541

Open bflad opened 10 months ago

bflad commented 10 months ago

Terraform Version

1.7.0

Use Cases

Currently, the Terraform Plugin Protocol implementation and the terraform providers schema -json output expose schema attribute deprecation as a boolean true/false value. Tooling that relies on this information, such as editor integrations and provider documentation generation, can only tell practitioners that an attribute is deprecated, without any context about what to do in that situation.

Example JSON today:

{
  "format_version": "1.0",
  "provider_schemas": {
    "ADDRESS": {
      // ...
      "resource_schemas": {
        "TYPE": {
          // ...
          "block": {
            "attributes": {
              "NAME": {
                // ...
                "deprecated": true,
              }
            },
          }
        }
      }
    }
  }
}

Attempted Solutions

For provider documentation generation, provider developers can copy the attribute deprecation message into the attribute description. For practitioners using an editor integration, that copied message only shows up when hovering over configured attributes, and only if the editor integration includes the attribute description in that sort of user interface.
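As a minimal sketch of that workaround, assuming a terraform-plugin-sdk based provider (the attribute name and message wording are illustrative):

package example

import (
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// Workaround sketch: the deprecation guidance is duplicated into Description
// so documentation generation and editor hovers, which already render
// descriptions, can surface it alongside the boolean deprecation flag.
var deprecatedAttribute = map[string]*schema.Schema{
	"old_name": {
		Type:        schema.TypeString,
		Optional:    true,
		Deprecated:  "Use the new_name attribute instead.",
		Description: "Legacy value. Deprecated: use the new_name attribute instead.",
	},
}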

Another, extremely high-lift solution would be to stop using the Terraform Plugin Protocol and terraform providers schema -json output entirely for these downstream use cases, instead relying on something like the provider binaries emitting their own machine-readable data. There are proposals for what this might look like; however, it is still unclear whether it is a reasonable idea given the ecosystem burden to implement it.

Proposal

I think ideally we would look at this more holistically, to support what editors call "code actions" as more structured data that gets encoded into providers and sent across the protocol, so editors and potentially other tooling can offer configuration refactoring/remediation support automatically. However, that requires non-trivial discovery work and needs to be prioritized accordingly.

Therefore, this change is being proposed pragmatically, since it would remain valuable even alongside enhancements such as those.

With that said, provider developers using both terraform-plugin-sdk (the helper/schema.Schema.Deprecated string) and terraform-plugin-framework (DeprecationMessage on each attribute type) already encode this information as a message string, which is included in the warning diagnostics generated by both SDKs.
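For reference, a minimal sketch of where that string already lives on the framework side (the attribute name is illustrative):

package example

import (
	"github.com/hashicorp/terraform-plugin-framework/resource/schema"
)

// terraform-plugin-framework already carries the practitioner-facing guidance
// as DeprecationMessage; today it only reaches practitioners through warning
// diagnostics, not through the schema data exposed over the protocol.
var attributes = map[string]schema.Attribute{
	"old_name": schema.StringAttribute{
		Optional:           true,
		DeprecationMessage: "Use the new_name attribute instead.",
	},
}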

The proposal here would be to introduce a deprecation_message string field in the Terraform Plugin Protocol alongside the existing deprecated boolean field, preserving existing compatibility. We added this for function definitions recently. On the provider side of the protocol, the SDKs would take the existing provider-defined messages and thread that information through to the protocol data. On the core side, it would need to be threaded through the provider RPC handlers and into the terraform providers schema -json output.

Example JSON proposed:

{
  "format_version": "1.0",
  "provider_schemas": {
    "ADDRESS": {
      // ...
      "resource_schemas": {
        "TYPE": {
          // ...
          "block": {
            "attributes": {
              "NAME": {
                // ...
                "deprecated": true,
                "deprecation_message": "Use X attribute instead.",
              }
            },
          }
        }
      }
    }
  }
}
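For illustration, on the provider side of the protocol this would roughly amount to a new string field next to the existing boolean on the attribute schema type, along these lines (a hypothetical sketch, not an existing API; the existing fields are abbreviated):

package sketch

// Hypothetical sketch of a protocol-side attribute schema type with the
// proposed field added next to the existing Deprecated boolean. Only
// DeprecationMessage is new; the other fields abbreviate what the attribute
// schema types already carry today.
type SchemaAttribute struct {
	Name        string
	Description string
	Optional    bool
	Computed    bool

	// Existing: only signals that the attribute is deprecated.
	Deprecated bool

	// Proposed: the provider-defined guidance, e.g. "Use X attribute instead.",
	// threaded through to the terraform providers schema -json output as
	// deprecation_message.
	DeprecationMessage string
}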

There could also be consideration for exposing schema-level deprecation messaging for resources and data sources, for similar reasons. The protocol does not expose this deprecation information at all today, but the provider-side details are similar to those for attributes. This would let downstream documentation/editor tooling provide this additional context for practitioners as well.
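As a minimal sketch, assuming the current SDKs, both already carry a resource-level equivalent of this message, so the provider-side data exists there as well (messages are illustrative):

package example

import (
	frameworkschema "github.com/hashicorp/terraform-plugin-framework/resource/schema"
	sdkschema "github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// terraform-plugin-framework: resource-level deprecation message on the schema.
var frameworkResourceSchema = frameworkschema.Schema{
	DeprecationMessage: "Use the examplecloud_new_thing resource instead.",
}

// terraform-plugin-sdk: resource-level deprecation message on the resource.
var sdkResource = &sdkschema.Resource{
	DeprecationMessage: "Use the examplecloud_new_thing resource instead.",
}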

References

apparentlymart commented 10 months ago

Thanks for suggesting this!

Personally I feel that we're quickly reaching the practical limits on how much we can achieve by continuing to extend these ever-growing static JSON artifacts -- and in some cases, like the JSON plan output, I might argue that we're already past the practical limit, but thankfully that's not a concern for this particular issue.

I do agree that exposing this additional information somehow seems worth doing for the reasons stated, but I would also encourage considering a transition away from terraform providers schema -json towards live RPC functions offered by the still-very-new terraform rpcapi mechanism, which would then allow Terraform Core to stay running concurrently with callers like the language server and answer "smaller" questions like "what is the schema for these specific resource types?" as the need for that information gradually emerges, rather than dumping a single huge artifact out in one go.

If we were to adopt that model instead, I would feel considerably less concerned about each incremental addition. Today the schema artifact is already huge as soon as someone adds even just the hashicorp/aws provider in isolation, and so that creates some pressure to be picky about what level of detail to include in there.

We need to let the Terraform Stacks work settle before we add any further terraform rpcapi callers, because we're currently using Stacks as a realistic test case to gain experience and revise the design as needed. Still, I feel pretty confident in asserting that we can offer a function like what I've described above; the uncertainty is only in the exact details of how we'd present it. Currently the RPC API is just one big protobuf service with everything packed into it, and that's probably not going to scale well as we try to maintain support for multiple different use-cases with overlapping but different needs.

bflad commented 10 months ago

I agree with your sentiment regarding the size of the existing GetProviderSchema RPC response data, and it definitely seems desirable to think about longer-term ways that tooling such as editor integrations could use the RPC API to handle operations more dynamically. However, I'm not exactly sure what the next steps are here to move us towards enhancing downstream tooling with additional data sourced from providers. The RPC API currently looks like it is designed as an interface between Terraform processes, with provider-defined data served there as a "passthrough" of sorts from the core representation of that data, which is fetched via the existing Terraform Plugin Protocol. Given that providers today only talk over the Terraform Plugin Protocol, it seems that to make additional data available to Terraform from providers, either that existing protocol needs enhancements, or we should start discussions on using a different protocol/service for providers, or on supporting multiple protocols/services for providers. There are of course other options, such as introducing other types of additional machine-readable data, but I feel fairly confident that there would be no desire to design yet more interfaces with Terraform.

In terms of enhancing the existing Terraform Plugin Protocol, over in https://github.com/hashicorp/terraform-proposals/issues/81, there was some musing on other protocol-related changes that would help separate both core and providers from needing to deal with the entire schema data at once. In particular, the "GetProviderSchema Limiting of Resource Types" and (I personally think preferable) "New GetXSchema RPCs" proposals. We didn't go down those routes at the time because it was mentioned that core relies on the entire schema data, so there was not a benefit to introducing RPCs that would not be called.
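As a purely illustrative sketch of the "New GetXSchema RPCs" idea, a per-resource-type request/response might look roughly like the following; none of these names exist in the protocol today:

package sketch

// Hypothetical request/response pair for a per-resource-type schema RPC, as
// mused about in the referenced proposal. All names and fields here are
// illustrative only.
type GetResourceSchemaRequest struct {
	TypeName string // e.g. "examplecloud_thing"
}

type GetResourceSchemaResponse struct {
	// Full schema for just the requested resource type, which could carry
	// richer detail such as deprecation messages without returning every
	// schema the provider defines in one response.
	Schema      *Schema
	Diagnostics []*Diagnostic
}

// Schema and Diagnostic stand in for whatever schema and diagnostic types the
// protocol version in use would define.
type Schema struct{}
type Diagnostic struct{}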

Is something like that, e.g. separate RPCs that are limited to individual resource types but expanded in the amount of data they return, a desirable approach? If not, could you help me understand what a more desirable solution would be?

apparentlymart commented 10 months ago

Creating an ability for systems like the language server to talk directly to provider plugins and thus support optional extra protocols that Terraform Core doesn't need is also a plausible idea! I think the main concern down that road in the past was that it would require those callers to essentially reimplement the provider plugin discovery logic that lives in Terraform CLI today, but I wonder if we could find a compromise where the rpcapi offers a way to discover and/or launch providers but then any subsequent communication is directly with the providers.

One thing I missed in my first read of this issue: I was reading it as a request to expose some information that Terraform Core is already retrieving for its own purposes anyway. With a fresher brain, I remember now that the logic for returning deprecation warnings lives in the SDK and Framework, so this proposal also implies growing the provider plugin protocol schema model to include data that Terraform Core would entirely ignore and Terraform CLI would only pass verbatim out into this JSON output. If that's true, then I agree this seems like a prompt to revisit the idea of exposing an additional language-server and documentation-generation support protocol directly from providers, along with the new idea above of using rpcapi to help those callers find and launch each provider plugin.

Exposing the subset of schema data that Terraform Core already needs anyway via a new rpcapi service could also potentially be useful, but I suppose we should wait to see what use-cases remain in a world where language server and docs tools would be using a specialized new protocol directly.

(Another user of schema information is Terraform Cloud, to allow it to correctly render resource plan/state data in its web UI, but in that case its needs are the same as Terraform CLI's -- it's a web equivalent of the CLI output, after all -- and we do tend to want to just snapshot the whole schema once and reuse it many times because Terraform Cloud can't keep a Terraform Core service running permanently to provide data gradually.)