hashicorp / hcl

HCL is the HashiCorp configuration language.
Mozilla Public License 2.0
5.28k stars 594 forks source link

json: Walk expressions like `hclsyntax.Walk` #543

Open wata727 opened 2 years ago

wata727 commented 2 years ago

In the context of static analysis, there is a use case where you want to visit all expressions in a JSON body. Imagine getting an expression for a string containing a reference to terraform.workspace from the following configuration:

{
  "resource": {
    "null_resource": {
      "a": {
        "triggers": {
          "w": "${terraform.workspace}"
        }
      }
    }
  }
}

If you know the schema of this configuration, you can get the exact "${terraform.workspace}" expression, but if you don't know the schema, it's a difficult problem. With the following way, an expression larger than the original expression will be obtained:

package main

import (
    "fmt"
    "os"

    "github.com/hashicorp/hcl/v2/json"
)

func main() {
    src, err := os.ReadFile("config.tf.json")
    if err != nil {
        panic(err)
    }
    file, diags := json.Parse(src, "config.tf.json")
    if diags.HasErrors() {
        panic(diags)
    }
    attributes, diags := file.Body.JustAttributes()
    if diags.HasErrors() {
        panic(diags)
    }
    fmt.Println(attributes["resource"].Expr.Range()) // => config.tf.json:2,15-10,4
}

The HCL native syntax provides the hclsyntax.Walk to solve the issue, but the json.Walk in the JSON syntax does not exist. Is it possible to provide such an API?

Proposal

If you add json.Walk, the interface will look like this:

type Walker interface {
    Enter(node json.node) hcl.Diagnostics
    Exit(node json.node) hcl.Diagnostics
}

func Walk(node json.node, w Walker) hcl.Diagnostics

In this case, the challenge is that json.node is private. I remember that exposing these structures was negative in previous discussions https://github.com/hashicorp/hcl/issues/332#issuecomment-570682405

My idea is to provide an interface that depends on the hcl.Body and hcl.Expression instead of json.node:

type ExpressionWalker interface {
    Enter(expr hcl.Expression) hcl.Diagnostics
    Exit(expr hcl.Expression) hcl.Diagnostics
}

func WalkExpressions(node hcl.Body, w ExpressionWalker) hcl.Diagnostics

In the above interface, there is a problem that you can pass an unintended body such as hclsyntax.Body, but in that case, it halts walking and reports diagnostics.

Refer to expr.Variables() for the behavior of walking all expressions. In this way, the expression is recursively created from the value and passed to the walker. In addition to this, it should be enough to consider the numberVal, booleanVal, and nullVal. https://github.com/hashicorp/hcl/blob/v2.13.0/json/structure.go#L513-L555

In addition to these, we need utilities to determine the type of expression. I believe this can provide interfaces like IsJSONObjectExpression,IsJSONArrayExpression like IsJSONExpression. https://github.com/hashicorp/hcl/blob/v2.13.0/json/is.go#L7-L31

How about the above proposal? If there is no problem, I would like to work with the implementation. Thank you.

References

apparentlymart commented 2 years ago

Hi @wata727! Thanks for sharing this proposal.

A common challenge with providing this kind of functionality against the JSON syntax is that the JSON syntax is always ambiguous until the caller provides a schema. That's why the API of this package is relatively opaque compared to hclsyntax: it's delaying most analyses until the last possible moment so that we have the most information about the caller's intent and can therefore interpret the expression in the most appropriate way.

I think there is one compromise we could potentially support here, though: it's required in the specification that when evaluating an expression in "full expression mode" a string is parsed as a string template in the native syntax, which in practice means using the parser and node types from the hclsyntax package. Therefore we could potentially provide a bridge from the json package to the hclsyntax package which would then allow using the hclsyntax walker to analyze the expression.

I think the minimum possible interface for this would be for the caller to explicitly choose between the two expression evaluation modes: literal-only mode or full expression mode. Once we know the mode we can in principle return a hclsyntax.Expression value whose evaluation behavior would be equivalent to evaluating the JSON package's expression directly. So perhaps something like the following:

It would be nice to also include the static analysis modes, where in particular the "static call" and "static traversal" modes cause the JSON syntax to treat strings as native syntax expressions rather than native syntax templates, but we don't necessarily need to solve this all at once.

With the benefit of hindsight, the json package in this repository could have been implemented in this way originally, with the hcl.Expression.Variables and hcl.Expression.Value methods then wrapping json.NativeSyntaxExpr before analyzing the result. I didn't consider that design at the time because we were focused primarily on the problem of direct evaluation through the main API, rather than enabling detailed analysis of the AST. If we were take this path I think we'd want to change the way the json package works to always use this translation method, rather than having two different implementations that could potentially diverge over time under maintenance. However, that does make this a potentially risky change to make because we'd need to ensure that the new implementation is completely compatible with the old.

Note that the API design I've described here intentionally only helps if you're already holding an hcl.Expression from the JSON package, typically because you've already used the hcl.Body API to decode the structure in terms of schema. The subset I suggested here can potentially work because the handling of expressions in the JSON syntax is relatively simple, relying only on choosing a particular evaluation mode.

I don't think it's viable to allow schemaless analysis of the body structure around the expressions because working successfully with that API would require the caller to reimplement all of the different possibilities from the Structural Elements part of the JSON syntax specification, which is non-trivial. HCL is intentionally schema-based and I don't consider it a priority to offer APIs to analyze structure without already knowing the schema of the data you plan to obtain.

Unfortunately I don't think I have the spare capacity to design, implement, or review a large change to the json package to support direct translation to hclsyntax.Expression at the moment, and so although I do think there is a potential path forward here I cannot promise to make any immediate progress on it. Even if the implementation were contributed by someone else, review and testing of this would be complex enough to require some significant attention and so I unfortunately cannot promise to be able to provide that attention for the foreseeable future, since my priorities are currently elsewhere (in Terraform, rather than in HCL).

wata727 commented 2 years ago

Thank you for telling me about the language design for this feature. I agree that it is better to provide an interface to the structure of JSON expression by converting it to the HCL native expression. I understand that maintaining two different mechanisms is hard and that it is difficult to change the way the current JSON syntax works.

For now, leave this issue open. I think it will be difficult to tackle this for a while, but I would like to share with other users that the discussion has stopped due to the above difficulties.

Fortunately, the problem we are currently facing is not yet important, so we will leave it as a future improvement. If the problem becomes apparent, it may be possible to fix it with a fork that adopted your proposal.

Thank you again for taking the time to this issue!