fmt: Should format commas in multi-line maps/objects

jdufresne commented 8 months ago

Terraform Version

Terraform v1.7.0
on linux_amd64

Use Cases

Sometimes people write maps with commas between entries, sometimes people write maps with a final trailing comma, and sometimes people write maps with no comas. The fmt command should harmonize all variations to a single canonical form to allow undesired style difference to be caught in CI. The terraform docs use no comma, so it seems like a good choice.

{
  name = "John",
  age  = 52
}

{
  name = "John",
  age  = 52,
}

Become:

{
  name = "John"
  age  = 52
}

Attempted Solutions

Document a style preference within a project.

Point out unwanted style differences in review.

Manually edit *.tf files.

Proposal

The fmt command removes commas from multi-line maps as styled in the terraform docs.

References

https://developer.hashicorp.com/terraform/language/expressions/types#maps-objects

apparentlymart commented 8 months ago

Thanks for suggesting this, @jdufresne! I agree that this sort of change is consistent with the goals of terraform fmt, and would be happy to see it implemented.

terraform fmt rules are a little tricky to specify, though: the formatter works at a lower semantic level than most other parts of Terraform, dealing mainly with individual tokens rather than higher-level abstract syntax structures like "object literal". Therefore I expect we'd need to find a way to describe this rule in terms of the data the formater has available to it.

You may have noticed that terraform fmt today does far more with the top-level structural objects like blocks and block arguments than it does with details inside expressions, and that is in large part because expression syntax is considerably more variable and so pattern-matching for formatting inside expressions is harder to get right, at least with the current capabilities of the underlying HCL API we rely on.

Doing this thoroughly will, I think, require some additional investment in the underlying hclwrite package to have it expose more information about the boundaries between parts of nested expressions, but we may still be able to achieve a partial implementation of the rule that would, to start, only work for the situation where an object constructor is directly assigned to an argument inside a block, and would not apply if the object constructor were wrapped inside some other kind of expression. We already have some similar rewriting logic for turning a naked "${foo}" into just foo here:

https://github.com/hashicorp/terraform/blob/d7f97ec847aafc100f60612ae82ec0952c9e7249/internal/command/fmt.go#L328-L423

Perhaps if we were to restructure that to first check the type of the first token in the expression, then we could run the currently-implemented logic if the first token is TokenOQuote, but run the new comma-removing rule if the first token is TokenOBrace.

Concretely, that means that the following would get "fixed":

  foo = {
    a = 1,
    b = 2,
  }

...but the following would not, because the formatter's analyzer is not yet sophisticated enough to detect that an object constructor is starting in the middle of this expression:

  foo = baz({
    a = 1,
    b = 2,
  )}

Even that might be tricky to achieve today though, because it seems to require the formatter to understand which commas are separating object constructor entries vs. other commas such as those in a function call argument list. My initial idea for dealing with it (as long as we're not changing hclwrite to have more expression-boundary-awareness) would be to count opening and closing "bracket-like" tokens in the expression and only remove commas that appear when there are no brackets open except the main braces for the object constructor.

(In HCL there are various "bracket tokens" that all follow the naming scheme TokenOSomething/TokenCSomething, and all current uses of commas in the language appear wrapped inside some sort of "brackets" to create the context for what the comma means, so I believe the above heuristic should work.)

apparentlymart commented 8 months ago

Ahh, right after I posted I realized a problem:

  foo = {
    for x, y in anything : whatever
  }

The comma after x here is within a pair of braces, but these braces are representing a for expression rather than an object constructor, and so it would be incorrect to do anything with that comma. So at the very least we'd need a special exception for when the TokenOBrace is followed by the keyword for. Maybe there are other examples of this. It might be better to just go directly to figuring out how to improve hclwrite to be able to report on the expression boundaries in a sequence of tokens.

hashicorp / terraform