feat: Add a --fix option to auto fix small issues like eslint

nitrocode commented 5 years ago

For example, we could fix

unused variables/locals
alphabetize in general
- https://github.com/AlexNabokikh/tfsort
alphabetize aws_iam_policy_document's actions
convert aws service roles into data sources
convert iam policy json into data source
- this golang project can be used https://github.com/flosell/iam-policy-json-to-terraform
convert JSON into a local variable and use jsonencode() to dump
replace // comments with #
im sure there are many other cases

bendrucker commented 4 years ago

I'm potentially interested in this, quite a bit of ground work involved in making this possible though.

hclwrite handles edits like this. I thought about it with #762 since it's extremely easy to auto-fix.

However, it does strike me that idea of a directory as a module in Terraform as opposed to a file as a module in JS does seem to make this a bunch harder. ESLint can build an AST for a file, let each rule mutate it, and then write the final result back to the file. TFLint rules (via the runner) work at a module level, and only recently have been able to perform raw HCL decoding (#745). All of this is based on the idea that the configuration is immutable.

Once a write is performed, the representation TFLint has of the module is invalid and everything has to be reloaded from disk. That's a bit easier if we're willing to write changes after each rule, but that's really not ideal as it would tend to close off a --fix --dry-run mode that would print all proposed changes. So at that point we're looking at mutating an in-memory filesystem and writing at the end.

So as much as I'd be interested I'm not sure I'll have the time to devote to getting this off the ground.

wata727 commented 1 year ago

I'm currently looking into ways to introduce autofix. Here are some proofs of concept.

I would like to share the design I have in mind at the moment.

The first thing that bothered me was how to design the SDK API. The most appropriate way to process HCL files is to expose the hclwrite API to plugin developers. However, the abstraction level of hclwrite is different from hclsyntax, so there is an issue that they cannot be converted to each other.

It operates at a different level of abstraction than the main HCL parser and AST, since details such as the placement of comments and newlines are preserved when unchanged.

https://pkg.go.dev/github.com/hashicorp/hcl/v2@v2.16.2/hclwrite

However, since plugin developers determine whether there is an issue based on the hclsyntax AST, in order to provide the hclwrite API, the developers must implement the Check logic for both the hclsyntax and hclwrite ASTs. This is clearly redundant and not a good design.

The solution to this issue is to provide a direct file editing API based on the hcl.Range byte indexes, without using hclwrite API. Since the target hcl.Range to be modified is already known, partial rewriting can be achieved by using these byte indexes. Imagine an implementation like this:

type Fixer struct {
  sources map[string][]byte
}

func (f *Fixer) Replace(rng hcl.Range, replaced string) {
  file := f.sources[rng.Filename]

  buf := bytes.NewBuffer(file[:rng.Start.Byte])
  buf.WriteString(replaced)
  buf.Write(file[rng.End.Byte:])

  f.sources[rng.Filename] = buf.Bytes()
}

Other APIs such as text deletion and addition can be implemented in the Fixer. The ESLint API will be helpful for this.

insertTextAfterRange(range, text)
insertTextBeforeRange(range, text)
removeRange(range)

https://eslint.org/docs/latest/extend/custom-rules#applying-fixes

It's difficult to easily implement advanced APIs like those provided by hclwrite (e.g. RenameVariablePrefix, operations to add attributes while preserving indentation, etc.), but it's possible.

The Fixer makes it available as a callback argument for the EmitIssue extension API. Imagine something like this:

runner.EmitIssueWithFix(
  rule,
  "Single line comments should begin with #",
  token.Range,
  func (f *Fixer) error { return f.Replace(rangeForPrefix, "#") }
)

By providing it as part of EmitIssue, you can avoid applying autofixes to issues ignored by tflint-ignore annotations.

The modified source code is sent from the plugin to the host over gRPC. These are stored in memory as a field of the Runner. Once the changes have been applied, rebuild terraform.Module based on the new files. Add a new API based on the build API. https://github.com/terraform-linters/tflint/blob/9b802b15a40df971fc9e92994f20575028c65986/terraform/module.go#L43-L74

Rebuilding terraform.Module allows subsequent calls from plugins (e.g. GetModuleContent) to return modified module content. This allows you to accumulate autofixes for each rule and generate the final fixed source code.

However, if we take this approach, we lose the possibility of parallel execution. At the moment, the calls from the plugin are executed serially, so this way is not a problem, but if we want parallel execution for performance reasons, we may need to reconsider this approach.

Also, if a module call is added by autofix, module inspection is not performed on it. This is an implementation limitation, but I believe it's a very rare case and shouldn't be a big concern.

The accumulated changes are finally written out to the filesystem after outputting issues. At this time, there is room for discussion as to which issue should be output. ESLint does not output autofixed problems, only the remaining problems.

The fixes are made to the actual files themselves and only the remaining unfixed issues are output.

https://eslint.org/docs/latest/use/command-line-interface#fix-problems

On the other hand, RuboCop, a famous linter for Ruby, also outputs fixed offenses. https://docs.rubocop.org/rubocop/usage/autocorrect.html

ESLint's behavior is nice to use with pre-commit, while RuboCop is better for confirming which issues were actually fixed. Personally, I think RuboCop's behavior is better.

However, there is a problem with what point in time the output of the fixed issue refers to the source code. For example, imagine code like this:

Revision 1

variable "unused" {}
// comment

Suppose the autofix by terraform_unused_declaration is applied.

Revision 2

// comment

Suppose the autofix by terraform_comment_syntax is applied.

Revision 3

# comment

For the terraform_unused_declaration issue, refer to revision 1, but for the terraform_comment_syntax issue, refer to revision 2 (Hint: the terraform_comment_syntax issue's range points to line 1). This means that in some cases the modified source code must be referenced in the output.

Finally, we also have to think about conflicting autofixes. Consider the following example:

// comment

Suppose the autofix by terraform_comment_syntax is applied.

# comment

Suppose an autofix that adds resource definitions with a comment is applied.

# comment

// This is added by autofix
resource "aws_instance" "foo" {}

Note that the comment added by the second rule does not have the terraform_comment_syntax autofix applied. To avoid such problems, if an autofix changes a source code, we should apply the autofix repeatedly until there are no more changes. This is the same approach RuboCop do it. https://github.com/rubocop/rubocop/blob/efdd3f58a58bdbfcf2012a75fd53f1aa907e29ad/lib/rubocop/runner.rb#L283

EDIT: ESLint also uses this strategy. https://github.com/eslint/eslint/blob/f5574dc739fcc74a7841217ba1f31cce02bee1ff/lib/linter/linter.js#L1949-L1984

bendrucker commented 1 year ago

However, if we take this approach, we lose the possibility of parallel execution. At the moment, the calls from the plugin are executed serially, so this way is not a problem, but if we want parallel execution for performance reasons, we may need to reconsider this approach.

Providing multiple methods (e.g., Check and Fix) could work, or some other interface-oriented technique (Fixer). Rules implementing Fixer would be executed first, serially, followed by all rules that just implement the main Rule interface, in parallel.

RuboCop's strategy of re-evaluating modified files and detecting loops between rules seems like a good approach for ensuring that behavior is deterministic and doesn't depend on rule execution order.

terraform-linters / tflint

feat: Add a --fix option to auto fix small issues like eslint #266