terraform-linters / tflint

A Pluggable Terraform Linter
Mozilla Public License 2.0

Intermittent "closed connection" errors are logged from plugins #1274

Closed. carldjohnston closed this issue 2 years ago

carldjohnston commented 2 years ago

Firstly, thank you for your amazing software; it really helps me and my team write consistent Terraform code ❤️

I'm running tflint against ~20 modules in a monorepo and seeing networking errors and warnings from tflint quite regularly.

We run tflint from a for loop within a Makefile, but this bash loop also shows the errors regularly:

for d in modules/*/ ; do
  echo "$d"
  tflint "$d"
done

The errors take the form of:

2021-12-09T10:55:10.215+1100 [WARN]  plugin: error closing client during Kill: err="unexpected EOF"
2021-12-09T10:55:10.215+1100 [WARN]  plugin: plugin failed to exit gracefully

or

2021/12/09 10:55:53 [ERR] yamux: Failed to write header: write unix @->/tmp/plugin2881708702: use of closed network connection

We run tflint from within an Ubuntu 20.04 Docker container, and the same issue appears on macOS, Windows, and Linux hosts.

I have the azurerm, aws, and google plugins downloaded into ~/.tflint.d/plugins during the container build step.

TFLint Configuration

plugin "azurerm" {
  enabled = true
}
plugin "aws" {
  enabled = true
}
plugin "google" {
  enabled = true
}
config {
  module              = false
  force               = false
  disabled_by_default = false
}
rule "terraform_deprecated_interpolation" {
  enabled = true
}
rule "terraform_deprecated_index" {
  enabled = true
}
rule "terraform_unused_declarations" {
  enabled = true
}
rule "terraform_comment_syntax" {
  enabled = true
}
rule "terraform_documented_outputs" {
  enabled = true
}
rule "terraform_documented_variables" {
  enabled = true
}
rule "terraform_typed_variables" {
  enabled = true
}
rule "terraform_module_pinned_source" {
  enabled = true
}
rule "terraform_naming_convention" {
  enabled = true
}
rule "terraform_required_version" {
  enabled = true
}
rule "terraform_required_providers" {
  enabled = true
}
rule "terraform_unused_required_providers" {
  enabled = true
}
rule "terraform_standard_module_structure" {
  enabled = true
}
rule "terraform_workspace_remote" {
  enabled = true
}

Version

$ tflint -v
TFLint version 0.33.2
+ ruleset.aws (0.10.0)
+ ruleset.google (0.15.0)
+ ruleset.azurerm (0.14.0)

$ terraform -v
Terraform v1.0.11
on linux_amd64
bendrucker commented 2 years ago

Is there a concrete problem here or are you concerned that you're seeing these logs?

bendrucker commented 2 years ago

In other words, is tflint exiting with a non-zero code in these situations? These sorts of logs are common in a plugin-based system; you will see them with Terraform too. The logs could be better at highlighting which plugin is involved.

What you've provided isn't really helpful as a reproduction, given the number of plugins, rules, etc. involved. If you can find a reproducible way to trigger this, e.g. running tflint in a loop until an error is observed, that would help move towards a resolution.
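As a sketch of such a loop (hedged: the `tflint` invocation, pattern, and attempt count are all assumptions, with the pattern taken from the logs quoted above), something like this could run until the warning is observed:

```shell
#!/usr/bin/env bash
# Run a command repeatedly until its stderr matches a pattern, or give up
# after a bounded number of attempts. Sketch only; not tflint's own tooling.
run_until_warning() {
  cmd=$1 pattern=$2 max=$3
  i=1
  while [ "$i" -le "$max" ]; do
    # 2>&1 >/dev/null pipes stderr (and only stderr) into grep
    if $cmd 2>&1 >/dev/null | grep -q "$pattern"; then
      echo "reproduced on attempt $i"
      return 0
    fi
    i=$((i + 1))
  done
  echo "not reproduced in $max attempts"
  return 1
}

# hypothetical usage:
# run_until_warning "tflint modules/foo" "closed network connection" 100
```

Capturing which attempt triggered the warning also gives a rough sense of how intermittent the failure is.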

carldjohnston commented 2 years ago

> Is there a concrete problem here or are you concerned that you're seeing these logs?

The issue is just the log output; tflint exits with code zero.

> In other words, is tflint exiting with a non-zero code in these situations? These sorts of logs are common in a plugin-based system; you will see them with Terraform too. The logs could be better at highlighting which plugin is involved.

> What you've provided isn't really helpful as a reproduction, given the number of plugins, rules, etc. involved. If you can find a reproducible way to trigger this, e.g. running tflint in a loop until an error is observed, that would help move towards a resolution.

Thanks for the feedback, I'll try to create a way to reproduce this simply.

bendrucker commented 2 years ago

> The issue is just the log output; tflint exits with code zero.

Thanks. I can imagine a few things going on here:

As mentioned, you will tend to see these logs in Terraform when a diverse set of (provider) plugins is used. If you have a resource that's failing and are getting logs about a plugin exiting ungracefully, that could be meaningful. But if a plugin completes all of its work successfully (via RPC calls) and then exits ungracefully, that will be hidden unless you set TF_LOG_LEVEL.

Plugins are individual processes, and there are many inputs that could affect this issue. Large configs with many providers can create memory pressure during Terraform runs, leading to unreliable provider plugin behavior. That is especially true in container-based Terraform environments like Enterprise/Cloud, where the limits are relatively low.

TFLint running in CI seems unlikely to face any sort of resource limitation; e.g., GitHub Actions hosted runners have 7 GB of memory and 2 CPU cores. TFLint also has far fewer plugins. But it's a possibility. If you're setting a memory limit on the container, that's worth a look.
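As a quick sanity check on that last point, the limit can be read from inside the container via the cgroup filesystem (a sketch; the paths below are the usual cgroup v2 and v1 locations, and which one exists depends on the host):

```shell
#!/usr/bin/env bash
# Print the first readable cgroup memory-limit file, if any.
# Sketch only: accepts override paths so it can be tried on any host.
memory_limit() {
  for f in "$@" \
      /sys/fs/cgroup/memory.max \
      /sys/fs/cgroup/memory/memory.limit_in_bytes; do
    if [ -r "$f" ]; then
      cat "$f"
      return 0
    fi
  done
  echo "no cgroup memory limit file found"
  return 1
}
```

A very small value here (or a value near the container's observed peak usage) would make memory pressure a plausible cause of plugins dying mid-shutdown.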

bendrucker commented 2 years ago

If this is reproducible via Docker on all operating systems, it should be possible to reproduce via docker run. If you can make a public repo that we can clone and run docker run $(docker build -q .) in, that would be a big help.

sagar89jadhav commented 2 years ago

Today I came across the same error while executing the tflint command from a shell script. After some research, I managed to suppress those warning/error messages by redirecting the script's output to the null device /dev/null, but I still don't understand the cause of these warnings.

tflint script ->

#!/usr/bin/env bash
echo "Scanning all files(*.tf) with tflint"
find * -name '*.tf' | grep -E -v ".terraform|.terragrunt-cache" | while read -r line; do
    tflint "$line" -f compact
done

GitHub workflow step ->

- name: Lint Terraform Code
  shell: bash
  run: scripts/tflint.sh 2> /dev/null
  continue-on-error: false

Any idea why these warnings appear when tflint is executed via a script?
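Note that redirecting all of stderr to /dev/null would also hide genuine failures. A narrower sketch filters only the plugin-shutdown noise; the patterns below are taken from the logs quoted earlier in this thread and are otherwise an assumption:

```shell
#!/usr/bin/env bash
# Drop only the known plugin-shutdown warnings; pass everything else through.
filter_plugin_noise() {
  grep -E -v \
    -e 'plugin: error closing client during Kill' \
    -e 'plugin: plugin failed to exit gracefully' \
    -e 'yamux: Failed to write header' \
    || true  # grep exits non-zero when every line was filtered out
}

# hypothetical usage (bash process substitution, keeping output on stderr):
# scripts/tflint.sh 2> >(filter_plugin_noise >&2)
```

This way a real error printed to stderr still reaches the CI log instead of being discarded along with the warnings.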

wata727 commented 2 years ago

The TFLint plugin system has been completely rewritten in v0.35.0, so this issue may already be resolved, or it may now surface as a different issue.

For this reason, this issue will be closed. If you run into a similar problem, please open a new issue. Thank you.