gruntwork-io / terragrunt

Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale.
https://terragrunt.gruntwork.io/
MIT License

Terragrunt v0.67.3 stdout log crashes #3387

Closed jmandel1027 closed 1 month ago

jmandel1027 commented 1 month ago

Describe the bug

Terragrunt stdout logging crashes after plans. It's very strange behaviour that's been pretty hard to debug.

14:29:05.505 STDOUT [environments/prod/XXX/us-west-2/prod-uswe2/network] terraform: Plan: 0 to add, 37 to change, 0 to destroy.
  | 14:29:05.505 STDOUT [environments/prod/XXX/us-west-2/prod-uswe2/network] terraform:
  | 14:29:05.828 ERROR  [environments/prod/XXX/us-west-2/prod-uswe2/network] terraform invocation failed in /builds/build-apse2-i-0f35bb491b8b2d52c-1/XXX/deploy-build-prod/terraform/.terragrunt-cache/UX-pMdlesG2hNFHfyEfJX3RoxI0/_sP8kGZeLeeI5QXy9hAwJxLQGAM/stacks/network error=[/builds/build-apse2-i-0f35bb491b8b2d52c-1/XXX/deploy-build-prod/terraform/.terragrunt-cache/UX-pMdlesG2hNFHfyEfJX3RoxI0/_sP8kGZeLeeI5QXy9hAwJxLQGAM/stacks/network] exit status 1
  | 14:29:05.828 DEBUG  util.ProcessExecutionError [/builds/build-apse2-i-0f35bb491b8b2d52c-1/XXX/deploy-build-prod/terraform/.terragrunt-cache/UX-pMdlesG2hNFHfyEfJX3RoxI0/_sP8kGZeLeeI5QXy9hAwJxLQGAM/stacks/network] exit status 1
  | /home/circleci/project/shell/run_shell_cmd.go:244 (0x11c516f)
  | /home/circleci/project/telemetry/metrics.go:42 (0x113eb73)
  | /home/circleci/project/telemetry/telemetry.go:80 (0x11c3bcd)
  | /home/circleci/project/telemetry/traces.go:38 (0x11414bd)
  | /home/circleci/project/telemetry/telemetry.go:79 (0x11c3ac7)
  | /home/circleci/project/shell/run_shell_cmd.go:107 (0x11c3748)
  | /home/circleci/project/shell/run_shell_cmd.go:77 (0x11c3585)
  | /home/circleci/project/cli/commands/terraform/action.go:444 (0x16f5c6c)
  | /home/circleci/project/cli/commands/terraform/action.go:323 (0x16f51f3)
  | /home/circleci/project/cli/commands/terraform/action.go:406 (0x16f5852)
  | /home/circleci/project/cli/commands/terraform/action.go:322 (0x16f4dae)
  | /home/circleci/project/cli/commands/terraform/action.go:235 (0x16f43a5)
  | /home/circleci/project/cli/commands/terraform/action.go:81 (0x16f36bc)
  | /home/circleci/project/cli/commands/terraform/command.go:46 (0x1db518d)
  | /home/circleci/project/cli/app.go:241 (0x1db6a87)
  | /home/circleci/go/pkg/mod/golang.org/x/sync@v0.8.0/errgroup/errgroup.go:78 (0x11a08f6)
  | /usr/local/go/src/runtime/asm_amd64.s:1695 (0x471a81)
  | goexit: // This takes arguments DI and AX
  |  
  | 14:29:05.828 ERROR  1 error occurred:
  | * [/builds/build-apse2-i-0f35bb491b8b2d52c-1/XXX/deploy-build-prod/terraform/.terragrunt-cache/UX-pMdlesG2hNFHfyEfJX3RoxI0/_sP8kGZeLeeI5QXy9hAwJxLQGAM/stacks/network] exit status 1

Expected behavior

Here is a passing log from the same Terragrunt module deployed in a separate cluster:

14:57:14.409 STDOUT [environments/build/XXX/ap-southeast-2/build-apse2/network] terraform: Plan: 1 to add, 35 to change, 0 to destroy.
--
  | 14:57:14.409 STDOUT [environments/build/XXX/ap-southeast-2/build-apse2/network] terraform:


yhakbar commented 1 month ago

Thanks for reporting this, @jmandel1027 .

We'll look into it! We're currently working on a suite of updates to logging behavior, so there's a good chance this can get fixed.

Would it be possible to share a set of configurations that lets us reliably reproduce the crash you're experiencing? We won't be able to validate that the issue has been resolved ourselves unless we get something to test it on.

yhakbar commented 1 month ago

I understand that you probably don't want to share proprietary content or content relevant to your AWS account, etc. but if you could share a sanitized version, that would help.

jmandel1027 commented 1 month ago

thanks @yhakbar!

It's gonna be tough for me to do that 😅 The Terragrunt harness we use is tightly integrated into our CI/CD flows and not easily portable. We have dozens of shards of TF state supporting it, so it's gonna be a challenge to extract that.

It's very strange, though, because other shards of the same stack type work normally; only this one fails.

yhakbar commented 1 month ago

We'll be relying on you to do our testing then 😁 , @jmandel1027 .

We'll make sure to update this issue when the next release of Terragrunt is out and ask for feedback as to whether it has addressed the issue you're experiencing.

jmandel1027 commented 1 month ago

here's some of the terragrunt flags we're currently using:

source .buildkite/deploy/scripts/setup-env.sh

terragrunt_args=(
  --terragrunt-working-dir "terraform"
  --terragrunt-config "${TARGET}/terragrunt.hcl"
  --terragrunt-non-interactive
  -out "${TARGET}/terraform.tfplan"
)

current_branch=$(git rev-parse --abbrev-ref HEAD)
echo ":tree: current branch: $current_branch"

echo "+++ :terraform: Planning"
# use local terragrunt if available
bin/terragrunt plan "${terragrunt_args[@]}"

remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "skip"
  }
  config = {
    bucket         = local.tf_state_bucket
    region         = local.tf_state_bucket_region
    key            = local.state_key
    encrypt        = true
    dynamodb_table = "terraformstatelock"
    session_name   = "terraform-configuration"
  }
}

// Prevents caching of the remote state which can hide changes and failures.

terraform {
  extra_arguments "init_args" {
    commands  = ["init"]
    arguments = ["-reconfigure", "-upgrade"]
  }
}

jmandel1027 commented 1 month ago

So rolling back to 0.66.9 did actually reveal error logs that were obscured by 0.67.3. We managed to fix our actual issue, but the hidden logs definitely made it a challenge:


Plan: 0 to add, 37 to change, 0 to destroy.
--
  | ╷
  | │ Error: reading SSM Parameter (/XXX/network/r53/XXX/r53_zone_id): couldn't find resource
  | │
  | │   with module.vpc_endpoints["XXX"].data.aws_ssm_parameter.r53_zone_id["XXX"],
  | │   on ../../modules/vpce/data.tf line 1, in data "aws_ssm_parameter" "r53_zone_id":
  | │    1: data "aws_ssm_parameter" "r53_zone_id" {
  | │
  | ╵
  | ERRO[0030] terraform invocation failed in /builds/build-apse2-i-07c024c57a92aac86-4/XXX/deploy-build-prod/terraform/.terragrunt-cache/1NYqOo-VM6JmLXIn59pEOMHlKBY/nP8xaCHdfJfiyHrOwdctLS6HlCw/stacks/network  error=[/builds/build-apse2-i-07c024c57a92aac86-4/XXX/deploy-build-prod/terraform/.terragrunt-cache/1NYqOo-VM6JmLXIn59pEOMHlKBY/nP8xaCHdfJfiyHrOwdctLS6HlCw/stacks/network] exit status 1 prefix=[/builds/build-apse2-i-07c024c57a92aac86-4/XXX/deploy-build-prod/terraform/environments/prod/XXX/us-west-2/prod-uswe2/network]
  | DEBU[0030] util.ProcessExecutionError [/builds/build-apse2-i-07c024c57a92aac86-4/XXX/deploy-build-prod/terraform/.terragrunt-cache/1NYqOo-VM6JmLXIn59pEOMHlKBY/nP8xaCHdfJfiyHrOwdctLS6HlCw/stacks/network] exit status 1
  | /home/circleci/project/shell/run_shell_cmd.go:215 (0x119bc5f)
  | /home/circleci/project/telemetry/metrics.go:42 (0x1115853)
  | /home/circleci/project/telemetry/telemetry.go:77 (0x119a8ad)
  | /home/circleci/project/telemetry/traces.go:38 (0x111819d)
  | /home/circleci/project/telemetry/telemetry.go:76 (0x119a7a7)
  | /home/circleci/project/shell/run_shell_cmd.go:100 (0x119a428)
  | /home/circleci/project/shell/run_shell_cmd.go:74 (0x119a265)
  | /home/circleci/project/cli/commands/terraform/action.go:421 (0x16c5a69)
  | /home/circleci/project/cli/commands/terraform/action.go:307 (0x16c4ff3)
  | /home/circleci/project/cli/commands/terraform/action.go:385 (0x16c5652)
  | /home/circleci/project/cli/commands/terraform/action.go:306 (0x16c4bae)
  | /home/circleci/project/cli/commands/terraform/action.go:225 (0x16c41a5)
  | /home/circleci/project/cli/commands/terraform/action.go:81 (0x16c34dc)
  | /home/circleci/project/cli/commands/terraform/command.go:46 (0x1d862ed)
  | /home/circleci/project/cli/app.go:236 (0x1d87fa7)
  | /home/circleci/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 (0x11775d6)
  | /usr/local/go/src/runtime/asm_amd64.s:1695 (0x471701)
  | goexit: // This takes arguments DI and AX
  | ERRO[0030] 1 error occurred:
  | * [/builds/build-apse2-i-07

levkohimins commented 1 month ago

Resolved in the v0.67.5 release.
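
For anyone hitting the same symptom on a pinned version, here is a minimal sketch of a CI-side check, written in the same shell as the script above. It assumes the plan output was captured to a file, and it relies only on the two strings visible in the logs in this thread: the "terraform invocation failed" line and the "Error:" diagnostic that 0.66.9 printed. The function name `diagnostics_hidden` and the log file path are hypothetical, not part of Terragrunt.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Terragrunt feature): succeeds (exit 0) when a
# captured log records a failed terraform invocation but contains no
# "Error:" diagnostic, i.e. the underlying error output appears to be missing.
diagnostics_hidden() {
  local log_file="$1"

  # No failure recorded at all, so nothing can be hidden.
  grep -q 'terraform invocation failed' "$log_file" || return 1

  # A failure with a visible diagnostic is an ordinary, debuggable failure.
  # The match is case-sensitive, so "ERROR" log levels do not count.
  if grep -q 'Error: ' "$log_file"; then
    return 1
  fi

  return 0
}
```

One way to use it, assuming the plan step writes its combined output to `plan.log`: run `bin/terragrunt plan "${terragrunt_args[@]}" 2>&1 | tee plan.log`, then call `diagnostics_hidden plan.log` and print a hint to retry with a different Terragrunt version when it succeeds. The check is a heuristic based on the log text quoted in this issue, so it may need adjusting for other log formats.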