gruntwork-io / terragrunt

Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale.
https://terragrunt.gruntwork.io/
MIT License
8.01k stars 971 forks source link

[tflint] Revisit lock solution for hooks. #2486

Open marinalimeira opened 1 year ago

marinalimeira commented 1 year ago

This is a follow-up from https://github.com/gruntwork-io/terragrunt/pull/2469#issuecomment-1448518004

  1. Thought experiment: is locking on code generation and hooks in folder xxx the right solution? Or should we be locking on all TG execution in folder xxx? For example, when you do run-all apply, we start executing, say, 3 modules concurrently: A, B, C. Now imagine that A and B both depends on C via a dependency blocks. Before this PR, we could run code generation and hooks in C twice, concurrently, once from each of A and B's dependency blocks. With this PR, we now have locking on code generation and hooks... But perhaps what we really want is a global lock, so that only one thing can execute C at a time: if we're processing a dependency block on C from A, then processing the dependency block from B has to wait. That will include the code generation and hooks, but also code copying, init, and anything else we add in the future.

  2. Thought experiment: should code generation be idempotent? If code generation creates a file foo.tf, and its contents are the same as the foo.tf on disk, perhaps we don't overwrite it?

denis256 commented 1 year ago

We can try to implement code generation in all modules and after executing run-all xxx but not sure what will happen if in the common module should be generated file with content based on data from childs - may be generated content from other module

tjstansell commented 1 year ago

FWIW, I opened #2480 which seems to be affected by all of this new locking. Perhaps whatever locking is done could be controlled by an option (or one that tells terragrunt that it doesn't need to do any sort of locking). The locking seems to be creating a serialization that greatly affects our overall runtime and it seems all of this is for the one use-case where your hook is running tflint (or something similar) that's not concurrency-safe. To me, this seems like an outlier, not the norm for hooks. Anyway, just trying to bring this potential issue to light for the folks involved in this change. I'll be trying to debug my particular issue more and provide tests, but these locking changes seem like an obvious reason for our terragrunt runs to take 2-4x longer.