Closed tkflexys closed 1 year ago
further investigation, if I make the following change the command doesn't hang but I am obviously asked for the broken dependency input now
dependency_inputs.hcl
dependency "owner_account" {
config_path = "${get_terragrunt_dir()}/../owner_account/"
skip_outputs = true
}
inputs = {
}
DEBU[0012] Running command: terraform destroy prefix=[/home/src/terragrunt/environments/dev/pentest/components/project]
var.sa-email
last lines of strace where the program hangs
tgkill(1383189, 1386309, SIGURG) = 0
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=1383189, si_uid=1000} ---
rt_sigreturn({mask=[]}) = 0
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=1383189, si_uid=1000} ---
rt_sigreturn({mask=[]}) = 20873377
futex(0x20383f0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x20382f8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x20382f8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x2035aa8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0xc000100948, FUTEX_WAKE_PRIVATE, 1) = 1
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=1383189, si_uid=1000} ---
rt_sigreturn({mask=[]}) = 824643629056
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=1383189, si_uid=1000} ---
rt_sigreturn({mask=[]}) = 824643629056
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=1383189, si_uid=1000} ---
rt_sigreturn({mask=[]}) = 824643629056
futex(0xc000100548, FUTEX_WAKE_PRIVATE, 1) = 1
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=1383189, si_uid=1000} ---
rt_sigreturn({mask=[]}) = 5
futex(0x20383f0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x20382f8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x20383c8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x20382f8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x2035aa8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x2035aa8, FUTEX_WAIT_PRIVATE, 0, NULL
strace with the -f flag shows a bit more
[pid 2399902] ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
[pid 2399902] ioctl(1, TIOCGWINSZ, {ws_row=48, ws_col=123, ws_xpixel=0, ws_ypixel=0}) = 0
[pid 2399902] newfstatat(AT_FDCWD, "/home/src/terragrunt/environments/dev/pentest/components/project/../owner_account/", {st_mode=S_IFDIR|0775, st_size=4096, ...}, 0) = 0
[pid 2399902] newfstatat(AT_FDCWD, "/home/src/terragrunt/environments/dev/pentest/components/owner_account/terragrunt.hcl", {st_mode=S_IFREG|0664, st_size=1888, ...}, 0) = 0
[pid 2399902] newfstatat(AT_FDCWD, "/home/src/terragrunt/environments/dev/pentest/components/owner_account/terragrunt.hcl", {st_mode=S_IFREG|0664, st_size=1888, ...}, 0) = 0
[pid 2399902] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid 2399902] epoll_pwait(4, <unfinished ...>
[pid 2399886] <... nanosleep resumed>NULL) = 0
[pid 2399886] futex(0x2816fd8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=979887146} <unfinished ...>
[pid 2399902] <... epoll_pwait resumed>[], 128, 986, NULL, 20485767901306) = 0
[pid 2399886] <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 2399902] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid 2399886] nanosleep({tv_sec=0, tv_nsec=10000000}, <unfinished ...>
[pid 2399902] epoll_pwait(4, <unfinished ...>
[pid 2399886] <... nanosleep resumed>NULL) = 0
[pid 2399886] futex(0x2816fd8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=2, tv_nsec=463901066}) = -1 ETIMEDOUT (Connection timed out)
[pid 2399886] nanosleep({tv_sec=0, tv_nsec=10000000}, <unfinished ...>
[pid 2399902] <... epoll_pwait resumed>[], 128, 2474, NULL, 20488242055626) = 0
[pid 2399902] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid 2399902] epoll_pwait(4, <unfinished ...>
[pid 2399886] <... nanosleep resumed>NULL) = 0
[pid 2399886] futex(0x2816fd8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=9, tv_nsec=988660601}) = -1 ETIMEDOUT (Connection timed out)
[pid 2399886] nanosleep({tv_sec=0, tv_nsec=10000000}, <unfinished ...>
[pid 2399902] <... epoll_pwait resumed>[], 128, 9996, NULL, 20498242055626) = 0
[pid 2399902] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid 2399902] epoll_pwait(4, <unfinished ...>
[pid 2399886] <... nanosleep resumed>NULL) = 0
[pid 2399886] futex(0x2816fd8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=9, tv_nsec=989009073}) = -1 ETIMEDOUT (Connection timed out)
[pid 2399886] nanosleep({tv_sec=0, tv_nsec=10000000}, <unfinished ...>
[pid 2399902] <... epoll_pwait resumed>[], 128, 9990, NULL, 20508242055626) = 0
[pid 2399902] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid 2399902] epoll_pwait(4, <unfinished ...>
[pid 2399886] <... nanosleep resumed>NULL) = 0
[pid 2399886] futex(0x2816fd8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=9, tv_nsec=988959570}) = -1 ETIMEDOUT (Connection timed out)
[pid 2399886] nanosleep({tv_sec=0, tv_nsec=10000000}, <unfinished ...>
[pid 2399902] <... epoll_pwait resumed>[], 128, 9989, NULL, 20518242055626) = 0
[pid 2399902] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid 2399902] epoll_pwait(4, <unfinished ...>
[pid 2399886] <... nanosleep resumed>NULL) = 0
[pid 2399886] futex(0x2816fd8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=9, tv_nsec=989032403}) = -1 ETIMEDOUT (Connection timed out)
[pid 2399886] nanosleep({tv_sec=0, tv_nsec=10000000}, <unfinished ...>
[pid 2399902] <... epoll_pwait resumed>[], 128, 9989, NULL, 20528242055626) = 0
[pid 2399902] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid 2399902] epoll_pwait(4, <unfinished ...>
[pid 2399886] <... nanosleep resumed>NULL) = 0
[pid 2399886] futex(0x2816fd8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=9, tv_nsec=989701179}) = -1 ETIMEDOUT (Connection timed out)
[pid 2399886] nanosleep({tv_sec=0, tv_nsec=10000000}, <unfinished ...>
[pid 2399902] <... epoll_pwait resumed>[], 128, 9992, NULL, 20538242055626) = 0
[pid 2399902] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid 2399902] epoll_pwait(4, <unfinished ...>
[pid 2399886] <... nanosleep resumed>NULL) = 0
[pid 2399886] futex(0x2816fd8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=9, tv_nsec=988813689}^C <unfinished ...>
[pid 2399885] <... futex resumed>) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
strace: Process 2399885 detached
strace: Process 2399886 detached
strace: Process 2399887 detached
strace: Process 2399888 detached
strace: Process 2399889 detached
strace: Process 2400305 detached
strace: Process 2399902 detached
After running and investigating the code myself I've found the offending line https://github.com/gruntwork-io/terragrunt/blob/68120e20c750c6761d6f961cf8eae3399ce1d361/config/dependency.go#L435
Nothing is executed after the Lock()
call on line 435
Did some more debugging and if I disable CheckDependentModules
, the program no longer hangs
https://github.com/gruntwork-io/terragrunt/blob/68120e20c750c6761d6f961cf8eae3399ce1d361/cli/cli_app.go#L373
after further investigation I found that the program freezes on this line https://github.com/gruntwork-io/terragrunt/blob/68120e20c750c6761d6f961cf8eae3399ce1d361/config/locals.go#L172
This happens when it checks for dependent modules and seems to hang when checking sub-dependencies i.e
some_module > project > owner_account
It stops when going over some_module
I've been trying to debug this all day and I found some strange behavior
At the root of our project we have a .git
folder, if I rename that folder to something else like .notgit
, the problem almost goes away and only happens once every 10 times I run terragrunt destroy
The repo itself isn't very large:
→ du -sh .git
4.5M .git
→ find .git/ -type f | wc -l
805
→ mv .git .notgit
→ cd /home/src/terragrunt/environments/dev/pentest/components/project
Run 1 Success
○ → terragrunt destroy
WARN[0004] No double-slash (//) found in source URL /module-noop.git. Relative paths in downloaded Terraform code may not work.
module.......: Refreshing state... [id=.....]
Run 2 Hang
DEBU[0002] [Partial] Included config /home/src/terragrunt/terragrunt-platform/components/owner_account/module.hcl has strategy shallow merge: merging config in (shallow). prefix=[/home/src/terragrunt/environments/dev/pentest/components/owner_account]
DEBU[0002] Found locals block: evaluating the expressions. prefix=[/home/src/terragrunt/environments/dev/pentest/components/owner_account]
DEBU[0002] Evaluated 2 locals (remaining 0): generate_environment_script, gen_env_noop prefix=[/home/src/terragrunt/environments/dev/pentest/components/owner_account]
DEBU[0002] [Partial] Included config /home/src/terragrunt/terragrunt-platform/generators/hooks.hcl has strategy shallow merge: merging config in (shallow). prefix=[/home/src/terragrunt/environments/dev/pentest/components/owner_account]
Run 3 Success
Plan: 0 to add, 0 to change, 37 to destroy.
Changes to Outputs:
.....
Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.
Enter a value:
Run 4 Hang Run 5 Success Run 6 Success Run 7 Success Run 8 Success Run 9 Success Run 10 Success Run 11 Success Run 12 Success Run 13 Hang Run 14 Success Run 15 Success
I've created a project which with I can reproduce the problem 100% of the time https://github.com/tkflexys/nothing
To get this to work you have to modify 2 files to configure a gcs project and bucket
Replace redacted
with a valid project and bucket
https://github.com/tkflexys/nothing/blob/master/environments/dev/pentest/components/owner_account/terragrunt.hcl#L18
https://github.com/tkflexys/nothing/blob/master/environments/dev/pentest/components/project/terragrunt.hcl#L18
cd environments/dev/pentest/components/owner_account
terragrunt init
terragrunt apply
cd ../project
terragrunt init
terragrunt apply
terragrunt destroy --terragrunt-log-level=trace
This will hang so you can interrupt it after a few minutes
DEBU[0000] Did not find any locals block: skipping evaluation. prefix=[/home/src/nothing/environments/dev/pentest/components/owner_account]
DEBU[0000] [Partial] Included config /home/src/nothing/platform/modules/owner_account_module/module.hcl has strategy shallow merge: merging config in (shallow). prefix=[/home/src/nothing/environments/dev/pentest/components/owner_account]
from the root of the repo
mv .git .notgit
cd environments/dev/pentest/components/project
This should work now, it hung for me on the first run
terragrunt destroy --terragrunt-log-level=trace
Hello, thanks for the investigation, I will take a look into fix
Here's a workaround:
terragrunt render-json
cat terragrunt_rendered.json | jq '.inputs' > inputs.json
cp inputs.json .terragrunt-cache/...
cd .terragrunt-cache/...
terraform destroy -var-file=inputs.json
Hope this helps
Here's a workaround:
* dump the inputs of the dependencies with `terragrunt render-json` * extract the input values: `cat terragrunt_rendered.json | jq '.inputs' > inputs.json` * copy this file `cp inputs.json .terragrunt-cache/...` * go where the terraform files are generated: `cd .terragrunt-cache/...` * run destroy `terraform destroy -var-file=inputs.json`
Hope this helps
Thanks a lot that was very helpful.
For anyone else that comes across this, we also had to collapse our list of inputs and instead opted in to pass them via TF_VAR variables
jq '.inputs' terragrunt_rendered.json > "inputs.json"
jq -r 'to_entries | .[] | select(.value | type != "object" and type != "array") | "TF_VAR_\(.key)=\(.value | @sh)"' "inputs.json" > env-vars
jq -r 'to_entries | .[] | select(.value | type == "object" or type == "array") | "TF_VAR_\(.key)=\(.value | @json | @sh)"' "inputs.json" >> env-vars
set -a
source env-vars
set +a
Hi, in my tests it started to work after fix from https://github.com/gruntwork-io/terragrunt/releases/tag/v0.46.1
Tested this locally and it works! Thanks for the fix
terragrunt version: 0.42.8 (also tested with latest and something as old as 0.39.1) terraform version: 1.3.5
Structure
dependency_inputs.hcl
merged_inputs.hcl
terragrunt.hcl
I ran it with debug and these are the few last lines of the output
For context, owner_account does not have any dependencies defined and I can run tg destroy just fine inside that component folder whereas the one inside the project folder which depends on owner_account fails
The second I comment out the line which calls
read_terragrunt_config
the execution seems to continue which seems like is the cause, I can run other commands likeinit
,plan
, andapply
just fine with this setup