Closed magnetik closed 2 years ago
Hi,
running with --terragrunt-log-level debug may show more information on the steps performed.
Hi,
Thanks for your answer !
So after starting terragrunt, I have
My Terraform files are on an NFS mount shared with a virtual machine. The performance might not be good enough for Terragrunt.
Not sure if there is anything that terragrunt can do.
Maybe it will help to set terragrunt-download-dir to a local path, so that .terragrunt-cache is not created over NFS and the extra NFS requests are avoided:
https://terragrunt.gruntwork.io/docs/reference/cli-options/#terragrunt-download-dir
Hi, I ran into this myself today and think I might be able to help narrow it down!
In terms of my setup:
First off, I bisected recent releases, and it looks like the issue was introduced in v0.37.3. It seems that the additional hashing work introduced by #2006 can end up adding a lot of overhead on large module trees.
I built a custom binary from that tag and included pprof in it, then used its CPU profiling functionality to generate a flame graph (using 30s of data sampled from when my terragrunt plan invocation was wedged for around 5 minutes at 100% CPU):
I'm not familiar enough with this part of Terragrunt to say what the right answer is here, but I'm hoping that this data can help point you in the right direction for a fix.
I'm happy to collect additional data for you, and also try building off a branch if you do have a proposed fix.
As an aside, would you be open to the addition of a flag to Terragrunt that starts a web server with pprof enabled? I ended up compiling a custom version to do this, and I figure it could be a useful debugging flag to have built in. Happy to send a PR if you think it's a good idea!
> Maybe will help set terragrunt-download-dir on a local path to avoid additional requests over NFS creation of .terragrunt-cache
> https://terragrunt.gruntwork.io/docs/reference/cli-options/#terragrunt-download-dir
I'm already using the env variable $TERRAGRUNT_DOWNLOAD with the value /tmp/terragrunt-cache, but it does not change much.
I'm confirming that 0.37.2 is not affected and exactly the same code with the same configuration (with $TERRAGRUNT_DOWNLOAD on /tmp) takes 48 seconds
Hi, good findings. I think this confirms that the directory hash calculation is what consumes the CPU. I think it can be improved by changing the hash function.
So I spent a bit more of last night reading through the code involved, and I just want to confirm that I've understood the aim of #2006 right.
Previously, Terragrunt would always treat local modules as something that needed to be downloaded (well, copied, but go-getter handles either) on every invocation. Unlike remote modules, there was no caching of them.
This came from alreadyHaveLatestCode, which always returned false for local modules, causing them to be copied into the Terragrunt cache folder on every invocation even if they hadn't changed. #2006 introduced a caching mechanism based on calculating the hash of a module by recursively hashing the file contents of the module's directory tree.
Assuming I've understood that right, it seems unlikely that this approach will yield a performance benefit. By the time you've opened a file and calculated its hash, you might as well have copied it (unless the cache directory you're writing to is on extremely slow storage).
I was thinking a little more about this today, and another option might be to calculate a hash of the mtimes of all the files in the module. It would mean that the entire hash could be calculated by recursively running stat on the module's directory tree, rather than having to read each file (the next most expensive call in the flame graph) and put its entire contents through the (comparatively expensive) hash function.
We could also try swapping in something like MD5 instead of SHA256 (this use case doesn't need the extra collision resistance - we're not trying to fend off malicious actors in our own source trees), but I suspect it would still be quite slow overall - definitely much slower than hashing the <file_path>:<mtime> pairs.
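As a rough sketch of the <file_path>:<mtime> idea (all names here are hypothetical, and this is not necessarily how a fix would be implemented), the hash can be built from stat information alone:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io/fs"
	"path/filepath"
)

// fileStamp is a hypothetical record of one file's path and modification time.
type fileStamp struct {
	Path    string // path relative to the module root, slash-separated
	ModTime int64  // modification time in nanoseconds since the Unix epoch
}

// hashStamps hashes the "<file_path>:<mtime>" pairs in the order given.
func hashStamps(stamps []fileStamp) string {
	h := sha256.New()
	for _, s := range stamps {
		fmt.Fprintf(h, "%s:%d\n", s.Path, s.ModTime)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// hashModuleMetadata walks root in lexical order and fingerprints it using
// only file metadata; no file contents are read.
func hashModuleMetadata(root string) (string, error) {
	var stamps []fileStamp
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info() // stat-level information only
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		stamps = append(stamps, fileStamp{filepath.ToSlash(rel), info.ModTime().UnixNano()})
		return nil
	})
	if err != nil {
		return "", err
	}
	return hashStamps(stamps), nil
}
```

The trade-off is that a change which preserves a file's mtime would go unnoticed, while touching a file without changing it would invalidate the cache.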
If my understanding above is right, and you think that's a decent path to take, I'm happy to give it a go!
Hi, the findings above are correct. A way of detecting changes that uses fewer resources will definitely help.
Improvement released in https://github.com/gruntwork-io/terragrunt/releases/tag/v0.38.4
Thanks for all the review feedback @denis256. Awesome to see this released!
Hi,
Recently (after upgrading both terraform & terragrunt to the latest versions), terragrunt apply hangs for 5 minutes with 100% CPU.
When running htop, I've noticed these subprocesses:
- terraform init, which takes a few seconds
- terraform outputs, which takes a few seconds too
- terragrunt apply, hanging for several long minutes
Each subprocess takes 100% CPU, one after another, for 4 minutes.
I've tried adding --terragrunt-debug, but it only generated the terragrunt-debug.tfvars.json after the 5 minutes, and it's not super helpful.
What can I do?
Thanks!