There is an additional repro on #19060.
Gist with debug logs for `pants check`. A lot of paths are redacted, so let me know if this complicates debugging and I'll look into anonymizing them somehow instead.
I looked at the additional gist and the relevant `Process` invocations are in that log. I don't see any process invocations with a long list of arguments though, because at least for the compile steps those processes pass the source filenames via a file and just reference that file in the command-line arguments.
I wonder where the size of these huge processes is coming from.
@stuhood: Any suggestions on how to dig further here? Are process instances or results cached somewhere by the engine?
It may also be worth running a memory profiler to see where exactly these "process" allocations are occurring. https://github.com/KDE/heaptrack was useful to me at one point.
> @stuhood: Any suggestions on how to dig further here? Are process instances or results cached somewhere by the engine?

Yes: the `Process` struct is cached as a `Node` in the graph. If it is large, then our memory usage will be higher.
I expect that the next step would be taking advantage of the `DeepSizeOf` trait that is peppered around the codebase (including on `Process` and all of its transitively referenced types) to figure out which parts of the `Process`es from #19060 are large.
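As a rough illustration of that kind of measurement (a sketch only, not Pants engine code: it assumes the `deepsize` crate with its derive feature, and uses hypothetical field names rather than the real `Process` struct), per-field `deep_size_of()` calls show which part of a cached value dominates memory:

```rust
// Minimal sketch: a hypothetical FakeProcess stands in for parts of Process
// (argv, env, description). Assumes the `deepsize` crate's derive feature.
use deepsize::DeepSizeOf;

#[derive(DeepSizeOf)]
struct FakeProcess {
    argv: Vec<String>,
    env: Vec<String>, // "KEY=VALUE" entries
    description: String,
}

fn main() {
    let p = FakeProcess {
        argv: (0..10_000)
            .map(|i| format!("src/project/pkg_{i}/file_{i}.go"))
            .collect(),
        env: vec!["PATH=/usr/bin".to_string()],
        description: "compile go package".to_string(),
    };
    // Per-field deep sizes reveal which field dominates the struct's footprint.
    println!("argv:        {} bytes", p.argv.deep_size_of());
    println!("env:         {} bytes", p.env.deep_size_of());
    println!("description: {} bytes", p.description.deep_size_of());
    println!("total:       {} bytes", p.deep_size_of());
}
```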
@jyggen's repro from the description is slightly different: both `scandir` and `paths` are pretty clearly suffering from path duplication (and taking up 65% of memory). We've solved that problem for `DirectoryDigest` by switching to `DigestTrie`, but we're not currently using a trie for those structures.

The connection between #19060 and this issue might be that the `Args` list for some processes contains many paths (which are then additionally duplicates of other `scandir` and `paths` entries), but that would need confirmation using `DeepSizeOf`.
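For anyone following along, here is a purely illustrative sketch (not Pants' actual `DigestTrie`) of why a trie avoids that duplication: each directory component is stored once, with children hanging off it, instead of every full path repeating the same prefix string.

```rust
// Illustrative only: a toy path trie that deduplicates shared prefixes.
use std::collections::BTreeMap;

#[derive(Default)]
struct PathTrie {
    children: BTreeMap<String, PathTrie>,
    is_file: bool,
}

impl PathTrie {
    fn insert(&mut self, path: &str) {
        let mut node = self;
        for component in path.split('/').filter(|c| !c.is_empty()) {
            node = node.children.entry(component.to_string()).or_default();
        }
        node.is_file = true;
    }
}

fn main() {
    let mut trie = PathTrie::default();
    // A flat Vec<String> would store "src/go/internal/pkg/" 1000 times over;
    // the trie stores that chain of components exactly once.
    for i in 0..1_000 {
        trie.insert(&format!("src/go/internal/pkg/file_{i}.go"));
    }
    assert_eq!(trie.children.len(), 1); // a single top-level "src" node
}
```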
> We've solved that problem for `DirectoryDigest` by switching to `DigestTrie`, but we're not currently using a trie for those structures.

I have a draft of doing this for `Paths` over here: https://github.com/pantsbuild/pants/pull/19232 ... haven't benchmarked it yet. Will do `scandir` in a separate PR. (EDIT: https://github.com/pantsbuild/pants/pull/19246)
The `>=2.18.0.dev2` releases contain a fix for the `scandir` memory size; I'm still looking at how to solve the `paths` memory size issue.
As mentioned on #19232, that approach ended up being a dead end for `paths`, because it didn't differentiate between matched directories and directories which were present because of a parent directory.
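To make that distinction concrete, here is one way to read it (a hypothetical sketch, not the code from #19232): given the set of entries a glob matched, some directories are matched directly, while others only appear because they are ancestors of a matched path, and the two need to be told apart.

```rust
// Hypothetical sketch: separate directories that were matched directly from
// directories that are only present as ancestors of a matched path.
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

fn implied_ancestors(matched: &BTreeSet<PathBuf>) -> BTreeSet<PathBuf> {
    let mut ancestors = BTreeSet::new();
    for path in matched {
        let mut current: &Path = path;
        while let Some(parent) = current.parent() {
            if !parent.as_os_str().is_empty() && !matched.contains(parent) {
                ancestors.insert(parent.to_path_buf());
            }
            current = parent;
        }
    }
    ancestors
}

fn main() {
    let matched: BTreeSet<PathBuf> = ["src/go/pkg/main.go", "src/go/pkg"]
        .into_iter()
        .map(PathBuf::from)
        .collect();
    // "src/go/pkg" was matched directly; "src" and "src/go" are only ancestors.
    let ancestors = implied_ancestors(&matched);
    assert!(ancestors.contains(Path::new("src/go")));
    assert!(!ancestors.contains(Path::new("src/go/pkg")));
}
```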
Cheers! I'm off work until early August, so won't be able to give it a try on our monorepo until then sadly.
Something else worth mentioning:
We have had pantsd disabled in CI (self-hosted GitLab runner with 8 GB of RAM) pretty much forever, since it was always being killed for OOM reasons anyway (and Pants exits with an error when that happens while it's running). It hasn't been too painful, other than sequential Pants invocations surely taking longer than they would with pantsd running.
When upgrading from 2.16.x to 2.17.x, however, we've started to see invocations of `pants test` being killed for (presumably) memory reasons as well. This happens every CI run on my "bump to Pants 2.17.x" branch, and the only way to make it stay alive is to disable the Go backend (or revert to 2.16.x).
The exact, and not very helpful, output is:
`/bin/bash: line 188: 47 Killed pants test --shard="$((${CI_NODE_INDEX}-1))/${CI_NODE_TOTAL}" ::`
I do have a memory profile from heaptrack that I could send over once I'm back at work (it's probably more useful to you two than to me). I'll try to do some additional debugging then as well, but it looks like this issue will be blocking us from migrating past 2.16.x.
One connection that @jriddy raised around memory usage is that we are doing a lot more memoization than we probably need to: the last sticking point here is around the memory usage of `paths` nodes... and it didn't hit me until last night that we almost certainly don't need to be memoizing those, which avoids any need to optimize their size in memory.

In general, we should (be able to) make a lot more memoization decisions than we currently are (where all `@rule` invokes are memoized, and only "rule helpers" are not). That way, if something is likely to only be used once, we avoid the cost of memoization and only pay the cost of the dependency injection. I'll open a separate ticket about that.

As to resolving this issue: I'll get a PR out to remove the `paths` node.
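To illustrate the tradeoff being described (a sketch only, not the Pants rule engine): a memoized node keeps its result resident in the graph's cache for reuse, while an unmemoized helper recomputes it on demand and lets the result be dropped.

```rust
// Sketch of the memoization tradeoff: cached results cost resident memory,
// uncached results cost recomputation.
use std::collections::HashMap;

// Stand-in for an expensive filesystem walk producing a large path list.
fn expand_paths(glob: &str) -> Vec<String> {
    (0..1_000).map(|i| format!("{glob}/file_{i}.go")).collect()
}

struct MemoizedGraph {
    cache: HashMap<String, Vec<String>>,
}

impl MemoizedGraph {
    // Memoized: the expanded path list stays alive in the cache after this call.
    fn paths(&mut self, glob: &str) -> &[String] {
        self.cache
            .entry(glob.to_string())
            .or_insert_with(|| expand_paths(glob))
            .as_slice()
    }
}

// Unmemoized: callers pay the recompute cost, but nothing is retained afterwards.
fn paths_unmemoized(glob: &str) -> Vec<String> {
    expand_paths(glob)
}

fn main() {
    let mut graph = MemoizedGraph { cache: HashMap::new() };
    let memoized = graph.paths("src/go").len();
    let recomputed = paths_unmemoized("src/go").len();
    assert_eq!(memoized, recomputed);
}
```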
@jyggen I'm working on creating public reproduction repos for these kinds of issues, which will help us benchmark these things and write tests that keep memory and speed performance part of our release evaluations.

My current approach is to create a script that reproduces the dependency graph of a closed-source repo with none of the actual code content and all target and path names obfuscated. I'm doing this mainly for Python now, but I don't think there's any reason we couldn't do it for Go as well. Would you be interested in using this?
@jriddy Most definitely!
I'm still a bit worried about this one. When upgrading to Pants 2.17 we had to bump our CI instances to 16 GB of memory in order for Pants to not get killed by the OS due to OOM. Disabling all Go-related backends brings this back down to a manageable memory usage that could be handled by our previous 8 GB instances. We, as of writing, have 6828 lines of Go vs. 109335 lines of Python, so it's not a linear thing based on LoC.
I'm rather certain it's during analysis/compilation of 3rd-party dependencies that the memory usage skyrockets. I haven't been able to confirm it or find the reason why it would, but it's usually one of the last things to get printed in the debug logs before Pants/pantsd is killed.
I'll try to spend a day or two in the near future to look into this further, but worth flagging that the memory usage is still high - and from my perspective, without knowing the reason why, unexpectedly high.
+1. I'm finding that it's unrelated to the number of LoC but rather related to the size/quantity of the 3rd-party Go deps. We are at a point where 35 GB of memory is insufficient for our CI nodes.
We have a relatively small amount of Go code in our monorepository (49 files and 8041 lines according to `pants count-loc`, compared to 879 files and 101251 lines of Python), and yet when enabling the Go backend we need to bump `pantsd_max_memory_usage` from the default value up to at least `8GiB` in order to make pantsd not restart on every Pants execution (though it still restarts frequently enough for our Python developers to simply disable the Go backend locally due to the performance degradation it causes).

Here's the memory summary after running `pants --stats-memory-summary fix lint check test golang/::` on 2.16.0rc2: