Closed roberth closed 1 year ago
@roberth Here's a minimal repro. Evaluate builtins.match ".*" (toString (builtins.genList (x: x) 100000))
and HCI will segfault.
The regex library in the GNU C++ library eats a lot of stack for this regex, surprisingly. The stack runs out and that condition is not detected and reported properly. It also happens in Nix itself.
$ nix-instantiate --expr --eval 'builtins.match ".*" (toString (builtins.genList (x: x) 100000))'
error: stack overflow (possible infinite recursion)
but for a humongous stack, it works:
# 180 MB because number unit is 1KB
$ ulimit -Ss $[180 * 1024]; nix-instantiate --expr --eval 'builtins.match ".*" (toString (builtins.genList (x: x) 100000))'
[ ]
You may be able to work around this by setting a large stack size for system threads.
systemd.services.hercules-ci-agent = {
serviceConfig.LimitSTACK = 128 * 1024 * 1024;
};
This will allocate a lot of virtual memory for the stack, most of which is never mapped to real memory. Also the evaluation worker is a transient process, so the real memory is released after evaluating a job. Most threads in the system are Haskell's green threads, which are always tiny.
There's also a potential workaround that could be done in Nix code, which is to split large strings into smaller ones before applying the regex. E.g. splitting by lines first. I'd say that this "optimization" is completely unnecessary, if it wasn't for the fact that Nix always matches the whole input, as though it has $...^
. Clearly at least libstdc++ isn't clever enough to optimize away .*^
(considering your example with .*
and the fact that .
even matches newlines in Nix).
The regex stack behavior is unacceptable, and there's already a bug report in the upstream libstdc++.
An intermediate solution is to avoid libstdc++. The LLVM project's libc++ does not suffer from this problem, as evidenced by nix-instantiate
on darwin. Building hercules-ci-agent with LLVM for Linux runs into one or two cross-compilation issues. (pkgsLLVM
is implemented with the cross machinery.)
Another issue is that the stack overflow is reported as a crash instead of, well, a stack overflow. This requires either a small patch to Nix or a large change to the hercules-ci-agent process architecture.
You can expect a PR tomorrow :)
Description
worker exited with -11
(=segfault)
To Reproduce
agent rev 12d421bef8826efb2be10e0e39d7e8c60e497798
agent.follows.nixpkgs rev 16236dd7e33ba4579ccd3ca8349396b2f9c960fe
https://hercules-ci.com/github/purs-nix/purs-nix/jobs/80