seldridge opened this issue 3 years ago.
Interesting, this is a novel problem to me: it's a consequence of using a recursive descent parser together with threads. Individual threads don't get a deep stack, so we run into issues like this. We can make this incrementally better, but I'm not sure how to fix it properly.
Turning off multithreading probably "fixes" this.
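To make the depth problem concrete, here is a hypothetical Python toy. It is not CIRCT code, which is C++, and its made-up grammar only allows an identifier or cat(exp, exp). The invented functions parse_exp and parse_prim_exp stand in for parseExpImpl and parsePrimExp and mirror only their mutually recursive call structure. The point is that stack depth grows linearly with nesting, so a deep enough input exhausts whatever stack the parsing thread has.

```python
import re

# Hypothetical toy grammar (not FIRRTL's real grammar):
#   exp  := prim
#   prim := IDENT | "cat" "(" exp "," exp ")"
_TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[(),])")


def tokenize(text):
    """Split text into identifier and punctuation tokens."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0
        self.depth = 0      # parse_exp frames currently active
        self.max_depth = 0  # deepest nesting reached so far

    def expect(self, tok):
        if self.i >= len(self.tokens) or self.tokens[self.i] != tok:
            raise SyntaxError(f"expected {tok!r}")
        self.i += 1

    def parse(self):
        """Parse one whole expression and check that all input was consumed."""
        result = self.parse_exp()
        if self.i != len(self.tokens):
            raise SyntaxError("trailing input")
        return result

    def parse_exp(self):
        # Stand-in for parseExpImpl: one frame per nesting level.
        self.depth += 1
        try:
            self.max_depth = max(self.max_depth, self.depth)
            return self.parse_prim_exp()
        finally:
            self.depth -= 1

    def parse_prim_exp(self):
        # Stand-in for parsePrimExp: a second frame per nesting level.
        if self.i >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        tok = self.tokens[self.i]
        self.i += 1
        if tok == "cat":
            self.expect("(")
            lhs = self.parse_exp()  # recurse for each operand
            self.expect(",")
            rhs = self.parse_exp()
            self.expect(")")
            return ("cat", lhs, rhs)
        if not (tok[0].isalpha() or tok[0] == "_"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok


def nested_cat(n):
    """Build cat(a, cat(a, ... cat(a, a) ...)) with n nested cats."""
    return "cat(a, " * n + "a" + ")" * n


for n in (1, 10, 100):
    p = Parser(nested_cat(n))
    p.parse()
    print(n, "nested cats -> max parse_exp depth", p.max_depth)
```

In the toy, a 1024-deep nest needs roughly 2,000 Python frames, which trips CPython's default recursion limit of 1000 and raises RecursionError; a native C++ parser has no such guard and simply runs off the end of the thread's stack.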
At least on Linux, all threads, including the main one, have the same stack size. We can change the stack size on Linux (ulimit -s), and pthreads picks that up, which pushes this problem further out.
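As a small sketch of what ulimit -s reports, Python's standard resource module (Unix only) can read the same soft RLIMIT_STACK value from inside a process. The helper name stack_limit_kib is hypothetical; it returns KiB to match the shell's units.

```python
import resource


def stack_limit_kib():
    """Soft RLIMIT_STACK as `ulimit -s` shows it: KiB, or "unlimited"."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    return soft // 1024  # bash reports this limit in 1024-byte units


print("ulimit -s equivalent:", stack_limit_kib())
```

Raising the limit from inside an already running process is not enough for threads: the pthread_create man page ties the default thread stack size to the limit at the time the program started, so the limit has to be set in the shell before launching the tool.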
parsePrimExp and parseExpImpl are the mutually recursive functions dominating the stack. parsePrimExp uses on the order of 2 KB per frame. I trimmed it slightly in https://github.com/llvm/circt/commit/08c19bb7fcff2891093a0f4c40df377dde39aee2, but it still seems oversized for the variables it has. I haven't checked whether something badly behaved is being inlined. Debug builds have ~2.5x larger stack frames for this function.
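A back-of-the-envelope check, under stated assumptions: each nesting level costs at least one parsePrimExp frame, taken here as the ~2 KB figure above (5 KB in a debug build at 2.5x). parseExpImpl's own frame and everything else on the stack are ignored, so these numbers are optimistic. The 512 KiB case is the default stack size macOS gives secondary threads, which is relevant because EXC_BAD_ACCESS in the report is a macOS crash; 8 MiB is the typical main-thread size.

```python
KiB = 1024
MiB = 1024 * KiB


def stack_needed(depth, bytes_per_level):
    """Stack consumed by `depth` nesting levels at `bytes_per_level` each."""
    return depth * bytes_per_level


def max_nesting_depth(stack_bytes, bytes_per_level):
    """Deepest nesting that fits in `stack_bytes`, ignoring all other frames."""
    return stack_bytes // bytes_per_level


RELEASE_PER_LEVEL = 2 * KiB                     # ~2 KB parsePrimExp frame (assumed)
DEBUG_PER_LEVEL = int(RELEASE_PER_LEVEL * 2.5)  # debug frames ~2.5x larger

for name, per_level in (("release", RELEASE_PER_LEVEL), ("debug", DEBUG_PER_LEVEL)):
    need = stack_needed(1024, per_level) / MiB
    print(f"{name}: a 1024-deep cat needs >= {need:.1f} MiB; "
          f"{max_nesting_depth(512 * KiB, per_level)} levels fit in 512 KiB, "
          f"{max_nesting_depth(8 * MiB, per_level)} in 8 MiB")
```

With these inputs, a 1024-deep concatenation needs at least 2 MiB of stack (5 MiB in debug), far more than 512 KiB but within 8 MiB.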
Quoting: "At least on linux, all threads, including the main one have the same stack size."
Are you sure about that? Main is typically different: its stack grows down from the top of the address space, whereas pthread stacks have to be individually allocated and typically get a fixed-size allocation. This is one of the challenges of getting standard pthreads to scale to 100K threads: you end up with too many ~2 MB stacks.
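Because worker threads get a fixed-size stack, one workaround is to run the deep parse on a thread whose stack size is set explicitly (in C or C++ with pthreads, via pthread_attr_setstacksize). As a Python sketch of the same idea: threading.stack_size sets the stack size used for threads created afterwards, and on POSIX builds CPython applies it when creating the underlying pthread. run_with_stack is a hypothetical helper name, and the sizes used with it are arbitrary examples.

```python
import threading


def run_with_stack(fn, stack_bytes):
    """Call fn() on a new thread with a stack of stack_bytes; return its result."""
    outcome = {}

    def target():
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    # stack_size() is process-wide and returns the previous setting.
    previous = threading.stack_size(stack_bytes)
    try:
        worker = threading.Thread(target=target)
        worker.start()
    finally:
        threading.stack_size(previous)  # restore the default for later threads
    worker.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


print(run_with_stack(lambda: sum(range(10)), 64 * 1024 * 1024))
```

Because the setting is process-wide, the helper restores the previous value right after starting its thread; another thread spawning workers at that same moment could still briefly pick up the larger size.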
Quoting: "parsePrimExp and parseExpImpl are the mutually recursive functions dominating the stack. parsePrimExp is using O(2k) bytes per frame. I trimmed it slightly in 08c19bb, but it still seems out-sized for the variables it has."
Awesome, I agree: that does sound wacky.
Replying to Chris Lattner's comment of Mon, Sep 13, 2021 at 2:38 PM:
It depends on a few factors. pthreads defaults to 2 MB stacks if the stack ulimit is unlimited, but uses the ulimit if one is set. On every Red Hat- and Debian-derived system I've seen, the default stack ulimit is 8 MB.
From the pthread_create man page:
Under the NPTL threading implementation, if the RLIMIT_STACK soft resource limit at the time the program started has any value other than "unlimited", then it determines the default stack size of new threads.
The same RLIMIT_STACK also controls the size of the main thread's stack at exec time and, I think, prevents it from growing past that, so in a common configuration the main thread probably has less usable stack space than other threads, because the environment occupies part of it.
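The rule quoted from the man page can be written down as a tiny model. This is a hypothetical Python sketch (nptl_default_thread_stack is an invented name) of the documented behavior only: a finite soft RLIMIT_STACK at program start becomes the default thread stack size, and "unlimited" falls back to a per-architecture constant, which the man page gives as 2 MB on x86_64 and i386. It ignores any rounding or minimum the C library may apply.

```python
import resource

MiB = 1024 * 1024

# Fallbacks the pthread_create(3) man page lists for an "unlimited"
# RLIMIT_STACK (only two of the architectures it mentions are included).
ARCH_FALLBACK = {"x86_64": 2 * MiB, "i386": 2 * MiB}


def nptl_default_thread_stack(rlimit_soft, arch="x86_64"):
    """Model of the documented NPTL rule for the default new-thread stack size."""
    if rlimit_soft == resource.RLIM_INFINITY:
        return ARCH_FALLBACK[arch]
    return rlimit_soft


# Apply the model to this process's soft limit (Unix only; assumes it has
# not changed since the program started).
soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
print(f"default thread stack per the model: "
      f"{nptl_default_thread_stack(soft) / MiB:.1f} MiB")
```

Under this model, the common 8 MB ulimit gives new threads 8 MB stacks, while ulimit -s unlimited on x86_64 actually shrinks them to 2 MB.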
Circuits with long lines and/or deeply nested expressions seem to crash the lexer.
Here's a failing circuit (this is a 1024-deep concatenation):
This is erroring out with an EXC_BAD_ACCESS occurring in the lexer: