manuel-serrano / bigloo

a practical Scheme compiler
http://www-sop.inria.fr/indes/fp/Bigloo

Trace information when hitting stack size limit #28

Closed svenha closed 3 years ago

svenha commented 5 years ago

If the stack size limit is exceeded in a bigloo program, only a SEGV is raised. Is there a way to receive more information, e.g. a call stack or similar?

Example (using a recursive function that is not implemented in a tail-recursive way, here append-map, and a stack size of 50000 KB):

(define l (append-map list (iota 10000000)))

manuel-serrano commented 5 years ago

Hi Sven,

I have tried several times but unfortunately I have never found a reliable way to get something better. If someone knows how to do that and can provide me with explanations or examples, I will be glad to improve this error detection and reporting.

svenha commented 5 years ago

Is there a way to determine the current stack use?

manuel-serrano commented 5 years ago

Not in a portable way, as far as I know. The problem is that the program generally receives a SEGV, and precisely because there is no stack left, there is not much it can do to collect useful information.
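For readers wondering what a non-portable estimate might look like, here is a minimal C sketch. It assumes a POSIX system and a contiguous stack, and it only measures the distance between an address recorded early in execution and the address of a current local variable. The names stack_usage_init, stack_usage_from and stack_soft_limit are hypothetical helpers for illustration, not part of Bigloo.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

/* Address recorded near the start of execution (hypothetical helper state). */
static uintptr_t anchor_addr;

/* Call as early as possible, passing the address of a local variable. */
void stack_usage_init(const void *early_local)
{
    anchor_addr = (uintptr_t)early_local;
}

/* Approximate number of bytes between the anchor and a current local.
   The absolute difference is used so the direction of stack growth
   does not matter. */
size_t stack_usage_from(const void *current_local)
{
    uintptr_t now = (uintptr_t)current_local;
    return anchor_addr >= now ? (size_t)(anchor_addr - now)
                              : (size_t)(now - anchor_addr);
}

/* Soft stack limit in bytes, or 0 if it is unlimited or cannot be read. */
size_t stack_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (size_t)rl.rlim_cur;
}
```

A deeply recursive function could compare stack_usage_from(&some_local) against stack_soft_limit() minus a safety margin and bail out early. The estimate ignores whatever stack was already consumed before stack_usage_init was called, so it is only a lower bound.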

PFOllagnon commented 5 years ago

I do not know how portable it is, but one way to do it is to use sigaltstack / SA_ONSTACK so that the handler runs on its own custom stack. I had something working in C, but it did not work correctly in Bigloo. To detect the stack overflow, we need to record the stack address and the stack size limit at the very beginning of the execution and compare them with the faulting address. I think it was not working because I was checking and setting all this at the beginning of my Scheme program instead of at the very beginning of the execution, by which point Bigloo had already done some work and consumed some stack. You can find an example of this here: https://opensource.apple.com/source/gm4/gm4-15/src/stackovf.c.auto.html
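The sigaltstack / SA_ONSTACK idea described above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not Bigloo's actual handler and not the gm4 code: it assumes a POSIX system with a downward-growing stack, and the names install_stack_overflow_handler, near_stack_limit and the 64 KiB slack are hypothetical choices made for the example.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

#define SLACK ((size_t)64 * 1024)   /* assumed tolerance around the limit */

static char alt_stack[64 * 1024];   /* memory the handler runs on */
static uintptr_t stack_base;        /* address recorded at startup */
static size_t stack_limit;          /* soft RLIMIT_STACK, 0 if unknown */

/* Heuristic: does addr lie about `limit` bytes below `base`? */
int near_stack_limit(uintptr_t base, size_t limit, uintptr_t addr, size_t slack)
{
    if (limit == 0 || addr > base)
        return 0;
    size_t used = (size_t)(base - addr);
    size_t lo = limit > slack ? limit - slack : 0;
    return used >= lo && used <= limit + slack;
}

/* SIGSEGV handler: only async-signal-safe calls (write, _exit). */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    static const char ovf[] = "*** stack overflow (fault near stack limit)\n";
    static const char seg[] = "*** segmentation violation\n";
    ssize_t n;
    (void)sig;
    (void)ctx;
    if (near_stack_limit(stack_base, stack_limit,
                         (uintptr_t)info->si_addr, SLACK))
        n = write(STDERR_FILENO, ovf, sizeof ovf - 1);
    else
        n = write(STDERR_FILENO, seg, sizeof seg - 1);
    (void)n;
    _exit(EXIT_FAILURE);
}

/* Call first thing in main(), passing the address of a local variable.
   Returns 0 on success, -1 on failure. */
int install_stack_overflow_handler(void *approx_stack_base)
{
    struct rlimit rl;
    struct sigaction sa;
    stack_t ss;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    stack_limit = rl.rlim_cur == RLIM_INFINITY ? 0 : (size_t)rl.rlim_cur;
    stack_base = (uintptr_t)approx_stack_base;

    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;  /* run on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Because the base is recorded inside main rather than at the true top of the stack, the comparison is only approximate, which matches the difficulty described above when the setup happens after the runtime has already consumed some stack.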

manuel-serrano commented 4 years ago

Thanks for the advice and for the link. I have added something like that and it helps. Unfortunately, it is not totally reliable, as it is not always possible to recover nicely from a stack overflow, for instance when it occurs right in the middle of a GC allocation or reclaim. In that case, the stack overflow is correctly reported, but it is followed by another SIGSEGV that this time is not correctly handled. I guess this is already an improvement, but it is certainly not perfect yet.

svenha commented 4 years ago

Very good and helpful because every stack overflow is reported in some way. Looking forward to the next tar file.

svenha commented 4 years ago

I tested this feature. I always received normal stack traces when the stack size limit was reached. Thanks a lot!

svenha commented 3 years ago

Just a small warning. I have around 600 major page faults (named MAJFLT in tools like htop) per minute in some bigloo-compiled applications. It started around one year ago. By chance, while investigating unrelated issues, I disabled this stack overflow detection before configuring bigloo as follows:

sed -i.orig -e 's/getrlimit=.*/getrlimit=0/g' configure

Now, the number of major page faults goes down to 0-1 per minute.

manuel-serrano commented 3 years ago

Hi Sven,

This is strange because normally getrlimit is only used when the application receives a SIGSEGV (at least this is the intended behavior). Does your application trigger a lot of SIGSEGVs? Would it be difficult for you to debug your application and set a breakpoint in the function stackov_handler (runtime/Clib/csystem.c) to check when and why this function is called?

svenha commented 3 years ago

I don't see any signals. I use no signal handlers, so I should see SIGSEGV if they occur. I will investigate the problem.

manuel-serrano commented 3 years ago

Hum. Really weird...

You should try to recompile your whole application with -cg and then run the executable under gdb. You might also have to recompile runtime/Clib/csystem with the -g flag. For that you can proceed as follows:

cd runtime
touch Clib/csystem.c
make lib CEFLAGS=-g
make lib_u CEFLAGS=-g
sudo make install

On some platforms you can set a breakpoint in a function of a shared library only after that library is loaded. This can be done simply by starting the application, hitting ^C after the beginning of the execution, and then setting the breakpoint with:

(gdb) b stackov_handler
(gdb) c

This should do it.

svenha commented 3 years ago

Thanks for the recipe.

I must admit that the change of major page faults is unrelated to getrlimit (it was a red herring). There must be other (OS-related) causes. I can achieve zero major page faults also when configuring bigloo with getrlimit. Sorry and closing.

svenha commented 3 years ago

I found out what caused the major page faults: it was /usr/bin/time which I added to the call of the bigloo-compiled binary. This was for collecting some statistics, including major page faults. Looking back, that is somewhat funny.

manuel-serrano commented 3 years ago

On a vaguely related topic, these days I have been playing with Linux perf and FlameGraphs, and I have found them very useful for profiling Bigloo/Hop applications. It just takes a couple of scripts. I'm willing to share a tarball for those who are interested...

svenha commented 3 years ago

Hi Manuel. I would be interested ...