Closed svenha closed 3 years ago
Hi Sven,
I have tried several times but unfortunately I have never found a reliable way to get something better. If someone knows how to do that and can provide me with explanations or examples, I will be glad to improve this error detection and reporting.
Is there a way to determine the current stack use?
Not in a portable way, as far as I know. The problem is that the program generally receives a SEGV, and precisely because there is no stack left, there is not much it can do to collect useful information.
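For what it is worth, a non-portable estimate of the current stack use is possible. The sketch below is only an illustration and not anything taken from Bigloo: it assumes a single contiguous stack, and the names stack_mark_base, stack_used and stack_limit are hypothetical. It records the address of a local variable early on, then compares it with the address of a local in the current frame. The limit comes from the POSIX call getrlimit(RLIMIT_STACK).

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

/* Hypothetical helper state: address of a local recorded near program start. */
static uintptr_t base_addr;

/* Call as early as possible, ideally as the first statement of main(). */
void stack_mark_base(void) {
    volatile char marker;
    base_addr = (uintptr_t)&marker;
}

/* Approximate number of stack bytes in use since stack_mark_base().
 * The absolute difference is taken so the direction of growth does not matter. */
size_t stack_used(void) {
    volatile char marker;
    uintptr_t here = (uintptr_t)&marker;
    return (size_t)(base_addr > here ? base_addr - here : here - base_addr);
}

/* Soft stack limit in bytes, or 0 when it is unlimited or cannot be read. */
size_t stack_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (size_t)rl.rlim_cur;
}
```

Comparing stack_used() against stack_limit() before a deep recursion gives a rough idea of the remaining headroom. The figure is an underestimate by whatever was already on the stack before stack_mark_base() ran.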
I do not know how portable it is, but one way to do it is to use sigaltstack / SA_ONSTACK so that the handler runs on its own custom stack. I had something working in C, but not correctly in Bigloo. To detect the stack overflow, we need to record the stack address in memory plus the stack size at the very beginning of the execution and compare it to the faulting address. I think it was not working because I was checking and setting all this at the beginning of my Scheme program instead of at the very beginning of the execution, when Bigloo has already done some work and consumed some stack. You can find an example of this here: https://opensource.apple.com/source/gm4/gm4-15/src/stackovf.c.auto.html
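A minimal sketch of the sigaltstack / SA_ONSTACK idea, to make the description concrete. This is neither the code from the link above nor what Bigloo does. All function and variable names are hypothetical, and the 1 MiB slack is an arbitrary assumption. It also assumes a downward-growing stack on a POSIX system where a stack overflow is delivered as SIGSEGV.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/resource.h>

static uintptr_t g_stack_base;  /* address of a local recorded at install time */
static size_t    g_stack_limit; /* soft RLIMIT_STACK in bytes, 0 if unknown */

/* Heuristic: is the fault address roughly `limit` bytes below `base`? */
static int near_stack_end(uintptr_t base, size_t limit, uintptr_t fault) {
    const size_t slack = 1024 * 1024; /* assumed tolerance, not a measured value */
    if (limit == 0 || fault > base)
        return 0;
    uintptr_t dist = base - fault;
    return dist + slack >= limit && dist <= limit + slack;
}

/* Runs on the alternate stack; uses only async-signal-safe calls. */
static void on_segv(int sig, siginfo_t *info, void *ctx) {
    static const char overflow_msg[] = "*** stack overflow detected\n";
    static const char segv_msg[] = "*** segmentation fault\n";
    ssize_t n;
    (void)sig;
    (void)ctx;
    if (near_stack_end(g_stack_base, g_stack_limit, (uintptr_t)info->si_addr))
        n = write(STDERR_FILENO, overflow_msg, sizeof overflow_msg - 1);
    else
        n = write(STDERR_FILENO, segv_msg, sizeof segv_msg - 1);
    (void)n;
    _exit(EXIT_FAILURE);
}

/* Call at the very beginning of execution. Returns 0 on success, -1 on error. */
int install_overflow_handler(void) {
    static char alt_stack[64 * 1024]; /* memory the handler will run on */
    volatile char marker;
    struct rlimit rl;
    stack_t ss;
    struct sigaction sa;

    g_stack_base = (uintptr_t)&marker;
    if (getrlimit(RLIMIT_STACK, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
        g_stack_limit = (size_t)rl.rlim_cur;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK; /* deliver on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

The later install_overflow_handler() is called, the more stack has already been consumed and the less accurate the recorded base becomes, which matches the problem described above. The handler here only reports and exits. Recovering and continuing is a much harder problem.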
Thanks for the advice and for the link. I have added something like that and it helps. Unfortunately, it is not totally reliable, as it is not always possible to recover nicely from a stack overflow, for instance when it occurs right in the middle of a GC allocation or reclaim. In that case, the stack overflow is correctly reported, but it is followed by another SIGSEGV that this time is not correctly handled. I guess this is already an improvement, but it is certainly not perfect yet.
Very good and helpful because every stack overflow is reported in some way. Looking forward to the next tar file.
I tested this feature. I always received normal stack traces when the stack size limit was reached. Thanks a lot!
Just a small warning. I have around 600 major page faults (named MAJFLT in tools like htop) per minute in some bigloo-compiled applications. It started around one year ago. By chance, when investigating unrelated issues, I disabled this stack overflow detection before configuring bigloo as follows:
sed -i.orig -e 's/getrlimit=.*/getrlimit=0/g' configure
Now, the number of major page faults goes down to 0-1 per minute.
Hi Sven,
This is strange because normally getrlimit is only used when the application receives a SIGSEGV (at least this is the intended behavior). Does your application trigger a lot of SIGSEGVs? Would it be difficult for you to debug your application and set a breakpoint in the function stackov_handler (runtime/Clib/csystem.c) to check when and why this function is called?
I don't see any signals. I use no signal handlers, so I should see SIGSEGV if they occur. I will investigate the problem.
Hum. Really weird...
You should try to recompile your whole application with -cg and then run the executable under gdb. You might also have to recompile runtime/Clib/csystem with the -g flag. For that you can proceed as follows:
cd runtime
touch Clib/csystem.c
make lib CEFLAGS=-g
make lib_u CEFLAGS=-g
sudo make install
On some platforms you can set a breakpoint in a function of a shared library only after that library has been loaded. To do that, start the application, hit ^C shortly after execution begins, and then set the breakpoint with:
(gdb) b stackov_handler
(gdb) c
This should do it.
Thanks for the recipe.
I must admit that the change in major page faults is unrelated to getrlimit (it was a red herring). There must be other (OS-related) causes. I can also achieve zero major page faults when configuring bigloo with getrlimit. Sorry and closing.
I found out what caused the major page faults: it was /usr/bin/time, which I added to the call of the bigloo-compiled binary. This was for collecting some statistics, including major page faults. Looking back, that is somewhat funny.
On a vaguely related topic, these days I have been playing with Linux perf and FlameGraphs and I have found them very useful for profiling Bigloo/Hop applications. It just takes a couple of scripts. I'm willing to share a tarball for those who are interested...
Hi Manuel. I would be interested ...
If the stack size limit is exceeded in a bigloo program, only a SEGV is raised. Is there a way to receive more information, e.g. a call stack or similar?
Example (using a recursive function that is not implemented in a tail-recursive way, here append-map, and a stack size of 50000 KB):