Open jodavies opened 4 months ago
In the current state, a FORM crash looks like:
FORM 4.1 (May 31 2024, v4.1-20131025-641-g7062bd7) Run: Tue Jun 4 10:49:26 2024
Symbol x;
CFunction f;
Local test = 1/x;
Multiply replace_(x,0);
.end
Division by zero during normalization
Program terminating at crash.frm Line 4 -->
Terminate called from normal.c:4135 (Normalize). Backtrace:
#0: Terminate at startup.c:1799:20
#1: Normalize at normal.c:4139:2
#2: Generator at proces.c:3272:20
#3: Generator at proces.c:4216:8
#4: Generator at proces.c:4216:8
#5: Processor at proces.c:406:11
#6: DoExecute at execute.c:858:67
#7: PreProcessor at pre.c:1041:17
#8: main at startup.c:1692:2
#9: __libc_start_call_main at libc_start_call_main.h:58:16
#10: call_init at libc-start.c:128:20
#10: (inlined by) __libc_start_main_impl at libc-start.c:379:5
#11: _start at ??:0
0.00 sec out of 0.29 sec
or
TFORM 4.1 (May 31 2024, v4.1-20131025-641-g7062bd7) 4 workers Run: Tue Jun 4 10:51:03 2024
Symbol x;
CFunction f;
Local test = 1/x;
Multiply replace_(x,0);
.end
Division by zero during normalization
Program terminating in thread 1 at crash.frm Line 4 -->
Terminate called from normal.c:4135 (Normalize). Backtrace:
#0: Terminate at startup.c:1799:20
#1: Normalize at normal.c:4138:2
#2: Generator at proces.c:3272:20
#3: Generator at proces.c:4216:8
#4: Generator at proces.c:4216:8
#5: RunThread at threads.c:1393:10
#6: start_thread at pthread_create.c:442:8
#7: __clone3 at clone3.S:83
0.00 sec + 0.00 sec: 0.00 sec out of 0.23 sec
I cleaned this up a bit more, eg, it doesn't print any functions below main
(or start_thread
for tform
).
I don't see any performance regressions due to keeping debug symbols and frame pointers in the release build.
My opinion would be to always print the trace by default, but if anyone strongly opposes this I would add a setup parameter to control it.
It doesn't look nice when eu-addr2line
is not available...
TFORM 5.0.0-beta.1 (Jun 13 2024, v5.0.0-beta.1-69-g4c9b872) 4 workers Run: Thu Jun 13 19:26:48 2024
Symbol x;
CFunction f;
Local test = 1/x;
Multiply replace_(x,0);
.end
Division by zero during normalization
Program terminating in thread 1 at test.frm Line 4 -->
Terminate called from ../../sources/normal.c:4135 (Normalize)
Backtrace:
# 0: sh: 1: eu-addr2line: not found
# 1: sh: 1: eu-addr2line: not found
# 2: sh: 1: eu-addr2line: not found
# 3: sh: 1: eu-addr2line: not found
# 4: sh: 1: eu-addr2line: not found
# 5: sh: 1: eu-addr2line: not found
# 6: sh: 1: eu-addr2line: not found
# 7: sh: 1: eu-addr2line: not found
0.00 sec + 0.00 sec: 0.00 sec out of 0.08 sec
I wonder if we really need to replace Terminate
with TERMINATE
. In principle, Terminate
can be macronized so that it calls the original Terminate
(renamed to, say, TerminateImpl
) using the current source file name and the line number. (I'm not sure which is better.)
For TERMINATE I was vaguely following the "convention" that macros have uppercase names, but in the form sources it is true that plenty of macros do not have this. For me it would be fine to rename TERMINATE as you suggest.
As for eu-addr2line
, we could check for it (with which eu-addr2line
? Edit: command
can be used for this, and is POSIX?) and print a useful message that the user might want to install it.
I think command -v
is in POSIX.
Another question would be: do we want FORM to depend on glibc? backtrace
is neither POSIX nor Linux system call.
FORM already links libc
from somewhere, I did not need to provide the linker with -lc
. Whether this is actually a required link, I am not sure.
I'm not referring to libc
, but rather to glibc
, which is usually linked to programs on GNU/Linux. But, there are other implementations like musl libc on Alpine Linux.
I see, sorry. Then as I understand the "proper" way to do this is for the configure script to check for backtrace, and then we include this code only on platforms where it exists?
Yes, I think so. Or, optionally, we can consider a new configure script switch like --enable-backtrace
, which is automatically enabled if backtrace
is available.
About Terminate
vs. TERMINATE
, I would like to hear others' opinions.
These should implement the changes we discussed. The last can be reverted if TERMINATE is preferred after all.
I thought the last commit broke the static build but actually it went wrong before. But github gave it a green tick regardless...? Anyway it looks like https://sourceware.org/bugzilla/show_bug.cgi?id=15648
If I link with -lpthread
instead of -pthread
it works but I am not really sure what the difference is there.
The multiple __lll_lock_wait_private
issue will be resolved if we bump the Ubuntu version from 20.04 to 22.04, where the glibc version is 2.35.
This is true, but the build will still be broken for users who want to do their own static build (for whatever reason) on older systems. Edit: though of course in this (rare?) situation one can just --disable-backtrace
.
Here are a few ideas which may help with debugging in case of crashes, particularly for longer running code and crashes which other people can't reproduce. The first uses
eu-addr2line
to print a stack trace inTerminate
. The second replacesTerminate
everywhere with a macro which inserts file, line and function information into the error message, so one can see where exactlyTerminate
was called. The third retains debugging symbols in theform
build. This "probably" doesn't have a measurable performance impact (needs to be tested), but it means you get a useful trace fromform
crashes, and not justvorm
.Any thoughts?