yachang closed this issue 4 years ago
Error 4 is "System error" in the scamper source code.
I will reboot mlab2.lga0t and see whether I can reproduce the segfault.
Reply from Matthew Luckie:
I'll need a core dump in order to debug this further. The tbit code is not in the execution path for how you use scamper. You said some servers; on what fraction of servers is this occurring?
To get a core dump, you'll need to run
CFLAGS='-g' ./configure --disable-privsep
and then recompile scamper. Then use "ulimit -c unlimited" to ensure the OS will create a core dump.
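When scamper is started by a wrapper process rather than from an interactive shell, the "ulimit -c unlimited" step has to happen in the process that launches it. The following is a minimal Python sketch of that idea, not part of scamper or traceroute-caller: the helper names `enable_core_dumps` and `run_with_core_dumps` are hypothetical, and it only raises the soft core-size limit to whatever hard limit the environment allows (which may still be 0 inside some containers).

```python
import resource
import subprocess


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Returns the (soft, hard) limits in effect afterwards. A process may
    always raise its soft limit up to its hard limit without privileges.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def run_with_core_dumps(argv):
    """Run argv as a child process with core dumps enabled (hypothetical helper).

    preexec_fn is executed in the child just before exec, so the raised
    limit applies to the launched program and not to the caller.
    """
    return subprocess.run(argv, preexec_fn=enable_core_dumps)
```

Where the core file ends up still depends on the kernel's core_pattern setting on the host, which this sketch does not touch.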
=====================
code change:
Push to prod:
https://github.com/m-lab/traceroute-caller/releases/tag/v0.3.2 https://github.com/m-lab/k8s-support/pull/322
lga0t has stopped segfaulting with v0.3.1 since yesterday's sandbox deployment.
I tried to capture a core dump of the segfault:
Program terminated with signal SIGSEGV, Segmentation fault.
After Matthew Luckie got the core dump, he sent us a new tarball with the fix.
And it is in now: https://github.com/m-lab/traceroute-caller/pull/66
The new Docker image has been running on sandbox for almost a day. There was NO segfault, and the data quality (a low % of empty hop traces with file size < 1K) improved dramatically.
We can close this issue when this traceroute-caller version is deployed to prod in the new year.
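For anyone who wants to reproduce the "file size < 1K" check, here is a rough Python sketch. It assumes the traces are plain files under some local directory; the function name `small_file_fraction`, the glob pattern, and the 1024-byte threshold are illustrative choices, not what the M-Lab pipeline actually uses.

```python
from pathlib import Path


def small_file_fraction(directory, threshold_bytes=1024, pattern="*"):
    """Return the fraction of files under `directory` smaller than the threshold.

    Files are found recursively; returns 0.0 when no files match.
    """
    sizes = [
        p.stat().st_size
        for p in Path(directory).rglob(pattern)
        if p.is_file()
    ]
    if not sizes:
        return 0.0
    small = sum(1 for size in sizes if size < threshold_bytes)
    return small / len(sizes)
```

Comparing this fraction for a day of output before and after the new image gives a quick before/after number for the improvement.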
There are a good number of scamper segfaults on mlab2.lga0t. Is this expected? Log messages look like: scamper[5963]: segfault at 0 ip 0000558ffcd236f0 sp 00007ffed36cd0f8 error 4 in scamper[558ffcd14000+92000]
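As a reading aid for log lines like the one above: this is the Linux kernel's x86 segfault message, where the number after "error" is the hardware page-fault error code (printed in hex), a bitmask in which bit 0 means the page was present, bit 1 means a write, bit 2 means user mode, and bit 4 means an instruction fetch. Under that reading, "segfault at 0 ... error 4" is a user-mode read of an unmapped page at address 0. The Python sketch below decodes such a line; the function name `decode_segfault` and the output keys are my own, and the offset it computes is only a starting point for symbol lookup, assuming the reported mapping begins at the start of the binary.

```python
import re

# Kernel x86 user-space segfault line; the trailing "in obj[base+size]" is optional.
SEGFAULT_RE = re.compile(
    r"(?P<prog>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Decode a kernel segfault log line into a dict, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # the kernel prints the error code in hex
    ip = int(m.group("ip"), 16)
    info = {
        "program": m.group("prog"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "page_present": bool(err & 0x1),   # 0 means no page was mapped
        "access": "write" if err & 0x2 else "read",
        "user_mode": bool(err & 0x4),
        "instruction_fetch": bool(err & 0x10),
    }
    if m.group("base") is not None:
        # Offset of the faulting instruction inside the reported mapping.
        info["offset_in_object"] = ip - int(m.group("base"), 16)
    return info
```

Running `decode_segfault` over the kernel log entries from several servers makes it easy to see whether every crash has the same offset, which would suggest a single bug rather than several.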