Open divinity76 opened 3 years ago
Can you try to compile with debug symbols enabled? Doing so gives a better trace in the error message.
Judging from the function that's active in the trace, this is likely somewhere in UsersTable_getRef. Are you sure the compiled version matches the runtime glibc (cf. the linker warning)?
Can you try to compile with debug symbols enabled? Doing so gives a better trace in the error message.
If by "debug symbols enabled" you mean ./configure --enable-debug, that didn't seem to change anything in the segfault output:
FATAL PROGRAM ERROR DETECTED
============================
Please check at https://htop.dev/issues whether this issue has already been reported.
If no similar issue has been reported before, please create a new issue with the following information:
- Your htop version (htop --version)
- Your OS and kernel version (uname -a)
- Your distribution and release (lsb_release -a)
- Likely steps to reproduce (How did it happened?)
- Backtrace of the issue (see below)
Error information:
------------------
A signal 11 (Segmentation fault) was received.
Backtrace information:
----------------------
The following function calls were active when the issue was detected:
---
[0x406d3b]
[0x453030]
/lib/x86_64-linux-gnu/libc.so.6(getauxval+0x1b)[0x7ffff76bf14b]
/lib/x86_64-linux-gnu/libnss_systemd.so.2(+0x1285e)[0x7ffff740585e]
/lib/x86_64-linux-gnu/libnss_systemd.so.2(+0x12bca)[0x7ffff7405bca]
/lib/x86_64-linux-gnu/libnss_systemd.so.2(_nss_systemd_getpwuid_r+0x12e)[0x7ffff741d8ae]
[0x4b0c03]
[0x4b059b]
[0x415a4f]
[0x41b7aa]
[0x41a996]
[0x41cf5b]
[0x411cbb]
[0x401fef]
[0x443c20]
[0x40275e]
---
To make the above information more practical to work with,
you should provide a disassembly of your binary.
This can usually be done by running the following command:
objdump -d -S -w `which htop` > ~/htop.objdump
Please include the generated file in your report.
Running this program with debug symbols or inside a debugger may provide further insights.
Thank you for helping to improve htop!
htop 3.0.6-dev aborting.
Segmentation fault (core dumped)
Are you sure the compiled version matches the runtime glibc
Nope, I don't know, and I'm not sure how to check. But in a static build, should it matter what the runtime glibc is?
Can you try to compile with debug symbols enabled? Doing so gives a better trace in the error message.
if by debug symbols enabled you mean ./configure --enable-debug, seems that didn't change anything on the segfault output,
No. Was referring to providing -Og -g as CFLAGS/LDFLAGS to make, so that the resulting binary is debuggable. --enable-debug activates some internal housekeeping checks in htop to check the various memory operations work as intended.
Are you sure the compiled version matches the runtime glibc
nope, i don't know, not sure how to check, but in a static build, should it matter what the runtime glibc is?
Some glibc functions depend on structures that are created by other processes and communicated to the local process. This is most noticeable with things like PAM, but getpwuid, which needs to talk to libnss, is another example of this dependency. When running a dynamically linked program, the proper version of the functions that handle talking to those other processes is loaded automatically. For a statically linked program, however, the version of the function built into the binary may differ from the version used by the other processes in the system, which can cause all sorts of spurious issues.
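For anyone who wants to poke at this dependency in isolation, here is a minimal sketch of the kind of lookup involved. `lookup_user_name` is a hypothetical helper for illustration, not htop's code. getpwuid() goes through the NSS modules listed in /etc/nsswitch.conf, and glibc loads those modules at runtime even in a static build, which is what the linker warning is about.

```c
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper, not htop's actual code: resolve a uid to a user name.
 * getpwuid() consults the NSS modules configured in /etc/nsswitch.conf;
 * glibc loads them at runtime, even if this file is linked statically. */
const char* lookup_user_name(uid_t uid) {
   const struct passwd* pw = getpwuid(uid);

   /* NULL means no entry was found or the lookup failed. */
   return pw ? pw->pw_name : NULL;
}
```

Calling this from a small main and building it once normally and once with -static should be enough to compare the two behaviours on an affected system.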
btw have you (or anyone) tried to reproduce it?
seems compiling it with
./configure --enable-debug --enable-static --disable-unicode --disable-hwloc --disable-setuid --disable-sensors --disable-capabilities --disable-openvz --disable-vserver --disable-ancient-vserver --disable-delayacct --disable-linux-affinity;
make clean
CFLAGS="-Og -g" LDFLAGS="-Og -g" make -j $(nproc);
didn't make a difference; I still got the same output:
FATAL PROGRAM ERROR DETECTED
============================
Please check at https://htop.dev/issues whether this issue has already been reported.
If no similar issue has been reported before, please create a new issue with the following information:
- Your htop version (htop --version)
- Your OS and kernel version (uname -a)
- Your distribution and release (lsb_release -a)
- Likely steps to reproduce (How did it happened?)
- Backtrace of the issue (see below)
Error information:
------------------
A signal 11 (Segmentation fault) was received.
Backtrace information:
----------------------
The following function calls were active when the issue was detected:
---
[0x406d3b]
[0x453030]
/lib/x86_64-linux-gnu/libc.so.6(getauxval+0x1b)[0x7ffff76bf14b]
/lib/x86_64-linux-gnu/libnss_systemd.so.2(+0x1285e)[0x7ffff740585e]
/lib/x86_64-linux-gnu/libnss_systemd.so.2(+0x12bca)[0x7ffff7405bca]
/lib/x86_64-linux-gnu/libnss_systemd.so.2(_nss_systemd_getpwuid_r+0x12e)[0x7ffff741d8ae]
[0x4b0c03]
[0x4b059b]
[0x415a4f]
[0x41b7aa]
[0x41a996]
[0x41cf5b]
[0x411cbb]
[0x401fef]
[0x443c20]
[0x40275e]
---
To make the above information more practical to work with,
you should provide a disassembly of your binary.
This can usually be done by running the following command:
objdump -d -S -w `which htop` > ~/htop.objdump
Please include the generated file in your report.
Running this program with debug symbols or inside a debugger may provide further insights.
Thank you for helping to improve htop!
htop 3.0.6-dev aborting.
Segmentation fault (core dumped)
./configure --enable-static --disable-unicode --disable-hwloc --disable-setuid --disable-sensors --disable-capabilities --disable-openvz --disable-vserver --disable-ancient-vserver --disable-delayacct --disable-linux-affinity
make -j $(nproc)
./htop
works here on Ubuntu 20.04.1 LTS, also AMD64, i.e. I cannot reproduce it.
@fasterit thanks, can you upload the htop you produced? it'd be interesting to see if that same binary is crashing on my system or not
Can reproduce on Debian sid with libnss-systemd installed and a service with DynamicUser=yes running:
Starting program: /home/christian/Coding/workspaces/htop/htop > run.txt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7963350 in _nss_systemd_is_blocked () at ../src/nss-systemd/nss-systemd.c:639
639 ../src/nss-systemd/nss-systemd.c: No such file or directory.
(gdb) bt full
#0 0x00007ffff7963350 in _nss_systemd_is_blocked () at ../src/nss-systemd/nss-systemd.c:639
No locals.
#1 0x00007ffff7964507 in userdb_getpwuid (errnop=<synthetic pointer>, buflen=1024, buffer=0x5508b0 "", pwd=0x54c440 <resbuf>, uid=63378) at ../src/nss-systemd/userdb-glue.c:111
hr = 0x0
r = <optimized out>
hr = <optimized out>
r = <optimized out>
__PRETTY_FUNCTION__ = {<optimized out> <repeats 16 times>}
#2 _nss_systemd_getpwuid_r (uid=63378, pwd=0x54c440 <resbuf>, buffer=0x5508b0 "", buflen=1024, errnop=0x54e300) at ../src/nss-systemd/nss-systemd.c:169
status = <optimized out>
e = <optimized out>
_saved_errno_ = 0
_saved_sigset = {__val = {0, 4919111, 8320808640339734111, 8387223540451861876, 32193032215689072, 4919047, 140737347416736, 94489280512, 6729072, 5292379, 5555264, 15912848161058534912, 5563136, 140737488343600, 0, 63232}}
__PRETTY_FUNCTION__ = "_nss_systemd_getpwuid_r"
_found = <optimized out>
__assert_in_set = {<optimized out>, <optimized out>}
#3 0x00000000004aa713 in getpwuid_r ()
No symbol table info available.
#4 0x00000000004aa0c3 in getpwuid ()
No symbol table info available.
#5 0x0000000000415487 in UsersTable_getRef (this=0x550cc0, uid=63378) at UsersTable.c:35
userData = <optimized out>
name = <optimized out>
#6 0x000000000041ae6c in LinuxProcessList_recurseProcTree (this=0x550e50, parentFd=<optimized out>, dirname=<optimized out>, parent=0x0, period=279062.25, now=1612029284743) at linux/LinuxProcessList.c:1411
pid = <optimized out>
proc = <optimized out>
procFd = 4
command = "sleep\000lctl\000r\000events_unbound\000icient", '\000' <repeats 94 times>
lasttimes = 0
tty_nr = <optimized out>
lp = <optimized out>
percent_cpu = <optimized out>
name = <optimized out>
preExisting = false
pl = 0x550e50
entry = <optimized out>
settings = 0x555ea0
dirFd = 3
dir = <optimized out>
cpus = 4
hideKernelThreads = true
hideUserlandThreads = true
errorReadingProcess = <optimized out>
__PRETTY_FUNCTION__ = "LinuxProcessList_recurseProcTree"
#7 0x000000000041c563 in ProcessList_goThroughEntries (super=super@entry=0x550e50, pauseProcessUpdate=pauseProcessUpdate@entry=false) at linux/LinuxProcessList.c:1988
this = 0x550e50
settings = <optimized out>
period = 279062.25
tv = {tv_sec = 1612029284, tv_usec = 743522}
now = <optimized out>
rootFd = -100
#8 0x00000000004118af in ProcessList_scan (this=this@entry=0x550e50, pauseProcessUpdate=pauseProcessUpdate@entry=false) at ProcessList.c:572
now = {tv_sec = 5599984, tv_nsec = 5901632}
firstScanDone = true
#9 0x0000000000401edc in main (argc=<optimized out>, argv=<optimized out>) at htop.c:468
lc_ctype = <optimized out>
flags = {pidMatchList = 0x0, commFilter = 0x0, userId = <optimized out>, sortKey = 0, delay = -1, useColors = true, enableMouse = true, treeView = false, allowUnicode = <optimized out>, highlightChanges = <optimized out>, highlightDelaySecs = <optimized out>}
ut = 0x550cc0
pl = 0x550e50
settings = 0x555ea0
header = 0x5561c0
panel = 0x7fffffffda50
state = {settings = 0x555ea0, ut = 0x550cc0, pl = 0x550e50, panel = 0x5a0d40, header = 0x5561c0, pauseProcessUpdate = false, hideProcessSelection = false}
scr = 0x5572f0
(gdb) quit
Did it segfault in _nss_systemd_is_blocked()? The only reference I could find to that function is https://github.com/systemd/systemd/blob/main/src/nss-systemd/nss-systemd.c
with _nss_systemd_is_blocked() defined as
_public_ bool _nss_systemd_is_blocked(void) {
        return _blocked > 0;
}
and _blocked defined as
static thread_local unsigned _blocked = 0;
Hm, this may suggest that the pointer to the static thread_local variable somehow got messed up (the static binary believes it's elsewhere and tries to read unallocated memory?).
did it segfault in _nss_systemd_is_blocked() ? the only reference i could find to that function is https://github.com/systemd/systemd/blob/main/src/nss-systemd/nss-systemd.c
Hoping to provide some insight, since I recently had to debug this issue for our containers running a statically linked binary that used nss-systemd. As I understand it, statically linked binaries do all of their thread-local storage (TLS) allocations at program start, based on what was linked in. NSS modules are loaded with dlopen, so any TLS they use never gets allocated in the statically linked scenario (hence the segfault when trying to access it).
One way to fix this is to rebuild systemd. You can add -mtls-dialect=gnu2 to gcc (on x86-64) when building systemd so that it uses the TLS descriptor method instead of the traditional TLS method. This changes the way the allocation works. Alternatively, you can use the -ftls-model=initial-exec gcc flag, which uses the traditional TLS method with the initial-exec TLS model. This ends up using the (limited) surplus space set aside for things like dlopen-ed shared objects using static TLS. You can also patch systemd by adding __attribute__ ((tls_model("initial-exec"))) to _blocked to achieve the same effect as -ftls-model=initial-exec. I went with TLS descriptors since https://www.google.com/search?q=%22dlopen:+cannot+load+any+more+object+with+static+TLS%22 suggests running out of surplus space is a real problem.
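As a rough illustration of the attribute variant, the sketch below applies it to a thread-local counter shaped like the one quoted earlier. `demo_blocked` and the demo_* functions are hypothetical stand-ins, not systemd's code, and the tls_model attribute is a GCC/Clang extension. The attribute is mainly relevant when the file is compiled as position-independent code for a shared object.

```c
#include <stdbool.h>

/* Sketch only: hypothetical stand-in for a per-thread "blocked" counter.
 * The tls_model attribute (GCC/Clang extension) requests the initial-exec
 * TLS model for this one variable instead of the compiler's default. */
static _Thread_local unsigned demo_blocked
   __attribute__((tls_model("initial-exec"))) = 0;

/* Mark the calling thread as blocked (calls may nest). */
void demo_block(void) {
   demo_blocked++;
}

/* Undo one demo_block() call on the calling thread. */
void demo_unblock(void) {
   if (demo_blocked > 0)
      demo_blocked--;
}

/* Query the calling thread's state, analogous to the quoted accessor. */
bool demo_is_blocked(void) {
   return demo_blocked > 0;
}
```

Each thread sees its own copy of the counter, so blocking in one thread does not affect lookups in another.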
Another way to get around this is to remove "systemd" from /etc/nsswitch.conf on the system so that the statically linked binary doesn't try to use nss-systemd. This might not be an option if you actually need/want to use nss-systemd. It should also be possible to explicitly link against nss-systemd in the statically linked application so the thread-local from nss-systemd will have an allocation (but I haven't tried this).
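To check whether a given system is affected by the nsswitch.conf route, something like the following sketch could be used on each line of the file. `nss_line_lists_service` is a hypothetical helper written for this comment, and it only handles the simple "database: source source ..." line shape.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does an nsswitch.conf line for database `db`
 * (e.g. "passwd") list `service` (e.g. "systemd") as one of its sources? */
bool nss_line_lists_service(const char* line, const char* db, const char* service) {
   size_t dbLen = strlen(db);

   /* Skip leading whitespace. */
   while (*line == ' ' || *line == '\t')
      line++;

   /* The line must start with "<db>:". */
   if (strncmp(line, db, dbLen) != 0 || line[dbLen] != ':')
      return false;

   /* Work on a bounded local copy because strtok() modifies its input. */
   char buf[256];
   strncpy(buf, line + dbLen + 1, sizeof(buf) - 1);
   buf[sizeof(buf) - 1] = '\0';

   /* Ignore a trailing comment. */
   char* hash = strchr(buf, '#');
   if (hash)
      *hash = '\0';

   for (char* tok = strtok(buf, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
      if (strcmp(tok, service) == 0)
         return true;
   }
   return false;
}
```

If this returns true for the passwd or group line with "systemd" as the service, the static binary will end up loading nss-systemd for lookups.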
Of course you can also try to fix this upstream. The commit introducing the thread local to nss-systemd is in https://github.com/systemd/systemd/commit/037b0a47b0d7df09d720dda6703135117e7e0472 which was relatively recent (systemd 246, so 2 versions ago). There could be a way to achieve the goals mentioned in that change without thread locals.
edit: -mtls-dialect=gnu2 doesn't seem to work for x32 until gcc 10 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93319) and possibly also binutils 2.35 (https://sourceware.org/bugzilla/show_bug.cgi?id=25416). If that applies to anyone, then adding __attribute__ ((tls_model("initial-exec"))) might be more universal.
Also facing this trying to run htop static in a flatcar linux host
Also facing this trying to run htop static in a flatcar linux host
Which version of htop did you run? AFAIR there was a commit recently that changed some aspects of building htop statically and might have had an influence on this issue.
observed on Ubuntu 20.04 AMD64, revision:
steps to reproduce:
result: