hishamhm / htop

htop is an interactive text-mode process viewer for Unix systems. It aims to be a better 'top'.
GNU General Public License v2.0

htop abort on non-global zone restart #972

Closed Mno-hime closed 4 years ago

Mno-hime commented 4 years ago

I ran htop tag 220-sunos_11-p3 by @ninefathom in the global zone on OpenIndiana, restarted a non-global zone, and htop aborted with:

 newman  ~  htop
htop 2.2.0 aborting. Please report bug at http://hisham.hm/htop
Please include in your report the following backtrace: 
/usr/bin/htop'CRT_handleSIGSEGV+0x38 [0x420298]
/lib/amd64/libc.so.1'__sighndlr+0x6 [0xfffffd7fef2a3566]
/lib/amd64/libc.so.1'call_user_handler+0x1db [0xfffffd7fef29644b]
/lib/amd64/libc.so.1'strlen+0x14 [0xfffffd7fef209fd4]
/usr/bin/htop'xStrdup+0x9 [0x41e5c9]
/usr/bin/htop'SolarisProcessList_walkproc+0x3f4 [0x41f3d4]
/lib/amd64/libproc.so.1'proc_walk+0x2d6 [0xfffffd7fecbb4f46]
/usr/bin/htop'ProcessList_goThroughEntries+0x7d7 [0x420157]
/usr/bin/htop'ProcessList_scan+0x4d [0x41774d]
/usr/bin/htop'ScreenManager_run+0xf1 [0x418301]
/usr/bin/htop'main+0x402 [0x412762]
/usr/bin/htop'_start_crt+0x83 [0x40e3b3]
/usr/bin/htop'_start+0x18 [0x40e318]

Abort (core dumped)

Running mdb on htop core says:

 newman  ~  mdb core 
Loading modules: [ libc.so.1 libproc.so.1 ld.so.1 ]
> ::status
debugging core file of htop (64-bit) from lenovo
file: /usr/bin/htop
initial argv: htop
threading model: native threads
status: process terminated by SIGABRT (Abort), pid=8257 uid=101 code=-1
> $C
fffffd7fffdfe910 libc.so.1`_lwp_kill+0xa()
fffffd7fffdfe940 libc.so.1`raise+0x1e(6)
fffffd7fffdfe990 libc.so.1`abort+0x58()
fffffd7fffdfe9b0 0x420346()
fffffd7fffdfe9c0 libc.so.1`__sighndlr+6()
fffffd7fffdfea60 libc.so.1`call_user_handler+0x1db(b, 0, fffffd7fffdfead0)
fffffd7fffdfeab0 libc.so.1`sigacthandler+0xee(b, 0, fffffd7fffdfead0)
fffffd7fffdfeee0 libc.so.1`strlen+0x14()
fffffd7fffdfeef0 xStrdup+9()
fffffd7fffdfefa0 SolarisProcessList_walkproc+0x3f4()
fffffd7fffdff600 libproc.so.1`proc_walk+0x2d6(41efe0, 4a6a60, 1)
fffffd7fffdff6a0 ProcessList_goThroughEntries+0x7d7()
fffffd7fffdff6d0 ProcessList_scan+0x4d()
fffffd7fffdff770 ScreenManager_run+0xf1()
fffffd7fffdff820 main+0x402()
fffffd7fffdff850 _start_crt+0x83()
fffffd7fffdff860 _start+0x18()

objdump is attached: htop.objdump.txt.

ghost commented 4 years ago

Ack'ed. Should be easy enough to track down, @Mno-hime - xStrdup is only called in the Solaris port about five times or so, and I think there's only one call directly in walkproc(). I'll have a look soon.

On Tue, Dec 24, 2019 at 5:12 PM Michal Nowak <notifications@github.com> wrote:

I ran htop tag 220-sunos_11-p3 by @ninefathom in global zone on OpenIndiana and restarted non-global zone and htop aborted with: [...]

ghost commented 4 years ago

@mno-hime - please try the attached patch and let me know if that does the trick. If so, I'll commit it as -p4.

UPDATE: GitHub doesn't like .diff attachments. See my branch at https://github.com/ninefathom/htop/tree/220-sunos_11-fix_strdup instead.

Mno-hime commented 4 years ago

@ninefathom Still fails, here's the backtrace:

htop
htop 2.2.0 aborting. Please report bug at http://hisham.hm/htop
Please include in your report the following backtrace:
/usr/bin/htop'CRT_handleSIGSEGV+0x5b [0x4203db]
/lib/amd64/libc.so.1'__sighndlr+0x6 [0xfffffd7fef283566]
/lib/amd64/libc.so.1'call_user_handler+0x1db [0xfffffd7fef27644b]
/lib/amd64/libc.so.1'strlen+0x14 [0xfffffd7fef1e9fd4]
/usr/bin/htop'xStrdup+0x9 [0x41e6b9]
/usr/bin/htop'SolarisProcessList_walkproc+0x5ea [0x41f6ba]
/lib/amd64/libproc.so.1'proc_walk+0x2d6 [0xfffffd7feeed4f46]
/usr/bin/htop'ProcessList_goThroughEntries+0x7d7 [0x420277]
/usr/bin/htop'ProcessList_scan+0x4d [0x41783d]
/usr/bin/htop'ScreenManager_run+0xf1 [0x4183f1]
/usr/bin/htop'main+0x402 [0x412852]
/usr/bin/htop'_start_crt+0x83 [0x40e4a3]
/usr/bin/htop'_start+0x18 [0x40e408]

Abort (core dumped)

htop.objdump.txt

ghost commented 4 years ago

@Mno-hime can you confirm that you've got everything through my commit 4e791ac from 220-sunos_11-fix_strdup in there?

Mno-hime commented 4 years ago

@ninefathom I took the four topmost commits from https://github.com/ninefathom/htop/commits/220-sunos_11-fix_strdup and applied them on top of https://github.com/ninefathom/htop/releases/tag/220-sunos_11-p3. Did I miss any others?

ghost commented 4 years ago

That should do it. Color me officially puzzled. The fact that a backtrace is being printed at all suggests that either A) the issue is not occurring at the only place xStrdup is called from SolarisProcessList_walkproc() [which disagrees with the backtrace content], or B) process-global variables in C are not getting set properly [which is tough to believe].

I'm stumped for the moment. I'll take some time to think.

For your reference, in brief: htop is attempting to copy the process command name into the list after the process has already gone away. There's no way to lock the process info structure without being root or the process owner (which kind of defeats the point of htop), so instead I've tried to work around it by basically ignoring the SEGV and setting the process name to empty if that situation arises. But, alas, it is not working.
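A minimal sketch of that kind of workaround, for illustration only: while a guard flag is set, a SIGSEGV handler jumps back out of the faulting copy with `siglongjmp`, and the caller substitutes an empty string. The flag name `protected_str_read` is taken from this thread, but `protected_strdup`, `segv_guard` and `protected_jmp` are hypothetical names, and this is not the code on the 220-sunos_11-fix_strdup branch. It assumes a POSIX system (`sigaction`, `sigsetjmp`/`siglongjmp`).

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: every name here is hypothetical except protected_str_read. */
static volatile sig_atomic_t protected_str_read = 0; /* 1 while inside a guarded copy */
static sigjmp_buf protected_jmp;

static void segv_guard(int sig) {
    (void)sig;
    if (protected_str_read) {
        protected_str_read = 0;
        siglongjmp(protected_jmp, 1);   /* abandon the faulting read */
    }
    /* Fault outside a guarded copy: fall back to the default action (crash). */
    signal(SIGSEGV, SIG_DFL);
    raise(SIGSEGV);
}

/* Copy a string whose backing memory may vanish underneath us.
   Returns a heap copy, or an empty heap string if reading it faulted. */
char *protected_strdup(const char *src) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_guard;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);      /* temporarily replace any existing handler */

    char *volatile copy = NULL;         /* volatile: must survive siglongjmp */
    if (sigsetjmp(protected_jmp, 1) == 0) {   /* 1 = save/restore the signal mask */
        protected_str_read = 1;
        copy = strdup(src);             /* strlen() inside strdup() may fault here */
        protected_str_read = 0;
    }
    sigaction(SIGSEGV, &old, NULL);     /* restore the previous handler (e.g. htop's own) */
    return copy ? copy : strdup("");
}
```

Saving and restoring the previous handler keeps an application-wide crash handler (such as htop's `CRT_handleSIGSEGV`) in charge outside the guarded copy. Jumping out of a SIGSEGV handler is only safe when the interrupted code holds no locks and leaves no half-updated state, which is why the guard here wraps nothing but the string copy.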

ghost commented 4 years ago

@Mno-hime can you run the patched htop with mdb and, after the abort, let me know the value of the variable "protected_str_read"?

mdb: stop on SIGSEGV
mdb: target stopped at:
libc.so.1`__pollsys+0xa:jb    -0xabd20 <libc.so.1`__cerror>
> protected_str_read/D
htop'protected_str_read:
htop'protected_str_read:    ==integer value here==
>
ghost commented 4 years ago

@Mno-hime also, just in case, there's a new commit on 220-sunos_11-fix_strdup in the event that GCC is inlining SolarisProcessList_readZoneName into SolarisProcessList_walkproc and the xStrdup that's segfaulting is actually the one reading the zone name.
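If GCC folds a small helper such as SolarisProcessList_readZoneName into its caller, a fault inside the helper is reported under the caller's name, which is the ambiguity being chased here. A minimal sketch, assuming GCC or Clang: the `noinline` attribute keeps the helper as its own frame in backtraces, and checking the lookup result before duplicating it avoids copying from a zone record that no longer exists. `read_zone_name`, `walk_one`, the `zone_names` table and the `"-"` fallback are hypothetical stand-ins, not the Solaris port's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical zone table standing in for the real zone lookup. */
static const char *const zone_names[] = { "global", "web", "db" };
#define NZONES (sizeof zone_names / sizeof zone_names[0])

/* noinline keeps this function as a separate frame, so a fault here is
   reported as read_zone_name rather than as its caller. */
__attribute__((noinline))
static char *read_zone_name(long zoneid) {
    /* Validate before copying: an unknown id (e.g. a zone that was just
       restarted) gets a placeholder instead of a copy from a bad pointer. */
    if (zoneid < 0 || (size_t)zoneid >= NZONES)
        return strdup("-");
    return strdup(zone_names[zoneid]);
}

/* Hypothetical caller playing the role of SolarisProcessList_walkproc. */
char *walk_one(long zoneid) {
    return read_zone_name(zoneid);
}
```

`noinline` only changes code generation and how backtraces are attributed; on its own it does not fix the crash.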

Mno-hime commented 4 years ago

htop'protected_str_read: ==integer value here==

The four-patch htop shows zero.

The five-patch htop survives several zone restarts :).

ghost commented 4 years ago

Thanks. That means that GCC was, in fact, inlining SolarisProcessList_readZoneName into SolarisProcessList_walkproc. I've pushed and tagged the fixes to my solaris-stable branch as 220-sunos_11-p4.

Mno-hime commented 4 years ago

Thanks @ninefathom! I'm closing this since I don't expect it to be acted upon by the htop maintainer.