Closed DuncanFairley closed 4 years ago
Switched to new icecast server form 2.4.0 kh3 ( on new hardware , Debian 10 x64 ).
I've got segfault within 24 hour uptime with 2.4.0 kh12 with commit 96a500e applied. Configuration has quite a lot of relays defined ( about 60 from 108 sources, both local and external ).
Sep 1 17:41:29 stream kernel: [182496.651456] icecast2[13249]: segfault at 0 ip 0000000000000000 sp 00007efc7e9ec978 error 14 in icecast[ 559141ad7000+6000] Sep 1 17:41:29 stream kernel: [182496.651469] Code: Bad RIP value.
( Hardware is HP server with ECC memory, should not be an issue )
I did end up applying 96a500e but still get the segfaults every 1-3 days.
Aug 25 15:29:50 srv1 kernel: [67760965.090029] icecast[26389]: segfault at 31 ip 0000000000415481 sp 00007f2de8f73d90 error 4 in icecast[400000+55000]
I don't have any relays though. Just ~50 mountpoints.
I ended up having segfaults every 10 minutes... Installed old version I was running previously. Will see how it goes. Do You use fallbacks?
I don't use any fallbacks. I do some introaudio though. But if you're able to reproduce it in only 10 minutes, you could try getting a trace so Karl can see exactly where it's faulting. See here
Downgraded from 12 to 11 now we're getting segfaults on KH11 as well. Trying KH10 now....
I can't seem to get a debug build working. In a fresh icecast copy, I did:
./configure
make debug
make install
When it crashed, it outputted this:
*** Error in `/usr/local/bin/icecast': malloc(): smallbin double linked list corrupted: 0x00007fbfcc99c280 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fbfdf64e7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x82651)[0x7fbfdf659651]
/lib/x86_64-linux-gnu/libc.so.6(__libc_calloc+0xba)[0x7fbfdf65bdca]
/usr/local/bin/icecast[0x41dd37]
/usr/local/bin/icecast[0x4297ea]
/usr/local/bin/icecast[0x42b689]
/usr/local/bin/icecast[0x416f09]
/usr/local/bin/icecast[0x4180dc]
/usr/local/bin/icecast[0x41f135]
/usr/local/bin/icecast[0x43adeb]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fbfdf9a86ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fbfdf6de41d]
And gdb:
root@srv1:/tmp# gdb /usr/local/bin/icecast --core core-icecast.9892.srv1.1572983181
...
Reading symbols from /usr/local/bin/icecast...done.
[New LWP 9896]
[New LWP 8773]
[New LWP 9892]
[New LWP 9895]
[New LWP 9894]
Core was generated by `/usr/local/bin/icecast -c /home/icecast/currentconfig/icecast.xml'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fbfdf60c428 in ?? ()
[Current thread is 1 (LWP 9896)]
(gdb) bt
#0 0x00007fbfdf60c428 in ?? ()
#1 0x00007fbfdf60e02a in ?? ()
#2 0x0000000000000020 in ?? ()
#3 0x0000000000000000 in ?? ()
I finally got a proper trace, I hope:
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/bin/icecast -c /home/icecast/currentconfig/icecast.xml'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f9bac43e428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f9ba47a7700 (LWP 13549))]
(gdb) bt
#0 0x00007f9bac43e428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007f9bac44002a in __GI_abort () at abort.c:89
#2 0x0000000000419fa6 in source_queue_advance (client=0x7f9b90002f90) at source.c:883
#3 0x000000000041b487 in send_listener (source=0x7f9b981b0f90, client=0x7f9b90002f90) at source.c:1383
#4 0x000000000041ae5f in send_to_listener (client=0x7f9b90002f90) at source.c:1232
#5 0x0000000000425fc6 in worker (arg=0x157b580) at client.c:735
#6 0x000000000044ba1b in _start_routine (arg=0x157b370) at thread.c:727
#7 0x00007f9bac7da6ba in start_thread (arg=0x7f9ba47a7700) at pthread_create.c:333
#8 0x00007f9bac51041d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) bt full
#0 0x00007f9bac43e428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
resultvar = 0
pid = 13545
selftid = 13549
#1 0x00007f9bac44002a in __GI_abort () at abort.c:89
save_stage = 2
act = {__sigaction_handler = {sa_handler = 0xb19be18fcb503e00, sa_sigaction = 0xb19be18fcb503e00}, sa_mask = {__val = {
11952406549267936580, 7632343802151250176, 14024600428130476874, 16319518678535185408, 6825180314748858317, 1254,
140306249466032, 140306338231856, 140306344117392, 0, 140306314601760, 0, 22320800, 140306249040096, 1635678190688592640,
140306456145296}}, sa_flags = 4414744, sa_restorer = 0x0}
sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x0000000000419fa6 in source_queue_advance (client=0x7f9b90002f90) at source.c:883
ret = 1254
offset = 130 '\202'
written = 1254
source = 0x7f9b981b0f90
refbuf = 0x7f9b9c5295d0
lag = 1254
__func__ = "source_queue_advance"
loop = 47
#3 0x000000000041b487 in send_listener (source=0x7f9b981b0f90, client=0x7f9b90002f90) at source.c:1383
bytes = 0
loop = 40
total_written = 0
limiter = 8762
ret = 0
lag = 1254
worker = 0x157b580
now = 1573089315
__func__ = "send_listener"
#4 0x000000000041ae5f in send_to_listener (client=0x7f9b90002f90) at source.c:1232
source = 0x7f9b981b0f90
ret = 0
#5 0x0000000000425fc6 in worker (arg=0x157b580) at client.c:735
ret = 0
nx = 0x7f9b90017970
process = 1
client = 0x7f9b90002f90
sched_ms = 1573089315529
worker = 0x157b580
prev_count = 1120
prevp = 0x7f9b90004ea8
c = 7
__func__ = "worker"
Edit: I'm running latest icecast commit eb66e8f. Ubuntu 16.04.
<queue-size>524288</queue-size>
<client-timeout>30</client-timeout>
<header-timeout>15</header-timeout>
<source-timeout>6</source-timeout>
<burst-size>231072</burst-size>
(gdb) p source->mount
$1 = 0x7f9b981ba730 "/CFFM"
#This is a MP3 stream
(gdb) p client->connection
$6 = {id = 555, con_time = 1572983193, discon = {time = 0, offset = 0}, sent_bytes = 1273725733, sock = 179, chunk_pos = 0, error = 0 '\000',
readchk = 108 'l', ssl = 0x0, ip = 0x7f9b900182a0 "redacted IP"}
The person with that IP is currently listening too, it seems. Is that con_time in milliseconds?
It looks like their audio player is very aggressive in reconnecting when the stream is down:
[06/Nov/2019:17:15:25 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:25 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:25 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:25 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:25 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:25 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:25 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:26 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:26 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:26 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:26 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:26 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:26 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:26 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:26 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:26 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:26 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:26 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:26 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:27 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:27 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:27 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:27 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:27 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:27 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:27 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
[06/Nov/2019:17:15:27 -0800] "GET /CFFM HTTP/1.0" 404 135 "-" "mpg123/1.25.7" 0
I'm also facing this issue. Is there anything I can do to help debug this?
I got a new segfault:
(gdb) core-file core-icecast.9463.srv1.1573479075
[New LWP 9467]
[New LWP 4364]
[New LWP 9463]
[New LWP 9466]
[New LWP 9465]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/bin/icecast -c /home/icecast/currentconfig/icecast.xml'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007ffb2507c428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7ffb1d3e5700 (LWP 9467))]
(gdb) bt
#0 0x00007ffb2507c428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffb2507e02a in __GI_abort () at abort.c:89
#2 0x00007ffb250be7ea in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffb251d7ed8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007ffb250c4814 in malloc_printerr (action=<optimized out>, str=0x7ffb251d4c75 "corrupted size vs. prev_size", ptr=<optimized out>, ar_ptr=<optimized out>) at malloc.c:5006
#4 0x00007ffb250c9a03 in _int_malloc (av=av@entry=0x7ffb14000020, bytes=bytes@entry=1400) at malloc.c:3622
#5 0x00007ffb250cb184 in __GI___libc_malloc (bytes=1400) at malloc.c:2913
#6 0x000000000042423a in refbuf_new (size=1400) at refbuf.c:75
#7 0x0000000000436577 in validate_mpeg (source=0x7ffb10050810, refbuf=0x7ffb1406ea40) at format_mp3.c:946
#8 0x0000000000436687 in mp3_get_no_meta (source=0x7ffb10050810) at format_mp3.c:971
#9 0x0000000000418eaa in source_read (source=0x7ffb10050810) at source.c:575
#10 0x00000000004196d7 in source_client_read (client=0x7ffb08011640) at source.c:734
#11 0x0000000000425fc6 in worker (arg=0xaa9580) at client.c:735
#12 0x000000000044ba1b in _start_routine (arg=0xaa8c80) at thread.c:727
#13 0x00007ffb254186ba in start_thread (arg=0x7ffb1d3e5700) at pthread_create.c:333
#14 0x00007ffb2514e41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
I'm happy to send the corefile+binary if that'd help diagnose these @karlheyes
appologies for lack of responses, been travelling. The corruption cases that some have reported should be resolved now. A race between queue being pruned and a listener disconnecting was causing possible corruption leading to random effects. I've released kh13 with the fix.
karl
I'm on the release version of 2.4.0-kh12 and received two segfaults since updating a week ago:
Jun 25 05:59:42 srv1 kernel: [62456356.991999] icecast[16253]: segfault at 2dd ip 0000000000430220 sp 00007f7cafe22be8 error 4 in icecast[400000+55000] Jun 22 08:15:50 srv1 kernel: [62205324.731029] icecast[24053]: segfault at 0 ip (null) sp 00007f6ff5394d98 error 14 in icecast[400000+55000]
I haven't got a coredump yet, but am just wondering if these seem related to what was fixed in https://github.com/karlheyes/icecast-kh/commit/96a500ea466c09429c4db5d662a9fb6f7dae69e2 ?