karlheyes / icecast-kh

KH branch of icecast
GNU General Public License v2.0
298 stars 106 forks source link

icecast kh12 and latest get unresponsive #275

Closed niko closed 4 years ago

niko commented 4 years ago

We're experiencing a weird issue with the latest KH builds: Icecast gets unresponsive (HTTP requests just hang there until they timeout) after some time. These are the symptoms:

We're running Debian Buster Docker containers on a Ubuntu 18.04 host. I have tried quite some of the last commits and all show this issue. I have also tried Debian Jessie containers and Ubuntu 16.04. Same issue.

Any hint on the issue itself or even how to gather further information would be highly appreciated.

niko commented 4 years ago

Meanwhile I found out that both symptoms (icecast hanging and multiple spawned icecast processes) occur independently from each other: sometimes icecast hangs and there are no additional processes, sometime there are just additional processes (and the initial process is still responding; I can then kill -9 (simple kill doesn't work) the additional processes without further effect) and sometimes it's both: icecast not responding and additional processes.

Sometime there are so many additional icecast processes that the server runs out of RAM and just crashes.

I'm running icecast in the background at the moment with the -b-flag. Earlier I had icecast running still attached to its start script. In case of more processes being spawned the first process had detached from its start script.

niko commented 4 years ago

Now I could just witness the spawning thing in realtime. I have crude loop running which will kill excessive icecast instances:

$ while true; do date ; ps faux | egrep '/usr/local/bin/[i]cecast' | grep -v 11037 | awk '{print $2}' | while read pid; do echo $pid; kill -9 $pid; done; sleep 1; done

Mon 27 Jan 2020 08:23:20 PM CET
23179
Mon 27 Jan 2020 08:23:21 PM CET
Mon 27 Jan 2020 08:23:22 PM CET
23280
Mon 27 Jan 2020 08:23:23 PM CET
Mon 27 Jan 2020 08:23:24 PM CET
Mon 27 Jan 2020 08:23:25 PM CET
Mon 27 Jan 2020 08:23:26 PM CET
Mon 27 Jan 2020 08:23:27 PM CET
Mon 27 Jan 2020 08:23:28 PM CET
23641
Mon 27 Jan 2020 08:23:29 PM CET
23768
Mon 27 Jan 2020 08:23:30 PM CET
Mon 27 Jan 2020 08:23:31 PM CET

At the same time I'm looking at the debug error.log and see nothing suspicious. Also nothing special in the playlist.log, so it doesn't seem related to the metadata.

niko commented 4 years ago

I found out yesterday evening that at least the process spawning thing is probably triggered by a source client constantly connecting and disconnecting. Here an excerpt of a single stations debug error.log:

[2020-01-27  19:46:01] INFO admin/command_metadata Metadata song on /radio set to "Michael Klein-Luttmer - HIT COUNTDOWN"
[2020-01-27  19:46:01] INFO source/source_read End of Stream /radio
[2020-01-27  19:46:01] INFO source/source_shutdown Source "/radio" exiting
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" outgoing_kbitrate (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" incoming_bitrate (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" total_bytes_read (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" total_bytes_sent (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" total_mbytes_sent (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" queue_size (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" connected (1)
[2020-01-27  19:46:01] INFO source/source_set_fallback fallback on /radio to /elbe-radio.autodj, but no listeners
[2020-01-27  19:46:01] DBUG source/drop_source_from_tree stats still referenced on /radio
[2020-01-27  19:46:01] DBUG stats/_free_source_stats delete source node /radio
[2020-01-27  19:46:01] DBUG source/drop_source_from_tree removed source /radio (0x7f1033def940) from tree
[2020-01-27  19:46:01] INFO source/source_client_read no more listeners on /radio
[2020-01-27  19:46:01] DBUG source/source_clear_source clearing source "/radio" 0x7f1033def940
[2020-01-27  19:46:01] INFO source/_free_source freeing source "/radio"
[2020-01-27  19:46:01] INFO connection/_handle_source_request Source logging in at mountpoint "/radio"
[2020-01-27  19:46:01] DBUG source/source_format_init parser found for /radio
[2020-01-27  19:46:01] INFO format-mp3/format_mp3_get_plugin Created format details for /radio
[2020-01-27  19:46:01] DBUG stats/stats_handle new source stat /radio
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "listenurl" (http://5.stream:8080/elbe-radio)
[2020-01-27  19:46:01] INFO source/source_apply_mount Applying mount information for "/radio"
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "listener_peak" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "public" (1)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "server_name" (ELBE-Radio)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "server_description" (Die Musik Deiner Jugend)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "server_url" (http://ELBE-Radio.de)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "genre" (Oldies)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "bitrate" (128)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "server_type" (audio/mpeg)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "authenticator" (url)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "max_listeners" (-1)
[2020-01-27  19:46:01] INFO source/source_startup source /radio is ready to start
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "slow_listeners" (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" server_type (audio/mpeg)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" listener_peak (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" listener_peak (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "listener_connections" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "stream_start" (27/Jan/2020:19:46:01 +0100)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "total_mbytes_sent" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "total_bytes_sent" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "total_bytes_read" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "outgoing_kbitrate" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "incoming_bitrate" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "queue_size" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "connected" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "source_ip" (85.214.74.65)
[2020-01-27  19:46:01] INFO source/source_init Source /radio initialised
[2020-01-27  19:46:01] INFO source/source_read listener count on /radio now 0
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "listeners" (0)
[2020-01-27  19:46:01] INFO source/source_read End of Stream /radio
[2020-01-27  19:46:01] INFO source/source_shutdown Source "/radio" exiting
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" outgoing_kbitrate (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" incoming_bitrate (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" total_bytes_read (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" total_bytes_sent (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" total_mbytes_sent (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" queue_size (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" connected (0)
[2020-01-27  19:46:01] INFO source/source_set_fallback fallback on /radio to /elbe-radio.autodj, but no listeners
[2020-01-27  19:46:01] DBUG source/drop_source_from_tree stats still referenced on /radio
[2020-01-27  19:46:01] DBUG stats/_free_source_stats delete source node /radio
[2020-01-27  19:46:01] DBUG source/drop_source_from_tree removed source /radio (0x7f0fd8d82200) from tree
[2020-01-27  19:46:01] INFO source/source_client_read no more listeners on /radio
[2020-01-27  19:46:01] DBUG source/source_clear_source clearing source "/radio" 0x7f0fd8d82200
[2020-01-27  19:46:01] INFO source/_free_source freeing source "/radio"
[2020-01-27  19:46:01] WARN admin/admin_mount_request Admin command on non-existent source /radio
[2020-01-27  19:46:01] INFO connection/_handle_source_request Source logging in at mountpoint "/radio"
[2020-01-27  19:46:01] DBUG source/source_format_init parser found for /radio
[2020-01-27  19:46:01] INFO format-mp3/format_mp3_get_plugin Created format details for /radio
[2020-01-27  19:46:01] DBUG stats/stats_handle new source stat /radio
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "listenurl" (http://5.stream:8080/elbe-radio)
[2020-01-27  19:46:01] INFO source/source_apply_mount Applying mount information for "/radio"
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "listener_peak" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "public" (1)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "server_name" (ELBE-Radio)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "server_description" (Die Musik Deiner Jugend)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "server_url" (http://ELBE-Radio.de)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "genre" (Oldies)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "bitrate" (128)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "server_type" (audio/mpeg)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "authenticator" (url)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "max_listeners" (-1)
[2020-01-27  19:46:01] INFO source/source_startup source /radio is ready to start
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "slow_listeners" (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" server_type (audio/mpeg)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" listener_peak (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" listener_peak (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "listener_connections" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "stream_start" (27/Jan/2020:19:46:01 +0100)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "total_mbytes_sent" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "total_bytes_sent" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "total_bytes_read" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "outgoing_kbitrate" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "incoming_bitrate" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "queue_size" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "connected" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "source_ip" (85.214.74.65)
[2020-01-27  19:46:01] INFO source/source_init Source /radio initialised
[2020-01-27  19:46:01] INFO source/source_read listener count on /radio now 0
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "listeners" (0)
[2020-01-27  19:46:01] INFO admin/command_metadata Metadata song on /radio set to "Michael Klein-Luttmer - HIT COUNTDOWN"
[2020-01-27  19:46:01] INFO source/source_read End of Stream /radio
[2020-01-27  19:46:01] INFO source/source_shutdown Source "/radio" exiting
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" outgoing_kbitrate (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" incoming_bitrate (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" total_bytes_read (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" total_bytes_sent (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" total_mbytes_sent (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" queue_size (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" connected (0)
[2020-01-27  19:46:01] INFO source/source_set_fallback fallback on /radio to /elbe-radio.autodj, but no listeners
[2020-01-27  19:46:01] DBUG source/drop_source_from_tree stats still referenced on /radio
[2020-01-27  19:46:01] DBUG stats/_free_source_stats delete source node /radio
[2020-01-27  19:46:01] DBUG source/drop_source_from_tree removed source /radio (0x7f103004f310) from tree
[2020-01-27  19:46:01] INFO source/source_client_read no more listeners on /radio
[2020-01-27  19:46:01] DBUG source/source_clear_source clearing source "/radio" 0x7f103004f310
[2020-01-27  19:46:01] INFO source/_free_source freeing source "/radio"
[2020-01-27  19:46:01] INFO connection/_handle_source_request Source logging in at mountpoint "/radio"
[2020-01-27  19:46:01] DBUG source/source_format_init parser found for /radio
[2020-01-27  19:46:01] INFO format-mp3/format_mp3_get_plugin Created format details for /radio
[2020-01-27  19:46:01] DBUG stats/stats_handle new source stat /radio
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "listenurl" (http://5.stream:8080/elbe-radio)
[2020-01-27  19:46:01] INFO source/source_apply_mount Applying mount information for "/radio"
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "listener_peak" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "public" (1)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "server_name" (ELBE-Radio)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "server_description" (Die Musik Deiner Jugend)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "server_url" (http://ELBE-Radio.de)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "genre" (Oldies)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "bitrate" (128)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "server_type" (audio/mpeg)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "authenticator" (url)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "max_listeners" (-1)
[2020-01-27  19:46:01] INFO source/source_startup source /radio is ready to start
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "slow_listeners" (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" server_type (audio/mpeg)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" listener_peak (0)
[2020-01-27  19:46:01] DBUG stats/modify_node_event update "/radio" listener_peak (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "listener_connections" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "stream_start" (27/Jan/2020:19:46:01 +0100)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "total_mbytes_sent" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "total_bytes_sent" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "total_bytes_read" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "outgoing_kbitrate" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "incoming_bitrate" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "queue_size" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "connected" (0)
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "source_ip" (85.214.74.65)
[2020-01-27  19:46:01] INFO source/source_init Source /radio initialised
[2020-01-27  19:46:01] INFO source/source_read listener count on /radio now 0
[2020-01-27  19:46:01] DBUG stats/process_source_stat new node on /radio "listeners" (0)
[2020-01-27  19:46:02] INFO admin/command_metadata Metadata song on /radio set to "Michael Klein-Luttmer - HIT COUNTDOWN"
[2020-01-27  19:46:02] INFO source/source_read End of Stream /radio
[2020-01-27  19:46:02] INFO source/source_shutdown Source "/radio" exiting

When this source client disconnected the spawning of new Icecast instances mostly stopped.

My guess is that this happens if the source client does not disconnect properly (perhaps interacting with some other issue) and these multiple connects/disconnects per seconds just make it obvious.

karlheyes commented 4 years ago

Some threads are short lived so don't be suprised if those appear periodically. They tend to be limited in some way like directory updates or auth. I have fixed a few issues that can lead to hangs or memory corruption in kh13. Let me know if that helps.

karl.

niko commented 4 years ago

I'm deploying 2c2dcb2dee3780d506df5b3a299ec677031 right now and will report back. Thanks!

Niko.

niko commented 4 years ago

Bad news: It took about two hours until Icecast hanged again. No additional processes spawned though.

There is nothing obvious in the error.log pointing at what went wrong. Within 3 seconds all ~ 1200 mounts log Nothing received on /$MOUNT for 3 seconds.

Is there anything I can do to further diagnose that?

karlheyes commented 4 years ago

If it is a hang then make debug and on a hang, do a gcore to grab a core file for me to inspect. I'm not aware of a hang case. That message would tend to indicate that source clients are still being given time, just not seeing data. If there is some sort of lockup in that stage then all further attempts would stall.

niko commented 4 years ago

Hm. It won't crash when I'm waiting for it. :D

niko commented 4 years ago

Ok, it's hanging again and it forked another instance.

It still responds on one of the ports. Excerpt of the configuration:

    <listen-socket>
      <bind-address>$OUR_PUBLIC_IP</bind-address>
      <port>8080</port>
    </listen-socket>
    <listen-socket>
      <port>80</port>
      <bind-address>$OUR_PUBLIC_IP</bind-address>
    </listen-socket>
$ curl $OUR_PUBLIC_IP/admin.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <link rel="stylesheet" type="text/css" href="/style.css" />
…

$ curl $OUR_PUBLIC_IP/admin.html
^C
[just hangs]

In addition to the external IPs icecast is also binding to an intern IP. The listener also hangs there.

These are the running processes inside the Docker container:

icecast    191 27.5  1.7 2104508 1154528 ?     Sl   Jan31 1979:34  \_ icecast -c /usr/local/share/icecast/etc/icecast.xml
icecast  23973  0.0  1.7 2104600 1150820 ?     S    Feb04   0:00 icecast -c /usr/local/share/icecast/etc/icecast.xml

… and outside the Docker container:

2501     18011 27.5  1.7 2104508 1154528 ?     Sl   Jan31 1979:35  |   |   |   \_ icecast -c /usr/local/share/icecast/etc/icecast.xml
2501      8250  0.0  1.7 2104600 1150820 ?     S    Feb04   0:00  |   |   \_ icecast -c /usr/local/share/icecast/etc/icecast.xml

Of course the second process couldn't bind on the ports and logged that to the error.log. Other than that I can see nothing suspicious in the log.

I'll send you an email with where you can download the binary and the core files. I guess core.18011 is the interesting one. Note that we're running Icecast in a Docker container. I had to obtain the core file from outside the container. PIDs inside and outside the container differ.

karlheyes commented 4 years ago

duplication of processes should only occur if something like auth cmd or on-connect type scripts occur. There was a lock race in such cases which could stall workers and is now fixed for kh14. Also an auth url stream auth issue was resolved. If there is a further issue then let me know.

karl