karlheyes / icecast-kh

KH branch of icecast
GNU General Public License v2.0
298 stars 107 forks source link

file descriptor leak? #265

Closed DuncanFairley closed 4 years ago

DuncanFairley commented 5 years ago

Since upgrading from 2.3.3-kh11 to 2.4.0-kh10 (and the releases after that), I've had an issue where I'm running out of file descriptors intermittently.

It'll usually happen between two and five days. Rarely, it'll take a week or two. I haven't been able to narrow down how it occurs. Currently Icecast has been running for 12 hours.

root@srv1:~# lsof -np $(pidof icecast) | wc -l
642

It currently has 468 listeners.

root@srv1:~# cat /proc/`pidof icecast`/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        30720000             30720000             bytes
Max resident set          unlimited            unlimited            bytes
Max processes             31784                31784                processes
Max open files            30000                30000                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       31784                31784                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

There must be a huge jump in open files at some point, because it doesn't seem anywhere close to 300k. When it happens, Icecast stops serving audio and will essentially be frozen until I kill -9 it. On startup, it correctly says WARN slave/slave_startup process has 30000 max file descriptor limit.

When the freeze occurs, it says: WARN connection/accept_client accept() failed with error 24: Too many open files repeatedly.

Most of my streams are mp3 - some are AAC. There's about 50 streams in total. Maybe 10 of them have introaudio. No fallbacks.

Currently running https://github.com/karlheyes/icecast-kh/commit/bde700e58e22b1c0582bbdafd1151da17be788de

Edit: When it next happens, I'll get an output of lsof -np $(pidof icecast) here.

karlheyes commented 5 years ago

yes, that pattern would indicate an FD leak. The changes between those 2 versions are quite large to use as a reference so we'll have to identiy what the trigger is. The lsof -nPp ... should provide a big list and it will probably be a steady leak so check it periodically. You should find a similar type of entry getting repeated and it is proably a socket rather that a file but always good to check.

Check the access log for anything unusual, ie not a 200 OK type message or something short or some request that is unexpected. Also check the /admin/stats page, the global clients and listeners counts may also be increasing. If a source listeners count is too high then that may indicate something.

karl

DuncanFairley commented 4 years ago

I'm on the latest commit now, and I don't seem to get the freeze anymore, but I get a shutdown instead. Nothing is written in the error log when it happens, (level 2), nor anything in syslog. No /tmp/core either. It happens somewhat randomly:

26/10/2019 07:51:17 AM Icecast restarted
27/10/2019 01:43:53 PM Icecast restarted
31/10/2019 08:23:01 AM Icecast restarted
01/11/2019 12:19:52 AM Icecast restarted
01/11/2019 10:07:57 AM Icecast restarted
03/11/2019 01:46:24 PM Icecast restarted
03/11/2019 11:00:45 PM Icecast restarted
04/11/2019 09:44:37 AM Icecast restarted

Lately I've been seeing some weird traffic like this:

Redacted - - [04/Nov/2019:12:24:29 -0800] "GET /CFRI HTTP/1.0" 200 71811 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
Redacted - - [04/Nov/2019:12:24:30 -0800] "GET /CFRI HTTP/1.0" 200 73259 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
Redacted - - [04/Nov/2019:12:24:31 -0800] "GET /CFRI HTTP/1.0" 200 80052 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
Redacted - - [04/Nov/2019:12:24:32 -0800] "GET /CFRI HTTP/1.0" 200 73259 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
Redacted - - [04/Nov/2019:12:24:33 -0800] "GET /CFRI HTTP/1.0" 200 73259 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
Redacted - - [04/Nov/2019:12:24:34 -0800] "GET /CFRI HTTP/1.0" 200 73259 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
Redacted - - [04/Nov/2019:12:24:35 -0800] "GET /CFRI HTTP/1.0" 200 71811 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
Redacted - - [04/Nov/2019:12:24:36 -0800] "GET /CFRI HTTP/1.0" 200 73259 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1

All from the same IP. I see some similar useragents doing the same thing occasionally. Is it fine for clients to be issuing a request per second?

DuncanFairley commented 4 years ago

This looks a little concerning:

[2019-11-04  19:56:25] INFO source/listener_check_intro removing 208.101.90.34 from intro list
[2019-11-04  19:56:25] INFO source/listener_check_intro removing 199.247.209.40 from intro list
[2019-11-04  19:56:25] INFO source/listener_check_intro removing 192.222.224.32 from intro list
[2019-11-04  19:56:25] INFO source/listener_check_intro removing 204.40.194.133 from intro list
[2019-11-04  19:56:25] INFO source/listener_check_intro removing <80>▒T̿^? from intro list
[2019-11-04  19:56:25] INFO source/listener_check_intro removing 88.99.95.58 from intro list
[2019-11-04  19:56:25] INFO source/listener_check_intro removing  from intro list
[2019-11-04  19:56:25] INFO source/listener_check_intro removing 95.216.25.161 from intro list
[2019-11-04  19:56:25] INFO source/listener_check_intro removing  from intro list
[2019-11-04  19:56:25] INFO source/source_add_listener max on /CKAP is -1 (cur 6)
karlheyes commented 4 years ago

can you check kh13 with this when you can. there are corruption fixes which may show this latter case up and FD leaks can appear if certain internals get stuck.

karl.

DuncanFairley commented 4 years ago

Hey Karl, commit aad507e appears to have fixed both my hang and crash problems. Thanks!