Closed DuncanFairley closed 4 years ago
yes, that pattern would indicate an FD leak. The changes between those 2 versions are quite large to use as a reference so we'll have to identiy what the trigger is. The lsof -nPp ... should provide a big list and it will probably be a steady leak so check it periodically. You should find a similar type of entry getting repeated and it is proably a socket rather that a file but always good to check.
Check the access log for anything unusual, ie not a 200 OK type message or something short or some request that is unexpected. Also check the /admin/stats page, the global clients and listeners counts may also be increasing. If a source listeners count is too high then that may indicate something.
karl
I'm on the latest commit now, and I don't seem to get the freeze anymore, but I get a shutdown instead. Nothing is written in the error log when it happens, (level 2), nor anything in syslog. No /tmp/core either. It happens somewhat randomly:
26/10/2019 07:51:17 AM Icecast restarted
27/10/2019 01:43:53 PM Icecast restarted
31/10/2019 08:23:01 AM Icecast restarted
01/11/2019 12:19:52 AM Icecast restarted
01/11/2019 10:07:57 AM Icecast restarted
03/11/2019 01:46:24 PM Icecast restarted
03/11/2019 11:00:45 PM Icecast restarted
04/11/2019 09:44:37 AM Icecast restarted
Lately I've been seeing some weird traffic like this:
Redacted - - [04/Nov/2019:12:24:29 -0800] "GET /CFRI HTTP/1.0" 200 71811 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
Redacted - - [04/Nov/2019:12:24:30 -0800] "GET /CFRI HTTP/1.0" 200 73259 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
Redacted - - [04/Nov/2019:12:24:31 -0800] "GET /CFRI HTTP/1.0" 200 80052 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
Redacted - - [04/Nov/2019:12:24:32 -0800] "GET /CFRI HTTP/1.0" 200 73259 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
Redacted - - [04/Nov/2019:12:24:33 -0800] "GET /CFRI HTTP/1.0" 200 73259 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
Redacted - - [04/Nov/2019:12:24:34 -0800] "GET /CFRI HTTP/1.0" 200 73259 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
Redacted - - [04/Nov/2019:12:24:35 -0800] "GET /CFRI HTTP/1.0" 200 71811 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
Redacted - - [04/Nov/2019:12:24:36 -0800] "GET /CFRI HTTP/1.0" 200 73259 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.2; SM-T210L Build/JZO54K)" 1
All from the same IP. I see some similar useragents doing the same thing occasionally. Is it fine for clients to be issuing a request per second?
This looks a little concerning:
[2019-11-04 19:56:25] INFO source/listener_check_intro removing 208.101.90.34 from intro list
[2019-11-04 19:56:25] INFO source/listener_check_intro removing 199.247.209.40 from intro list
[2019-11-04 19:56:25] INFO source/listener_check_intro removing 192.222.224.32 from intro list
[2019-11-04 19:56:25] INFO source/listener_check_intro removing 204.40.194.133 from intro list
[2019-11-04 19:56:25] INFO source/listener_check_intro removing <80>▒T̿^? from intro list
[2019-11-04 19:56:25] INFO source/listener_check_intro removing 88.99.95.58 from intro list
[2019-11-04 19:56:25] INFO source/listener_check_intro removing from intro list
[2019-11-04 19:56:25] INFO source/listener_check_intro removing 95.216.25.161 from intro list
[2019-11-04 19:56:25] INFO source/listener_check_intro removing from intro list
[2019-11-04 19:56:25] INFO source/source_add_listener max on /CKAP is -1 (cur 6)
can you check kh13 with this when you can. there are corruption fixes which may show this latter case up and FD leaks can appear if certain internals get stuck.
karl.
Hey Karl, commit aad507e appears to have fixed both my hang and crash problems. Thanks!
Since upgrading from 2.3.3-kh11 to 2.4.0-kh10 (and the releases after that), I've had an issue where I'm running out of file descriptors intermittently.
It'll usually happen between two and five days. Rarely, it'll take a week or two. I haven't been able to narrow down how it occurs. Currently Icecast has been running for 12 hours.
It currently has 468 listeners.
There must be a huge jump in open files at some point, because it doesn't seem anywhere close to 300k. When it happens, Icecast stops serving audio and will essentially be frozen until I
kill -9
it. On startup, it correctly saysWARN slave/slave_startup process has 30000 max file descriptor limit
.When the freeze occurs, it says:
WARN connection/accept_client accept() failed with error 24: Too many open files
repeatedly.Most of my streams are mp3 - some are AAC. There's about 50 streams in total. Maybe 10 of them have introaudio. No fallbacks.
Currently running https://github.com/karlheyes/icecast-kh/commit/bde700e58e22b1c0582bbdafd1151da17be788de
Edit: When it next happens, I'll get an output of
lsof -np $(pidof icecast)
here.