karlheyes / icecast-kh

KH branch of icecast
GNU General Public License v2.0

100% cpu usage and freeze #254

Open hmorneau opened 5 years ago

hmorneau commented 5 years ago

I'm running the latest version, KH12, and I have a problem where the process seems to crash randomly. Icecast stops responding and the CPU usage for the process goes up to 100%. I don't see anything unusual in the log file.

karlheyes commented 5 years ago

does the latest master build have the same issue?

karl.

hmorneau commented 5 years ago

I haven't tried; I run the "stable" release since this is in production. So far it seems to happen every 5 to 7 days. I guess I will clone your repo and compile it to see if that helps.

hmorneau commented 5 years ago

OK, so the server crashed twice more (after I opened this issue) in as many days. So I pulled from git and compiled; since then, no crash. Seems like there is an issue in KH12. I see #255 reporting a crash as well, not sure if it's related. Mine had nothing in the log.

hmorneau commented 5 years ago

It crashed again. Same thing: CPU stuck at 100% on 1 core.

sinfrecu commented 5 years ago

I think the same thing is happening to me. Icecast never used to fail, and now it has gone down twice in 48 hours.

Icecast 2.4.0-kh12 on Ubuntu 18.04 LTS on a DigitalOcean droplet

kjwierenga commented 5 years ago

Same for me today (brought kh12 into production today). Icecast kh12 hanging with 100% CPU usage and file descriptor usage increasing by the minute. Memory consumption is normal, doesn't look like a memory leak. Logging to error.log stopped at the point where CPU went to 100%.

Running on Linux 4.4.0-151-generic #178-Ubuntu SMP Tue Jun 11 08:30:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I'm trying to get a core dump with gcore now.

kjwierenga commented 5 years ago

Failed to get a core dump. Eventually icecast (pid=1222) exhausted file descriptors and is still stuck with 100% CPU usage.

/proc/1222/fd# ls -1 | wc -l
10000
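
The descriptor count above can be sampled over time to see how quickly it climbs toward the limit. This is a minimal sketch, assuming a Linux /proc filesystem and enough privileges to read the target process; `count_fds`, `fd_limit` and `watch_fds` are hypothetical helper names, not part of icecast.

```shell
#!/bin/sh
# Count the open file descriptors of a process by listing /proc/<pid>/fd.
# Prints 0 if the process does not exist or cannot be read.
count_fds() {
    ls -1 "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print the soft "Max open files" limit of a process (4th field of that row).
fd_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Print a timestamped sample every INTERVAL seconds until the process exits.
watch_fds() {
    pid=$1
    interval=${2:-60}
    while [ -d "/proc/$pid" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(count_fds "$pid")"
        sleep "$interval"
    done
}

# Example (assumes a single icecast process):
#   fd_limit "$(pidof icecast)"
#   watch_fds "$(pidof icecast)" 60 >> fd-usage.log
```

A count that keeps rising while the listener total stays flat would point to descriptors not being released, rather than to genuine load.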

AxisNL commented 4 years ago

I think I have the same issue with Icecast 2.4.0-kh12-20191126014548. It hangs completely. I haven't seen this issue myself, but one of my colleagues did. Any news here? Is there any way we can help reproduce the issue? We have hundreds of clients in production, so I cannot just run debug stuff there, though.

karlheyes commented 4 years ago

Latest master fixes all known issues; if kh13 is OK for you, then it is fixed already. There is a post-kh13 fix if you use on-connect tags.

karl.

Swiftgti commented 4 years ago

Hi Karl, I have kh14 and it also has this same bug. The icecast process goes to 100% CPU and icecast stops responding. We confirmed that this happens only when a source is terminated or disconnected, for example because of a network problem on the source's internet connection, and only when a few hundred clients are connected to the mount point whose source was disconnected. It does not happen on every source disconnect, but eventually the process goes to 100% CPU. So the bug remains in kh14 and only happens when a source is disconnected.

Any idea? Do you recommend that I go back to kh11 or older?

karlheyes commented 4 years ago

I can try some trial runs of the setup you mentioned, but I could do with knowing if there are any mount-specific aspects like fallback or auth that may apply on source disconnection. A CPU maxing out would be a loop, but if you are not seeing anything in the logs then you could try:

strace -o output -ff -tt -p $(pidof icecast)

Press ctrl-C after a short time and see if any of those output.* files is very large. The other thing to do is grab a few core files (you might need more than one if it's hard to trap) with gcore $(pidof icecast), but make sure you have a 'make debug' build. It's possible that your setup can trip a bug that is difficult for others to hit, but I'll try a test run here.

karl.
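
On a production server where the process gets restarted quickly, the strace sampling suggested above can be wrapped so that it takes a fixed number of seconds and then reports which per-thread files grew the most. This is a rough sketch, not a tested procedure for icecast: `sample_strace` and `largest_traces` are hypothetical helper names, and it assumes `timeout` from coreutils is available and that strace detaches cleanly when it receives SIGTERM.

```shell
#!/bin/sh
# Attach strace to a PID for a fixed number of seconds, using the same
# flags as in the comment above. timeout(1) sends SIGTERM when the time
# is up; the traced process itself keeps running.
sample_strace() {
    pid=$1
    seconds=${2:-10}
    prefix=${3:-output}
    timeout "$seconds" strace -o "$prefix" -ff -tt -p "$pid"
}

# List the per-thread trace files as "<bytes> <file>", largest first.
largest_traces() {
    prefix=${1:-output}
    for f in "$prefix".*; do
        [ -f "$f" ] && printf '%s %s\n' "$(wc -c < "$f")" "$f"
    done | sort -rn
}

# Example (timeout exits non-zero on expiry, so do not chain with &&):
#   sample_strace "$(pidof icecast)" 10
#   largest_traces output | head
```

With `-ff`, each output file carries the ID of one thread as its suffix, so a single very large file suggests that one thread is spinning on system calls.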

Swiftgti commented 4 years ago

Hi Karl, in less than 24 hours icecast froze again at 100% CPU. I am attaching my icecast.xml configuration. XXX replaces sensitive information; apart from that, it is the exact configuration.

icecast.xml.txt

karlheyes commented 4 years ago

thanks, so just max-listeners on each mount. Did you check the error log and strace outputs?

karl.

Swiftgti commented 4 years ago

Hi Karl, the max-listeners values are safe limits, but yes, we are hitting 12,000 simultaneous listeners every day.

There is nothing in the error log; I will put it in debug mode. We don't have strace outputs because this is a production server and we restart it as fast as we can, and also it's not a debug build.

karlheyes commented 4 years ago

The strace does not require a debug build, as it records which system calls are being made, and it only needs a few seconds of sampling. If it is a tight loop with no system calls then it would not show anything, but if it's bouncing along a worker's clients then it will. A core file is best with a debug build, which might be more of a problem to get, but dropping the core and restarting is not a big issue if you get to that point. I've not caused this locally yet but am still checking.

karl.
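
To tell the two cases apart from a short sample, the trace files can be reduced to a count per system call. The sketch below assumes the line layout produced by `strace -tt` (a timestamp followed by `name(args) = result`); `syscall_histogram` is a hypothetical helper name.

```shell
#!/bin/sh
# Count how often each system call appears in one or more strace output
# files written with -tt. Lines whose second field is not "name(..."
# (signal and exit markers, for example) are skipped.
syscall_histogram() {
    awk '$2 ~ /^[a-z_0-9]+\(/ { sub(/\(.*/, "", $2); count[$2]++ }
         END { for (s in count) print count[s], s }' "$@" | sort -rn
}

# Example:
#   syscall_histogram output.* | head
```

A nearly empty histogram while the CPU sits at 100% would be consistent with a loop that never enters the kernel; one call repeated thousands of times within a few seconds would be consistent with a loop cycling over clients.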

karlheyes commented 4 years ago

Minor query really: what would you say the balance is between ssl/non-ssl listeners?

karl.