karlheyes / icecast-kh

KH branch of icecast
GNU General Public License v2.0

kh20: If there is no live stream, but there is a fallback mount, the server does not respond after reloading. #404

Open DroidU opened 1 year ago

DroidU commented 1 year ago

Reproduction:

  1. Configure a live mountpoint, but do not have a live stream on it. The fallback mount of the live stream should be a file uploaded to a web folder.
  2. Connect to the live mountpoint with a player. -> The fallback file is played.
  3. Reload server.
  4. The server will no longer respond.
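
The steps above can be scripted roughly as follows. This is only a sketch: the base URL, the mount name `/live`, the `/status.xsl` probe and the use of SIGHUP for the reload are assumptions about the setup, and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Host, port and mount are assumptions;
# adjust them to your own configuration.
BASE_URL="${BASE_URL:-http://127.0.0.1:8000}"
MOUNT="${MOUNT:-/live}"

# Start N background listeners on the live mount (they get the fallback file).
start_listeners() {
    count="$1"
    i=0
    while [ "$i" -lt "$count" ]; do
        curl -s -o /dev/null "$BASE_URL$MOUNT" &
        i=$((i + 1))
    done
}

# Succeeds only if the server answers an HTTP request within 5 seconds.
server_responds() {
    curl -s -o /dev/null --max-time 5 "$BASE_URL/status.xsl"
}

# Full run: connect listeners, reload via SIGHUP, then probe the server.
reproduce() {
    start_listeners "${1:-5}"
    sleep 2
    kill -HUP "$(pidof -s icecast)"
    sleep 2
    if server_responds; then
        echo "server still responding"
    else
        echo "server not responding after reload"
    fi
}
```

Calling `reproduce 5` starts five parallel listeners before the reload, which matches the multiple-connection case reported further down.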

karlheyes commented 1 year ago

a couple of other lock imbalances fixed. the master should be ok for that.

karl

DroidU commented 1 year ago

kh20.1: If there is 1 connection to the server, it is working fine. But for multiple parallel connections after reload it still does not respond.

karlheyes commented 1 year ago

did you take from the master tree or kh20.1

karl

DroidU commented 1 year ago

I downloaded it from: https://github.com/karlheyes/icecast-kh/tree/icecast-2.4.0-kh20.1 -> Code -> Download ZIP

karlheyes commented 1 year ago

ok, so no, try https://github.com/karlheyes/icecast-kh/archive/refs/heads/master.zip

DroidU commented 1 year ago

This is working fine.

gunsar commented 1 year ago

@karlheyes please rebuild the Windows release from the new master so I can give it a try. https://sourceforge.net/projects/icecastkh/ still has the old kh20 version (7 days ago), from before the latest master update. Thank you.

karlheyes commented 1 year ago

I've cut a pre-release kh20.2 with windows builds as well.

karl.

gunsar commented 1 year ago

thanks, I'll try it

gunsar commented 1 year ago

@karlheyes when downloading the icecast-2.4.0-kh20.2 windows64 version there is a notification of a dangerous file, why is that?

karlheyes commented 1 year ago

20.3 is up with a bunch of fixes. It has worker changes so can be a more critical change but looks to be solid and improves in certain areas.
no idea on any dangerous file message. The dll files are sent out via distribution and haven't changed since december and the binary is built by me.

karl

gunsar commented 1 year ago

My test results: 20.3 runs fine on Linux, Windows and AzuraCast (Ansible), although there is a problem with the Icecast Directory. The Linux and AzuraCast Ansible installs run normally after updating from master. On CentovaCast, however, 20.3 crashes and the station goes offline after 4-6 hours. I don't know what the problem is: version 20.1 is fine, but versions 20.2 and 20.3 crash and the station suddenly goes offline. kh17 is still the best version for CentovaCast.

DroidU commented 1 year ago

Unfortunately, it is the same for us. The kh-20 sometimes crashes and I can't reproduce this. We use MSCP Pro, where the Icecast2 capability is fully utilized. Also among them is OggFLAC support. Unfortunately, because of this, bugs are more likely to appear.

karlheyes commented 1 year ago

can I get a sitrep on things how they stand with 20.5. There are windows binaries uploaded as well.

karl

gunsar commented 1 year ago

I am installing 20.5 now and will monitor this trial version on Linux, Windows, AzuraCast and Centova. Thanks, Karl.

gunsar commented 1 year ago

After a 3-day trial, Linux and Windows (icecast-2.4.0-kh20.5) are running normally without problems. Testing on Centova and AzuraCast Ansible (icecast-kh master) also runs normally without problems. Thank you @karlheyes, I will continue to monitor it.

karlheyes commented 1 year ago

kh21 is up.

karl

gunsar commented 1 year ago

@karlheyes after a 2-week trial of kh21, it turned out that the relay problem is present again; the incident repeated just like with the kh20 version. After the source server for the relay goes off and comes back several times, the server stops relaying and icecast goes off. I experienced the same thing in trials on AzuraCast and Centova. After going back to kh17, everything is back to normal.

onur58 commented 1 year ago

@karlheyes Hi Karl, we are also facing similar behavior during fallback on the relay master server. It seems that the icecast service is only partially running (icecast logs are generated, but web services such as the admin interface and connections over HTTP are not possible; it gets stuck and the CPU load goes to 100%). An icecast service reload doesn't help, and we need to restart the service to bring it up again.

Many thanks for your support.

gunsar commented 1 year ago

@onur58 what version are you using? Have you tried version 21.2? I tested version 21.2 on one VPS and it still runs normally with regard to CPU and RAM.

onur58 commented 1 year ago

@gunsar We are using the latest official release, 2.4.0-kh21.0.

gunsar commented 1 year ago

@onur58 maybe you can just try updating to version 21.2 or 21.4.

onur58 commented 1 year ago

@gunsar is the crash behavior fixed in 21.2 or 21.4? Your last comment was about rolling back to kh17 :-)

karlheyes commented 1 year ago

I'll be cutting a 21.5 shortly with a reload fix from 21.4. the rest being related to windows test runs.

karl

gunsar commented 1 year ago

@gunsar is the crash behavior fixed in 21.2 or 21.4? Your last comment was about rolling back to kh17 :-)

@onur58
That's right. When I was experimenting, it was still the kh21.0 version, and I returned to kh17 because there were lots of problems. As soon as I saw that there was an improvement by @karlheyes in kh21.2, I tried it, and it is still running normally on Icecast for Linux and on Centova panels. For AzuraCast I tried kh21.3 and it still runs normally. Now that kh21.5 is out, I will try the Windows version.

onur58 commented 1 year ago

@karlheyes many thanks for the quick fix. I will deploy 21.5 tonight. @gunsar thank you also for your support :-)

onur58 commented 1 year ago

@karlheyes Hi Karl, I deployed 21.5 last week, and during the weekend we faced a network issue on the source, so we were not able to fall back on the relay master server (both sources were not reachable from the relay master). In this scenario the edge icecast servers try to connect to the relay master, and for a few minutes the edge icecast servers are not reachable (icecast web service socket timeout). When I checked the service, there had been no reload or restart of the icecast service, and the load and CPU usage were not high.

I saw that in the meantime you released kh21.6. What do you mean by "expand on the relay switchover failure case handling"?

many thanks for your support.

karlheyes commented 1 year ago

The relay switchover is code when you define multiple hosts in a relay eg

<relay local-mount="/stream" on-demand="no">
   <host priority="1" ip="127.0.0.1" port="12000" mount="/internal" />
   <host priority="2" ip="192.168.1.10" port="7000" mount="/stream" />
</relay>

Switchover occurs when the feed switches because the stream terminates or a higher-priority stream comes back online. Odd situations can be difficult to resolve; in this case it was when a higher-priority host is available but suffers from a problem such as a timeout, or just does not last long enough, like a short file. While the higher-priority streams are rechecked every few seconds, I made it so that such a failure lasts 10 minutes. The next bit is to make that more configurable in the xml. This is obviously separate from any fallback handling, but overall it works in a way not too dissimilar to a fallback.

Your description of the problem sounds more like a lock imbalance being triggered, but there is not much to go on regarding the trigger event. There was a crash bug if auth was used, but that does not sound like what you experienced. A lockup case might be identified if you grab a core file so backtraces can be acquired (e.g. with gcore).
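
As a rough sketch of grabbing such a core file: `gcore` ships with gdb and writes a core of a running process, which keeps running afterwards. The function name and the default output prefix below are assumptions for illustration, and attaching usually needs root or matching ptrace permissions.

```shell
# Write a core file of the running icecast process.
# The default prefix /tmp/icecast-core is an assumption; pick any writable path.
dump_icecast_core() {
    prefix="${1:-/tmp/icecast-core}"
    pid="$(pidof -s icecast)" || { echo "icecast is not running" >&2; return 1; }
    # gcore appends ".<pid>" to the prefix, e.g. /tmp/icecast-core.1234
    gcore -o "$prefix" "$pid"
}
```

The resulting file can then be opened in gdb together with the icecast binary to get backtraces.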

karl

gunsar commented 1 year ago

The relay switchover is code when you define multiple hosts in a relay eg

<relay local-mount="/stream" on-demand="no">
   <host priority="1" ip="127.0.0.1" port="12000" mount="/internal" />
   <host priority="2" ip="192.168.1.10" port="7000" mount="/stream" />
</relay>

This is very good, because I also tried a local relay (within one VPS) to run multiple icecast instances (Icecast on Linux plus AzuraCast in Docker, with the Linux icecast relaying locally from the AzuraCast Docker instance). Now I will try again with kh21.6. Thanks, Karl.

onur58 commented 1 year ago

@karlheyes @gunsar

Thanks guys for the support. Our relay settings look like:

    <relay>
        <server>10.1.1.1</server>
        <port>80</port>
        <mount>/radio1/mp3_128</mount>
        <local-mount>/s/radio1/mp3_128</local-mount>
        <retry-delay>10</retry-delay>
        <on-demand>0</on-demand>
    </relay>
    <relay>
        <server>10.1.1.2</server>
        <port>80</port>
        <mount>/radio1/mp3_128</mount>
        <local-mount>/m/radio/mp3_128</local-mount>
        <retry-delay>10</retry-delay>
        <on-demand>0</on-demand>
    </relay>

    <mount>
        <charset>ISO8859-1</charset>
        <mount-name>/m/radio1/mp3_128</mount-name>
        <fallback-mount>/s/radio1/mp3_128</fallback-mount>
        <fallback-override>1</fallback-override>
        <max-listener-duration>68400</max-listener-duration>
    </mount>
    <!-- endmount -->

I am not so familiar with gcore and gdb, but I got the following output: [image]

It seems that there were 5 threads, but I have no idea why...

What I also noticed is that CPU usage with kh21.5 is approx. 20% higher than with kh13.

Example with same hardware but different kh versions:

Many thanks for your feedback in advance.

karlheyes commented 1 year ago

The command is

    gdb /usr/bin/icecast core.xxxx

then in gdb

    thread apply all bt

There will be multiple threads: you have at least 2 workers, plus a few other threads depending on the exact build and configuration and what the state was at the time, e.g. auth threads are created/destroyed on the fly. A clearer build would be one with make debug; it helps with the code references, as optimization can mix things up. A lockup would show in such cases.
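
If it is easier to send a text file than to work inside gdb, the same backtraces can be collected non-interactively. A minimal sketch, assuming the binary lives at /usr/bin/icecast as above; the function name and the default output file name are made up for illustration.

```shell
# Save backtraces of all threads from a core file into a text file.
# Arguments: binary path, core file, output file (the defaults are assumptions).
save_backtraces() {
    binary="${1:-/usr/bin/icecast}"
    core="$2"
    out="${3:-backtrace.txt}"
    # -batch makes gdb exit after running the commands given with -ex
    gdb -batch -ex "thread apply all bt" "$binary" "$core" > "$out" 2>&1
}
```

For example, `save_backtraces /usr/bin/icecast core.xxxx` leaves the result in backtrace.txt in the current directory.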

A lot of changes between kh13 and 21.5. unsure on big changes for the cpu on an IO bound app but some aspects can be heavier like xsl, or ssl. Might be good to check. Could be log related, maybe something is just heavy on a busy loop type of thing and needs adjusting.

You could do

    strace -tt -o output -ff -p $(pidof icecast)

then press ctrl-C after, say, 30 seconds, and send the resulting output files to me.
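
The 30-second window can also be automated instead of pressing ctrl-C by hand. A sketch, assuming GNU coreutils `timeout` is available; the function name is hypothetical, and attaching with strace usually needs root or matching ptrace permissions.

```shell
# Trace all icecast threads for a fixed time, one output file per thread
# (output.<pid>), then let strace detach on its own.
trace_icecast() {
    seconds="${1:-30}"
    pid="$(pidof -s icecast)" || { echo "icecast is not running" >&2; return 1; }
    # timeout sends SIGINT after $seconds, which is what ctrl-C would do;
    # strace then detaches and icecast keeps running.
    timeout -s INT "$seconds" strace -tt -ff -o output -p "$pid"
}
```

Note that `timeout` exits with status 124 when the time limit is reached, so a non-zero exit status here does not mean the trace failed.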

karl.

karlheyes commented 1 year ago

Thanks for the strace, I think I have an idea on what is going on there, will look into that

karl.

onur58 commented 1 year ago

Hi Karl, many thanks for your support :-)

karlheyes commented 1 year ago

I've committed a limiter change to the master tree for the client handling to prevent excessive processing by some client handlers which will be the cause of the CPU issue, it may need other changes as I haven't stressed it enough yet but it will be the bulk of it.

I'll be cutting another pre-release shortly, once I get some feedback on something else, so another update will be out in the next day or so. You can try the master tree if you want something right now.

karl

onur58 commented 1 year ago

Hi Karl,

That sounds good to me. I will wait for your new pre-release. Many thanks for your support.