SpeedyCharly closed this issue 2 years ago
Hi,
Thanks for the report. I will have a look at it as part of the 2.0.2
release cycle.
For the first time, yesterday I caught an episode during live stream... Here is an edited log of the occurrence; Edited Log.txt
Here is an edited portion of LiquidSoap log showing unusual activity around items from jingle playlists; Raw Response.txt
Could that be related to, or a trigger of, the CPU overloads?
Hi @SpeedyCharly. Thanks for your report. Can I use the script here: https://github.com/AzuraCast/AzuraCast/issues/4783#issuecomment-966149114 as a reference?
My first suggestion would be to instrument the code around the URI query before log("AzuraCast Raw Response: #{uri}"). If this call is hanging for any reason, this will cause issues with the main streaming thread and, potentially, cause CPU to spike as liquidsoap is trying to catch up with the delay.
Would you mind adding log lines around the call, for instance:
def autodj_next_song() =
    log("Starting call to AzuraCast API")
    uri = list.hd(process.read.lines(env=[("API_AUTH", !azuracast_api_auth)], 'curl -s --request POST --url http://web/api/internal/1/nextsong --form api_auth="$API_AUTH"'), default="")
    log("AzuraCast Raw Response: #{uri}")
Regarding your encoding issue, the character \233 that is coming through, such as in "Radio ID - En route vers l'ouest - V\233ro en ondes", is actually \uFFFD, which is the UTF8 replacement character. This means that the character/string is getting corrupted before hitting us, for instance by being consumed by a previous system that thinks it is UTF8 while it actually is in another encoding.
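To illustrate the kind of mismatch described here, the following is a minimal Python sketch, not part of the Liquidsoap or AzuraCast code. It assumes that \233 in the log is a decimal byte escape (0xE9), which is "é" in ISO-8859-1 but is not a valid UTF-8 sequence on its own; the sample bytes are taken from the tag quoted above.

```python
# Assumption: "\233" in the log stands for the single byte 233 (0xE9).
raw = b"V\xe9ro en ondes"

# Read as ISO-8859-1 (Latin-1), the byte is a plain "é".
as_latin1 = raw.decode("latin-1")

# Read as UTF-8, the lone byte is invalid, so a lenient decoder
# substitutes U+FFFD, the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")

print(as_latin1)        # Véro en ondes
print(repr(as_utf8))    # 'V\ufffdro en ondes'
```

Once a system has substituted U+FFFD, the original byte is gone, which is why the fix has to happen wherever the tags are first read rather than further down the chain.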
@toots Yes of course, you can reference the Azura issue here; I've raised the issue on both GitHub projects to put the 2 development teams in sync on this. Not Linux savvy here, not sure what you need me to do with your suggested piece of script...
The logs I have provided are generated via the AzuraCast console and the query cannot be edited there... Not sure how to proceed on PuTTY or another client... Or, is that script to be added to the LS config?
@toots [Regarding your encoding issue, the character \233 that is coming through such as in "Radio ID - En route vers l'ouest - V\233ro en ondes" is actually \uFFFD, which is the UTF8 replacement character, which means that the character/string is getting corrupted before hitting us, for instance being consumed by a previous system that thinks it is UTF8 while it actually is in another encoding.] Thanks for that info, but with close to 200,000 media files in my collection, about half of them in French... Not about to start editing out all French special characters... LOL.
In mIRC, we get around the problem by disabling UTF8; can the same be done in LS?
I'm sorry if that wasn't clear. What I'm saying is that something in your processing chain is already corrupting the tags before they come to us. You can definitely disable UTF8 or tag recording, but you will still have tags with the weird � in them.
What I was trying to say is, do you have access to the liquidsoap script that your AzuraCast instance is running? Can we add more logging there to help debug the issue?
@toots I have access to the LS config file which looks like this at the moment; LS Config.txt
Great. So, could you change the autodj_next_song function to look like this:
# AutoDJ Next Song Script
def autodj_next_song() =
    log("Starting call to AzuraCast API")
    uri = list.hd(process.read.lines(env=[("API_AUTH", !azuracast_api_auth)], 'curl -s --request POST --url http://web/api/internal/1/nextsong --form api_auth="$API_AUTH"'), default="")
    log("AzuraCast Raw Response: #{uri}")
And report back the next time you see high CPU usage?
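The idea behind this request is to bracket a call that may block with one log line before and one after, so a gap between the two timestamps reveals a hang. A minimal Python sketch of the same pattern follows; it is an illustration only, and the names timed_call and label are hypothetical and do not come from the Liquidsoap script.

```python
import time


def timed_call(label, func, log=print):
    """Run func(), logging before and after, and return (result, seconds)."""
    log(f"{label}: starting")
    start = time.monotonic()
    try:
        result = func()
    finally:
        # Logged even if func() raises, so a failed call still shows its duration.
        elapsed = time.monotonic() - start
        log(f"{label}: finished after {elapsed:.3f}s")
    return result, elapsed


# Stand-in for the real API request, which is not reproduced here.
response, seconds = timed_call("nextsong", lambda: "annotate:...")
```

If the "finished" line never appears, or appears many seconds after "starting", the call itself is the part holding things up.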
ok a big 10-4 on that LOL
Edit; It seems that part of the config cannot be edited by lowly users... Queried the grand poobah of AzuraCast on this... Will keep you informed :P
@SpeedyCharly I've updated the Rolling Release to add the relevant logging information requested by @toots above; if you don't mind updating your installation, that can help us with the debugging process.
Update done and put all jingle playlists full on with Smart crossover also on.... will report on how it goes ....
Woke up to a totally dead stream this morning. Restart wouldn't even bring it back online. Killed all the JINGLE playlists save 1 then went to OVH console to reboot the server from there. Stream came back on just in time for the start of live programming.
Here is a log showing the last part of that story;
Editted log.txt
Hi. Sorry to hear. I think we'd need the part of the logs before it went dead, that would give us some hints about what caused it.
What I see about your restart is that the telnet port was still used. Most likely, a process was still hanging and using it.
[What I see about your restart is that the telnet port was still used. Most likely, a process was still hanging and using it.] This is what I assumed also, which is why I did a hard reboot...
The log didn't contain anything save what you see prior to the restart. Guess it was pushed out the top by all these error lines...
All it showed was 16,000+ lines of this:
....
Called from file "src/dtools_impl.ml", line 458, characters 10-35
Re-raised at file "src/dtools_impl.ml", line 466, characters 8-15
Called from file "list.ml", line 110, characters 12-15
Called from file "src/dtools_impl.ml", line 458, characters 10-35
Re-raised at file "src/dtools_impl.ml", line 466, characters 8-15
Called from file "main.ml" (inlined), line 566, characters 15-51
Called from file "runner.ml", line 25, characters 9-22
Fatal error: exception Error while trying to bind server/telnet socket: Address already in use in bind()
Raised at file "tools/server.ml", line 463, characters 16-59
Called from file "tools/server.ml", line 477, characters 19-34
Called from file "tools/lifecycle.ml", line 33, characters 8-14
.....
@SpeedyCharly If the logs got rotated due to the huge amount of errors, you can still get the previous versions of the log file.
The rotated log files will be named liquidsoap.log.1, liquidsoap.log.2, liquidsoap.log.3, etc.
The higher the number, the older the log file; the numbering goes up to 10.
Here is the command to copy the log files from the stations container to the server:
docker cp azuracast_stations:/var/azuracast/stations/yourstationname/config/liquidsoap.log.1 /var/azuracast/liquidsoap.log.1
You will need to replace yourstationname in that command with the path from your station, which can be found in the station profile in the administration tab. (Added an image to better show what part I mean)
This will copy the respective log file (for example liquidsoap.log.1) to the /var/azuracast directory of the server. Depending on when it crashed, you might need to get more than one old log file.
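Since more than one rotated file may be needed, the copy command can be generated for each of them. The following Python sketch only builds and prints the commands; it assumes the container name and paths quoted above, and the function name rotated_log_copy_commands is hypothetical.

```python
import shlex


def rotated_log_copy_commands(station, dest_dir="/var/azuracast", count=10):
    """Build one `docker cp` command per rotated log, newest (.1) first."""
    commands = []
    for n in range(1, count + 1):
        src = (
            "azuracast_stations:/var/azuracast/stations/"
            f"{station}/config/liquidsoap.log.{n}"
        )
        dst = f"{dest_dir}/liquidsoap.log.{n}"
        commands.append(shlex.join(["docker", "cp", src, dst]))
    return commands


# "yourstationname" is the placeholder from the command above.
for command in rotated_log_copy_commands("yourstationname", count=3):
    print(command)
```

Not every numbered file necessarily exists, so some of the generated commands may fail with a "no such file" error; that is harmless.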
LOL... having a lovely Holiday season thus far... You?
Permission denied :
Oh, if you are not using the root user, you'll need to add sudo to the command like this:
sudo docker cp azuracast_stations:/var/azuracast/stations/yourstationname/config/liquidsoap.log.1 /var/azuracast/liquidsoap.log.1
Or switch to the root user via sudo su and then run the command.
This is because my AzuraCast instance isn't a standard install; I inherited it from someone else, and I have no idea where things are .... took me forever to find the docker location to be able to do updates...
root@vps-cff8157d:/home/ubuntu# docker cp azuracast_stations:/var/azuracast/stations/yourstationname/config/liquidsoap.log.1 /var/azuracast/liquidsoap.log.1
You need to change yourstationname to radio_des_festivals in the command, like in your screenshot:
docker cp azuracast_stations:/var/azuracast/stations/radio_des_festivals/config/liquidsoap.log.1 /var/azuracast/liquidsoap.log.1
Same difference....
And, there is no response when I input var/azuracast/ which is an empty folder in my install
EDIT.... figured it out... scanning logs now
Here is the entire log for the night of 18th to 19th; hopefully you will find a clue in there somewhere Night of 18th to 19th.txt
@toots this is most likely the part of the logs that you are looking for:
2021/12/19 03:30:55 [next_song:3] Prepared "/var/azuracast/stations/radio_des_festivals/media/Vedettes/Vicky Chagnon Cowboys sweetheart.mp3" (RID 2).
2021/12/19 03:30:55 [cue_next_song:3] Cueing in...
2021/12/19 03:30:55 [lang:3] autodj_next_song: Sending AzuraCast API Call...
2021/12/19 03:30:55 [crossfade_0:3] Analysis: -17.829847dB / -infdB (1.36s / 1.36s)
2021/12/19 03:30:55 [crossfade_0:3] Simple transition: crossed, fade-in, fade-out.
2021/12/19 03:30:55 [lang:3] AzuraCast Feedback Response: OK
2021/12/19 03:30:55 [lang:3] autodj_next_song: AzuraCast API Response: annotate:title="Une poupée pour Noël",artist="Joane Bluteau",duration="136.00",song_id="aad27972f0ae3ce729a9f504c2e3560c",media_id="2843",liq_amplify="0.00dB",playlist_id="12":/var/azuracast/stations/radio_des_festivals/media/Noel Country Fr/11 Une poupee pour Noel.mp3
[mp3 @ 0x7efca0045300] Estimating duration from bitrate, this may be inaccurate
[mp3 @ 0x7efca0048200] Estimating duration from bitrate, this may be inaccurate
2021/12/19 03:30:55 [decoder.id3v2:2] Error while decoding file tags: (Invalid_argument "String.sub / Bytes.sub")
2021/12/19 03:33:26 [decoder:2] Decoding "/var/azuracast/stations/radio_des_festivals/media/Vedettes/Vicky Chagnon Cowboys sweetheart.mp3" ended: Ffmpeg_decoder.End_of_file.
[mp3 @ 0x7efc9c4298c0] Estimating duration from bitrate, this may be inaccurate
[mp3 @ 0x7efc9c4298c0] Estimating duration from bitrate, this may be inaccurate
2021/12/19 03:33:27 [next_song:3] Prepared "/var/azuracast/stations/radio_des_festivals/media/Noel Country Fr/11 Une poupee pour Noel.mp3" (RID 1).
2021/12/19 03:33:27 [cue_next_song:3] Cueing in...
2021/12/19 03:33:27 [lang:3] autodj_next_song: Sending AzuraCast API Call...
2021/12/19 03:33:27 [crossfade_0:3] Analysis: -infdB / -21.413634dB (1.48s / 1.48s)
2021/12/19 03:33:27 [crossfade_0:3] Simple transition: crossed, fade-in, fade-out.
2021/12/19 03:33:27 [lang:3] AzuraCast Feedback Response: OK
2021/12/19 03:33:27 [lang:3] autodj_next_song: AzuraCast API Response: annotate:title="L'hiver a chasser l Hirondelle",artist="Georges Hamel",duration="198.00",song_id="d3d7d344481807b9c16923b21b804193",media_id="2805",liq_amplify="0.00dB",playlist_id="12":/var/azuracast/stations/radio_des_festivals/media/Noel Country Fr/08 Lhiver a chasser l Hirondelle.mp3
[mp3 @ 0x7efca0048200] Estimating duration from bitrate, this may be inaccurate
[mp3 @ 0x7efca0048200] Estimating duration from bitrate, this may be inaccurate
Fatal error: exception Error while trying to bind server/telnet socket: Address already in use in bind()
Raised at file "tools/server.ml", line 463, characters 16-59
Called from file "tools/server.ml", line 477, characters 19-34
Called from file "tools/lifecycle.ml", line 33, characters 8-14
Called from file "tools/lifecycle.ml", line 33, characters 8-14
Called from file "tools/lifecycle.ml", line 33, characters 8-14
Called from file "tools/lifecycle.ml", line 33, characters 8-14
Called from file "tools/lifecycle.ml", line 33, characters 8-14
Called from file "src/dtools_impl.ml", line 455, characters 10-16
Re-raised at file "src/dtools_impl.ml", line 466, characters 8-15
Called from file "list.ml", line 110, characters 12-15
Called from file "src/dtools_impl.ml", line 458, characters 10-35
Re-raised at file "src/dtools_impl.ml", line 466, characters 8-15
Called from file "list.ml", line 110, characters 12-15
Called from file "src/dtools_impl.ml", line 458, characters 10-35
Re-raised at file "src/dtools_impl.ml", line 466, characters 8-15
Called from file "main.ml" (inlined), line 566, characters 15-51
Called from file "runner.ml", line 25, characters 9-22
The only direct error I can see in there seems to be this line (not sure how/if it's related to the telnet server though):
[decoder.id3v2:2] Error while decoding file tags: (Invalid_argument "String.sub / Bytes.sub")
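Picking such lines out of a long log by eye is tedious. Below is a minimal Python sketch that keeps only timestamped lines at or below a given level. It assumes the "[label:level]" layout seen in the excerpts above and that lower numbers mean more severe messages; the function name lines_at_or_below is hypothetical.

```python
import re

# Timestamp, then "[label:level]", then the message, as in the excerpts above.
LOG_LINE = re.compile(
    r"^(?P<stamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<label>.+?):(?P<level>\d)\] (?P<message>.*)$"
)


def lines_at_or_below(lines, max_level=2):
    """Return (stamp, label, message) for log lines with level <= max_level."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and int(match["level"]) <= max_level:
            hits.append((match["stamp"], match["label"], match["message"]))
    return hits


sample = [
    "2021/12/19 03:30:55 [lang:3] AzuraCast Feedback Response: OK",
    "[mp3 @ 0x7efca0045300] Estimating duration from bitrate, this may be inaccurate",
    '2021/12/19 03:30:55 [decoder.id3v2:2] Error while decoding file tags: (Invalid_argument "String.sub / Bytes.sub")',
]
for stamp, label, message in lines_at_or_below(sample):
    print(stamp, label, message)
```

Lines without a timestamp, such as the ffmpeg notices and the OCaml backtrace, are skipped by this filter and still have to be read separately.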
Yes, I don't see anything obvious. What seems to have happened here is a crash of liquidsoap, followed by an automatic restart that kept failing because the telnet port was still in use.
I just looked back at the code and all the sockets involved in the telnet stack are created with the close_on_exec flag. We used to have issues with external processes sometimes inheriting open sockets, thus leaving them in use after liquidsoap crashed, but this shouldn't be the case anymore.
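The "Address already in use" failure can be reproduced in isolation. The Python sketch below is only an illustration of the mechanism, not Liquidsoap's code: one socket holds a port while a second bind to the same port fails. The helper name port_in_use is hypothetical.

```python
import errno
import socket


def port_in_use(host, port):
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise
    return False


# Stand-in for a leftover process that still owns the port.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
holder.listen()
port = holder.getsockname()[1]

print(port_in_use("127.0.0.1", port))  # True while holder is still open
print(holder.get_inheritable())        # False: Python's equivalent of close-on-exec
holder.close()
```

A socket that is not inheritable is closed automatically in any child program the process launches, which is the property the close_on_exec flag is meant to guarantee; the port then stays busy only as long as the original process itself is alive.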
I'm afraid to ask, but we would need to raise the log level in this case:
log.level.set(4)
at the beginning of the script.
Also, if the crash is a deadlock, we'd have to get familiar with gdb, but let's get to it when we really need it and start with increasing the log level.
[I'm afraid to ask but we would need to elevate the log levels in this case: log.level.set(4)]
That one was easy... done will report later - Can only report at night as stream is live during daytime hours...
No issues seen last night as I had disabled all Jingle playlists
Thks!
I'm encountering this issue right now. Despite trying to forcefully restart the station via CLI, the stations are just unable to respond and continue to spit out the same errors as Speedy's.
As soon as one station runs into the issue, all of my others run into the same issue and a chain crash occurs when I'm unable to recover from it. The only recovery option is to restart the entire process (azuracast) which isn't ideal.
Something from Glances:
2021-12-20 22:01:16 (ongoing) - LOAD (Min:1.0 Mean:1.4 Max:1.7)
2021-12-20 22:00:12 (ongoing) - CPU_USER (Min:76.8 Mean:88.7 Max:95.8): liquidsoap, liquidsoap, liquidsoap
2021-12-20 21:59:40 (0:00:23) - CRITICAL on CPU_USER (92.4): supervisord, liquidsoap, liquidsoap
@toots Not sure what is going on here, but it seems my limited access to the server is blocking me. I am issuing the command [sudo docker cp azuracast_stations:/var/azuracast/stations/radio_des_festivals/config/liquidsoap.log.1 /var/azuracast/liquidsoap.log.1] via PuTTY, but the logs already present in /var/azuracast/ don't get overwritten, and accessing the server with FileZilla is no help as the delete command is denied. It seems I only have read rights...
The guy I inherited this server from had assured me I had full sudo/sftp access, but it seems not. Not sure what to do from here.... sorry !
@toots these logs, taken at log level 4, may be useful for you. Caught this almost immediately: https://gist.github.com/SC2Mitch/c2e796ed9a63733ad7ce6df3d6922d52
@SC2Mitch Thanks... looks very similar to what I'm seeing on my install...
Thanks @SC2Mitch that's very useful. Are you able to share more of what happens before?
Yep, I'm unsure what exactly you're looking for, but here are some extended logs https://gist.githubusercontent.com/SC2Mitch/c2e796ed9a63733ad7ce6df3d6922d52/raw/6e3d400c19a52f3d5e1923e79b0bbf1d88e17516/Extended%2520logs
and here's the entire raw log file (9000+ lines) https://gist.githubusercontent.com/SC2Mitch/c2e796ed9a63733ad7ce6df3d6922d52/raw/6e3d400c19a52f3d5e1923e79b0bbf1d88e17516/Entire%2520log%2520file
Thanks! I'm suspecting that something is blocking either the main streaming thread or the background processing queues.
I see this in the logs that looks suspicious:
2021/12/21 16:24:45 [lang:3] dj_auth: Sending AzuraCast API DJ Auth command for user: source
2021/12/21 16:24:48 [harbor:4] New client on port 8196: 167.248.133.44
2021/12/21 16:24:49 [lang:3] dj_auth: Sending AzuraCast API DJ Auth command for user: source
2021/12/21 16:24:49 [lang:3] dj_auth: AzuraCast API Response: false
2021/12/21 16:24:49 [harbor:4] ICY error: invalid password
2021/12/21 16:24:52 [harbor:4] New client on port 8196: 167.248.133.44
2021/12/21 16:24:52 [harbor:4] Connection reset by peer in read()
2021/12/21 16:24:52 [harbor:4] New client on port 8196: 162.142.125.58
2021/12/21 16:24:53 [harbor:4] Connection reset by peer in read()
2021/12/21 16:24:53 [harbor:4] New client on port 8196: 162.142.125.58
2021/12/21 16:24:53 [lang:3] dj_auth: Sending AzuraCast API DJ Auth command for user: îêüþkdtúå$tç¯GCÓ£HìxJÇNõ;A0 ·Q«S&^uªiú*wA¾ÃVz8Ív²&̨̩À/À0À+À
What's the code around this log: Sending AzuraCast API DJ Auth command for user: ?
Snip, Silver replied with the code.
dj_auth: Sending AzuraCast API DJ Auth command for user: ���î�ê��üþkdtúå$tç¯GCÓ£���H�ì�xJÇNõ��;A0 ·Q��«�S&^uªi�ú*�wA��¾��ÃVz8Ív�²&̨̩À/À0À+À
Well, that's exciting...
Here's the relevant code for DJ auth:
def dj_auth(login) =
    user = ref("")
    password = ref("")
    if (login.user == "source" or login.user == "") and (string.match(pattern="(:|,)+", login.password)) then
        auth_string = string.split(separator="(:|,)", login.password)
        user := list.nth(default="", auth_string, 0)
        password := list.nth(default="", auth_string, 2)
    else
        user := login.user
        password := login.password
    end
    log("dj_auth: Sending AzuraCast API DJ Auth command for user: #{!user}")
    ret = list.hd(process.read.lines(env=[("DJ_USER", !user), ("DJ_PASSWORD", !password), ("API_AUTH", !azuracast_api_auth)], 'curl -s --request POST --url http://web/api/internal/3/auth --form dj-user="$DJ_USER" --form dj-password="$DJ_PASSWORD" --form api_auth="$API_AUTH"'), default="")
    log("dj_auth: AzuraCast API Response: #{ret}")
    authed = bool_of_string(ret)
    if (authed) then
        last_authenticated_dj := !user
    end
    authed
end
Finally was able to retrieve the logs from yesterday's (21) crash ; 21-12 Crash.txt Here is the log of the hard reboot @ around 8:30 same morning; 21-12 Reboot.txt
Same story repeated itself overnight (21>22) Seems to always crash about 4 hrs after end of live broadcast... Something must be incrementally eating at CPU resources, forcing the crash after 4 hrs of AutoDJ operation...
@SpeedyCharly We did do some updates yesterday afternoon to establish a timeout for some of the API calls; make sure you're on at least version 82a8d1c.
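The actual change in 82a8d1c is not shown in this thread. Purely as an illustration of what putting a timeout on an external call means, here is a minimal Python sketch: it runs a command, gives up after a fixed number of seconds, and falls back to a default value. The function name read_first_line and the 10-second value are assumptions, not AzuraCast's settings.

```python
import subprocess
import sys


def read_first_line(cmd, timeout_s, default=""):
    """Run cmd and return its first output line, or default on timeout or no output."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        return default
    lines = done.stdout.splitlines()
    return lines[0] if lines else default


# Stand-in command; the real script calls curl against the AzuraCast API.
print(read_first_line([sys.executable, "-c", "print('OK')"], timeout_s=10))
```

With a bound like this, a slow or unresponsive API costs at most the timeout instead of stalling the caller indefinitely.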
[We did do some updates yesterday afternoon to establish a timeout for some of the API calls; make sure you're on at least version 82a8d1c.]
Will do at end of live streaming tonite - thanks SlvrEagle
Just a little while ago had a little episode during live streaming - not sure if related to the current issue - looks like potential hacker activity to me.... Here is the relevant log snippet; Weird episode.txt Seems to me SC2Mitch had something similar yesterday...
Was unable to proceed with update as the process crashed on fatal error...
Please advise.... thanks
@SpeedyCharly This has been fixed as of the latest update.
@SlvrEagle23 Thank you kindly, sir; update done
@SlvrEagle23 @toots Happy to report that my station ran fine all of last night and wasn't down this morning, for the first time in a while. Thanks to the both of you for seeing this through!
I believe the commit silver applied has resolved the issue. Had no downtime for the past 2 days and system resources are very healthy.
Glad to hear. Marking this as fixed.
Describe the bug
I've been operating a radio station from a server running AzuraCast for about 2 years now without any issues. In the last few weeks, Liquidsoap has been causing the CPU to redline, ultimately causing buffer overruns and killing the media stream altogether. I run 1 main playlist with 2 more on a 'play per x' basis. In addition, I have a few 'Jingles' playlists (clips of 90 seconds or less) which I also call upon on a 'play per x' basis to run radio IDs, DJ ads and the like. It seems that Liquidsoap redlines when handling selections from the Jingles playlists, and the buffer overruns end up killing the stream after about 1 hour of operation. The only way I can get the radio station to run without interruptions is to disable all Jingles, which returns the CPU to normal loads. No such CPU overloads occur when streaming live. I have raised the issue with AzuraCast, but it seems that the issue lies with Liquidsoap itself, which is why I am now raising it here. You can view what transpired at AzuraCast here: https://github.com/AzuraCast/AzuraCast/issues/4783, which will give you access to all the info and the Liquidsoap logs I have generated while trying to identify the source of the problem.
To Reproduce
Please see the AzuraCast issue thread above.
Version details
Install method
Docker (AzuraCast)