Closed niko closed 10 years ago
This is the error log on the master at the same time, just grepping for "listener count on /STATIONNAME":
[2013-12-12 14:18:57] INFO source/source_read listener count on /STATIONNAME now 1
[2013-12-12 14:20:00] INFO source/source_read listener count on /STATIONNAME now 2
[2013-12-12 14:24:58] INFO source/source_read listener count on /STATIONNAME now 1
[2013-12-12 14:26:15] INFO source/source_read listener count on /STATIONNAME now 2
[2013-12-12 14:34:58] INFO source/source_read listener count on /STATIONNAME now 1
[2013-12-12 14:35:59] INFO source/source_read listener count on /STATIONNAME now 2
[2013-12-12 14:42:57] INFO source/source_read listener count on /STATIONNAME now 1
[2013-12-12 14:44:04] INFO source/source_read listener count on /STATIONNAME now 2
It seems obvious that the source has always been available.
This is the log on which the graph is based:
1386854221,7554,-,X
1386854281,7537,-,X
1386854341,4533,-,X
1386854401,5068,-,X
1386854461,5526,-,X
1386854521,5666,-,X
1386854581,5770,-,X
1386854641,5857,-,X
1386854701,0,-,X
1386854761,88,-,X
1386854821,1953,-,X
1386854881,2427,-,X
1386854942,2703,-,X
1386855001,2930,-,X
1386855061,3225,-,X
1386855121,3371,-,X
1386855181,3453,-,X
1386855241,3627,-,X
1386855301,628,-,X
1386855361,-,-,X
1386855421,2191,-,X
1386855481,2627,-,X
1386855541,2789,-,X
1386855601,3052,-,X
1386855661,3253,-,X
1386855721,3461,-,X
1386855781,0,-,X
1386855841,31,-,X
1386855901,1252,0,X
1386855961,1442,451,X
1386856021,1528,567,X
1386856081,1655,772,X
1386856141,1756,933,X
1386856201,1849,1007,X
1386856261,1898,1073,X
1386856321,1950,1114,X
Contrary to what the graph suggests, the drop at 14:36 didn't affect all listeners; curl just failed to download the listener count. '-' means curl failure, 'X' means the xsl file is missing in Icecast. Our second server wasn't running at that time, and the third just doesn't have the listeners xsl file installed. Interestingly, though, the main server didn't respond in a manner curl likes. This is the (admittedly crude) logging script:
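#!/bin/bash
# Emit one CSV line per run: timestamp,stream1,stream2,stream3
TIMESTAMP=$(date '+%s')
# '-' is recorded when curl itself fails (non-zero exit status)
L1=$(curl http://stream1/listeners.total.xsl || echo '-')
# 'X' is recorded when the response mentions XSLT (xsl file missing in Icecast)
if [[ $L1 = *XSLT* ]]; then L1='X'; fi
L2=$(curl http://stream2/listeners.total.xsl || echo '-')
if [[ $L2 = *XSLT* ]]; then L2='X'; fi
L3=$(curl http://stream3/listeners.total.xsl || echo '-')
if [[ $L3 = *XSLT* ]]; then L3='X'; fi
echo "$TIMESTAMP,$L1,$L2,$L3"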
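To line the CSV log up with the master's error log, the epoch timestamps can be converted to wall-clock time and the '-' and 'X' markers spelled out. The following is only a sketch: `annotate_log`, `describe_field` and `TZ_OFFSET_SECONDS` are made-up names, the UTC+1 offset is an assumption inferred from the "14:36" remark, and `date -d @...` requires GNU date.

```bash
# ASSUMPTION: the servers' local time was UTC+1; adjust if not.
TZ_OFFSET_SECONDS=3600

# Turn one CSV field into a readable status.
describe_field() {
  case "$1" in
    -)            echo "curl-failed" ;;   # curl exited non-zero
    X)            echo "xsl-missing" ;;   # Icecast returned an XSLT error
    ''|*[!0-9]*)  echo "unparsable" ;;    # anything else that is not a number
    *)            echo "listeners=$1" ;;
  esac
}

# Read "epoch,L1,L2,L3" lines on stdin, print "HH:MM:SS status status status".
annotate_log() {
  local ts l1 l2 l3 clock
  while IFS=, read -r ts l1 l2 l3; do
    clock=$(date -u -d "@$((ts + TZ_OFFSET_SECONDS))" '+%H:%M:%S')
    echo "$clock $(describe_field "$l1") $(describe_field "$l2") $(describe_field "$l3")"
  done
}
```

Fed the line `1386855361,-,-,X`, this prints `14:36:01 curl-failed curl-failed xsl-missing`, which makes it easier to see that the "drop" is a failed download rather than a zero listener count.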
Niko.
You could try a more recent build as well; there has been a relay update since then. If you do, get the latest, as it has a fix for applying client details on restarting relays.
karl.
Will try the new version tomorrow. Niko.
After a quiet day the server with the new Icecast version just lost all its stations. In the INFO-logs it looks like this:
[2013-12-15 01:33:35] INFO slave/relay_reset servers to be retried on /maximum
[2013-12-15 01:33:35] INFO slave/relay_read fallback on /maximum attempted
[2013-12-15 01:33:35] INFO source/source_shutdown Source "/maximum" exiting
[2013-12-15 01:33:35] INFO source/source_set_fallback No fallback on /maximum
[2013-12-15 01:33:35] INFO slave/relay_read shutting down relay /maximum
[2013-12-15 01:33:35] INFO source/source_free_source source /maximum to be freed
[2013-12-15 01:33:35] INFO source/_free_source freeing source "/maximum"
[2013-12-15 01:34:12] INFO slave/start_relay_stream Starting relayed source at mountpoint "/maximum"
[2013-12-15 01:34:12] INFO slave/open_relay_connection using username relay for /maximum
[2013-12-15 01:34:12] INFO slave/open_relay_connection connecting to 10.0.0.100:8080 for /maximum
[2013-12-15 01:34:12] INFO format-mp3/format_mp3_get_plugin Created format details for /maximum
[2013-12-15 01:34:12] INFO format-mp3/mpeg_apply_client icy metadata format expected on /maximum, interval 16000
[2013-12-15 01:34:12] INFO source/source_init Source /maximum initialised
[2013-12-15 01:34:12] INFO source/source_apply_mount Applying mount information for "/maximum" from "/*"
[2013-12-15 01:34:12] INFO slave/start_relay_stream source /maximum is ready to start
[2013-12-15 01:34:12] INFO source/source_read listener count on /maximum now 0
[2013-12-15 01:34:12] INFO source/source_set_override no override for /maximum set
[2013-12-15 01:34:12] INFO format-mp3/parse_icy_metadata incoming title for /maximum Celsius911 - Sake (Promo) 2011
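To see at a glance which mountpoints a relay dropped and how often, the "shutting down relay" lines from a log like the one above can be counted per mount. This is a rough sketch that assumes the mountpoint is always the last word on such a line, as in the excerpt; `count_relay_shutdowns` is a made-up name.

```bash
# Read an Icecast error log on stdin and print "<mount> <count>" for every
# mount that logged "shutting down relay".
count_relay_shutdowns() {
  grep -F 'slave/relay_read shutting down relay' \
    | awk '{ print $NF }' \
    | sort | uniq -c \
    | awk '{ print $2, $1 }'
}
```

It would be run as `count_relay_shutdowns < error.log`, where `error.log` stands in for whatever the relay's error log is called.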
Two other relays (both running the "old" Icecast version) did not lose any stations. And again, the dropping of all stations reoccurred twice. I then restarted Icecast.
@karlheyes: core.29497 is at the usual location.
Niko.
I still need the icecast binary and the log leading up to the case; whether level 3 is enough remains to be seen.
karl.
Sry, forgot about the binary. The binary is at /icecast-relay and the log at /icecast-error.log.gz.
Niko.
The slave error log is not reporting anything out of the ordinary; obviously level 3 is not that high in that regard. If level 4 is not possible then we would need to isolate the issue further: maybe look at the access log on the master for the /admin/streams or /admin/streamlist entries to see how much was returned in comparison to what was there before, or maybe run a duplicate icecast with no listeners to see if the effect is duplicated. I assume the core is from gcore from some point after the fact, in which case it's of little help. I still suspect the master is in a mid-update cycle.
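One way to compare what the master returned over time is to pull the response sizes for those admin requests out of its access log. The sketch below assumes the access log uses an Apache-combined-style layout (date in the fourth whitespace-separated field, request path in the seventh, bytes sent in the tenth); that layout and the name `streamlist_sizes` are assumptions, so check them against a real log line first.

```bash
# Read an access log on stdin. For each /admin/streams or /admin/streamlist
# request print "<timestamp> <bytes>", appending SHRUNK when the response is
# smaller than the previous one.
streamlist_sizes() {
  awk '
    $7 ~ /^\/admin\/(streams|streamlist)/ {
      flag = (seen && ($10 + 0) < prev) ? " SHRUNK" : ""
      print substr($4, 2), $10 flag   # substr drops the leading "["
      prev = $10 + 0
      seen = 1
    }
  '
}
```

A sudden SHRUNK entry around the time the relay starts dropping stations would point at the master handing out an incomplete list.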
karl.
After more investigation, this looks to be because the master feeds were showing certain mountpoints as missing, and the slave was reacting correctly to that. The actual problem stems from stats disappearing from the master in a drop/recreate procedure when streams started or stopped. The entries that went were those that are listed when there is no stream but there is a fallback present.
The recent patch uses a timestamp that does not apply to normal sources but is set to the current time for these fallback-only mounts. It is updated on subsequent rechecks if the entry is found to be still required, and only after it has expired are the stats purged. This should prevent the drop/recreate which is getting detected by the slave.
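The idea can be illustrated with a toy model. This is not Icecast's code (which is C), only a sketch of the refresh-then-expire logic as described; `purge_after`, `GRACE_SECONDS`, `refresh_mount` and `purge_expired` are invented names, the 60-second window is arbitrary, and bash 4+ is needed for the associative array.

```bash
# Illustrative only. Maps a fallback-only mount to the epoch second after
# which its stats may be purged.
declare -A purge_after
GRACE_SECONDS=60   # hypothetical window between rechecks

# Called on every recheck that still finds the mount required.
refresh_mount() {  # usage: refresh_mount <mount> <now>
  purge_after[$1]=$(( $2 + GRACE_SECONDS ))
}

# Purge only entries whose timestamp expired without being refreshed.
purge_expired() {  # usage: purge_expired <now>
  local mount
  for mount in "${!purge_after[@]}"; do
    if [ "${purge_after[$mount]}" -lt "$1" ]; then
      unset "purge_after[$mount]"
      echo "purged $mount"
    fi
  done
}
```

As long as a recheck refreshes the entry before its timestamp runs out, the stats never disappear, so a slave polling the master in between sees no drop/recreate.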
If you can confirm this then we'll close this report.
karl.
We've been running the latest version for 2 days now. I'm cautiously optimistic. For my part, I will consider this issue closed if the slaves don't "lose" (for whatever reason) arbitrary streams for another 5 days.
Thanks for your efforts in this nasty edge case. I suspect that a large number of "virtual" mountpoints without sources but only fallbacks isn't the every day usecase. So kudos for catching this one.
All the best, Niko.
Shall I assume this is no longer showing as a problem now?
karl.
Yes, we haven't had any occurrences of this since the last update. Thanks! :)
This is basically reopening #67, but with updated information after having fixed a broken configuration.
Icecast version is 24-g3cb456b.
Usually at high listener counts (7000+) the relay server logs "relay shutdown requests", although the list of stations to relay as returned by the master looks all right:
The first of the bunch of "shutdown request" log entries was at 14:18:55.
At the same time we're logging listener counts on the relay and the station counts on the master server (S:[stationcount] L:[listenercount] DS:[stationcount download time] DL:[listener count download time]):
So although the master server seems to have returned a complete list of relay stations, the relay is dropping the stations.
Also, once the relay is in this "state", stations are dropped again and again until it is restarted.
In this image you can see the first drop of stations at 14:18, then another two drops up to 14:44.
I made 3 core dumps (one at 14:39 and two at 14:44), the first while the server seemed to be recovering, the last two while all stations were gone. @karlheyes: I'll drop you an email with the download location.
Niko.