karlheyes / icecast-kh

KH branch of icecast
GNU General Public License v2.0
298 stars 106 forks source link

listener_remove reliability #6

Closed DuncanFairley closed 11 years ago

DuncanFairley commented 12 years ago

My icecast setup is pretty vanilla - the only real difference from the norm is that I have listener_add and listener_remove calling a PHP script which adds and removes a row to a MySQL database for that listener's session.

The script is really basic, and I'm very confident it's not to blame here. The MySQL database is also local, using a stable build, etc., etc. Quite rarely, listener_remove somehow won't call the PHP script when a listener drops. I haven't been able to replicate the situation where this occurs. It might happen once every couple weeks, but it's enough to throw off my stats.

I rarely restart Icecast itself, but I do kill -HUP every couple weeks for config changes.

Any pointers on where I should be looking to recreate this issue so it can get fixed?

karlheyes commented 12 years ago

Difficult to come up with a simple way to spot any issue. Obviously updating to the latest on github is a good start as I'm in a slow-down to release, but I'm not aware of any listener_remove issue in recent times. There has been cases of a stuck listener which is where a lack of writing did not trigger a termination and therefore the auth trigger.

You could in the listener_add script query the global listeners count (a custom xsl could make it simple) from icecast and compare that number with the row count in your table.

karl.

Allmoz commented 12 years ago

since i do something similar, i will try to confirm if this happens to me too in the next weeks

Allmoz commented 12 years ago

hu i was sure i had reply to this one again....well... sinece the last post i got 9 missing listener remove from almost 50.000 listener_add ( 53k clientId )

karlheyes commented 12 years ago

On 29/03/12 05:38, Allmoz wrote:

hu i was sure i had reply to this one again....well... sinece the last post i got 9 missing listener remove from almost 50.000 listener_add ( 53k clientId )

Can you tells me what your configuration is like and what requests are being made by listeners. I'm wondering if it is because of alias or on-demand relay failures.

karl.

Allmoz commented 12 years ago

hum for some time was whit / so it got all the trafic. im not using relays nor alias. since it seams that doing / and after adding /stream dont work to extend a single mountpoint configuration y had to make an include to a folder whit each mount configuration. since then, there were 3 more lost listener_remove, the last two had the same listener_add timestamp.

icecast.xml

``` 500 6 524288 120 60 40 65535 hackme admin hackme radio.cemzoo.com 15 http://dir.xiph.org/cgi-bin/yp-cgi 8000 207.150.162.160 /stream 1 /usr/local/share/icecast /var/log/icecast /usr/local/share/icecast/web /usr/local/share/icecast/admin /var/run/icecast/icecast.pid access2.log error2.log playlist.log 3 0 icecast icecast /opt/radio/xml/mounts/8000*.xml ```

8000stream.xml

<mount>
    <mount-name>/stream</mount-name>
    <stream-description>CemZoo Radiostation</stream-description>
    <stream-url>http://radio.cemzoo.com:8000</stream-url>
    <fallback-mount>/fallback</fallback-mount>
    <fallback-override>1</fallback-override>
    <authentication type="url">
         <option name="stream_auth"    value="http://radio.xxxxxx.com/bin/ice_auth.php"/>
         <option name="mount_add"    value="http://radio.xxxxxx.com/bin/ice_auth.php"/>
         <option name="mount_remove" value="http://radio.xxxxxx.com/bin/ice_auth.php"/>
         <option name="listener_add"    value="http://radio.xxxxxx.com/bin/ice_auth.php"/>
         <option name="listener_remove" value="http://radio.xxxxxxxx.com/bin/ice_auth.php"/>
    </authentication>
    <on-connect>/usr/local/bin/icecast_connect</on-connect>
    <on-disconnect>/usr/local/bin/icecast_disconnect</on-disconnect>    
    <dump-file>/home/radio/dump/cz.radio.ogg</dump-file> 
</mount>

and the listener_end SQL

    $db = Registry::getPdoDb();
    $st = $db->prepare("
        UPDATE logListener SET
            end = :end,
            duration = :duration
        WHERE
            server = :server AND
            port = :port AND
            clientId = :clientId AND
            ip = :ip AND
            ISNULL(end)
        ");

    $st->bindValue(':end', date("Y-m-d H:i:s",time()));
    $st->bindValue(':duration', $icececast->duration);
    $st->bindValue(':server', $icececast->server);
    $st->bindValue(':port', $icececast->port);
    $st->bindValue(':clientId', $icececast->clientId);
    $st->bindValue(':ip', $icececast->ip);
    $st->execute();

one option is that in rare cases the end ip is different from the start one, here is the entries

tell me what more should i seek for ^^

karlheyes commented 12 years ago

On 29/03/12 12:28, Allmoz wrote:

hum for some time was whit/ so it got all the trafic. im not using relays nor alias. since it seams that doing/ and after adding/stream dont work to extend a single mountpoint configuration y had to make an include to a folder whit each mount configuration. since then, there were 3 more lost listener_remove, the last one whit had the same listener_add timestamp.

If you use the /* matching then you can exclude certain requests by placing them before the /* This may be ideal for requests like /.xsl or /images/ etc where you do not want url auth to be used. eg

8000stream.xml

/stream CemZoo Radiostation http://radio.cemzoo.com:8000 /fallback 1 /usr/local/bin/icecast_connect /usr/local/bin/icecast_disconnect /home/radio/dump/cz.radio.ogg

What is fallback (file or stream) and are these listeners getting to fallback when they disconnect? The code as it is currently, if listener requests /stream and the above mount matches then listener_add is called as you expect, if they disconnect or fallback and disconnect then /stream is picked out again and should match this mountpoint again for the listener_remove

I'm just wondering if there were any rejections after auth that caused listeners to drop before getting to the stream. Look for GET /stream attempts from access log that had a short duration or a 4xx response instead of the usual 200

karl.

Allmoz commented 12 years ago

nice, anyway i need to add

    <on-connect>/usr/local/bin/icecast_connect</on-connect>
    <on-disconnect>/usr/local/bin/icecast_disconnect</on-disconnect>
    <dump-file>/home/radio/dump/cz.radio.ogg</dump-file>

to certain mount point, so i need to configure them anyway im think to make php create the mount xml files to the folder, and send the configuration update request

the /fallback is just a mount point

<mount>
    <mount-name>/fallback</mount-name>
    <hidden>1</hidden>
    <stream-description>CemZoo Radiostation</stream-description>
    <stream-url>http://radio.cemzoo.com:8000</stream-url>
    <authentication type="url">
         <option name="stream_auth"    value="http://radio.xxxxxx.com/bin/ice_auth.php"/>
         <option name="mount_add"    value="http://radio.xxxxxxx.com/bin/ice_auth.php"/>
         <option name="mount_remove" value="http://radio.xxxxxx.com/bin/ice_auth.php"/>
         <option name="listener_add"    value="http://radio.xxxxx.com/bin/ice_auth.php"/>
         <option name="listener_remove" value="http://radio.xxxxxx.com/bin/ice_auth.php"/>
    </authentication>
</mount>

fallback is almost not used, since most of ppl dont know that exist, or how to use it

that are the only 2 mount points configurated... also the fallback configuration is only working after y separated the mount to the include path, cose that info whas after the /* match.

whit 4xx there are hundreds.. cose ppl leave winamp trying to conect... every 1 second after the stream ends....

Allmoz commented 12 years ago

if ti help for the missing listener_remove on ip 201.215.228.235 there is no access error in the log whit that ip i use header('icecast-auth-user: 1') to authenticate

karlheyes commented 12 years ago

I do not have access to your logs, what messages appeared at the time that IP connected, did you see the listener count increasing?

karl.

karlheyes commented 12 years ago

Can you check against kh32. I did fix a case where a listener_remove may not of been done

karl.

Allmoz commented 12 years ago

yups, im testing that since moonday, already got 3, listener add, whiout their listener remove

2012-04-17 21:35:52 14 radio.xxx.yyy 8000 /stream 64.76.13.x Argentina 2012-04-17 21:36:27 - radio.xxx.yyy 8000 /stream 64.76.13.x Argentina 2012-04-17 21:37:42 387 radio.xxx.yyy 8000 /stream 64.76.13.x Argentina

2012-04-18 00:04:36 3 radio.xxx.yyy 8000 /stream 189.183.172.x Mexico 2012-04-18 00:04:36 103 radio.xxx.yyy 8000 /stream 189.183.172.x Mexico 2012-04-18 00:04:39 109 radio.xxx.yyy 8000 /stream 189.183.172.x Mexico 2012-04-18 00:06:23 53 radio.xxx.yyy 8000 /stream 189.183.172.x Mexico 2012-04-18 00:06:58 7 radio.xxx.yyy 8000 /stream 189.183.172.x Mexico 2012-04-18 00:07:05 17 radio.xxx.yyy 8000 /stream 189.183.172.x Mexico 2012-04-18 00:07:23 35 radio.xxx.yyy 8000 /stream 189.183.172.x Mexico 2012-04-18 00:07:26 34 radio.xxx.yyy 8000 /stream 189.183.172.x Mexico 2012-04-18 00:08:01 22 radio.xxx.yyy 8000 /stream 189.183.172.x Mexico 2012-04-18 00:08:10 140 radio.xxx.yyy 8000 /stream 189.183.172.x Mexico 2012-04-18 00:08:26 131 radio.xxx.yyy 8000 /stream 189.183.172.x Mexico 2012-04-18 00:10:36 177 radio.xxx.yyy 8000 /stream 189.183.172.x Mexico 2012-04-18 00:10:45 - radio.xxx.yyy 8000 /stream 189.183.172.x Mexico 2012-04-18 00:14:01 447 radio.xxx.yyy 8000 /stream 189.183.172.x Mexico 2012-04-18 00:14:02 447 radio.xxx.yyy 8000 /stream 189.183.172.x Mexico

2012-04-18 02:28:26 - radio.xxx.yyy 8000 /stream 201.102.92.x Mexico 2012-04-18 02:30:47 894 radio.xxx.yyy 8000 /stream 201.102.92.x Mexico

the ones whitout duration are the issues...

now in kh32

Allmoz commented 12 years ago

well... when a conection is refused for a add_listener for example for a bad password in aprivate mount, there will be no listener_remove <-- could it be that is something related to this ?

karlheyes commented 12 years ago

On 18/04/12 16:46, Allmoz wrote:

well... when a conection is refused for a add_listener for example for a bad password in aprivate mount, there will be no listener_remove<-- could it be that is something related to this ?

If the listener_add rejects the listener then listener_remove will not be called but other than that it should be called even if they are rejected for other reasons. You may be finding a way that is not handled and that is what needs to be identified.

karl

Allmoz commented 12 years ago

ok... is fine that if it is rejected then there is no listener_remove, i havent fully implemented private mount point, so im allowing all, so is not the case in here, but maybe the server is mistaking some listener_add reponse as rejected and it dont send the listener_remove (how should i look for that ?)

Allmoz commented 12 years ago

¿there is a time limit for the response for listener_add ? if there is, it could be that icecast denies the listener by time out on the authentication and the script logs it as a valid, since he don't know that it did timeout sending the authentication response.

Allmoz commented 12 years ago

1 week and no listener_remove missing :D

karlheyes commented 12 years ago

shall we close this for now if the issue looks to be resolved.

karl.

Allmoz commented 12 years ago

ok, but i will keep an eye on it, since there is not too much trafic on my server ( max 15 listener lately )

DuncanFairley commented 12 years ago

I intend to update tonight and flush the database. I'll let you know in a couple days how it goes.

DuncanFairley commented 12 years ago

I've been running kh32 for about a week now. So far there's 6 stray listeners that didn't call listener_remove.

Allmoz commented 12 years ago

just to keep track i will start listing the circunstances of the my lost listeners_remove:

1 - d59776c758 - type=.flv - inestable source & 10 minutes before mount_remove (4 listeners_remove effective)

Allmoz commented 12 years ago

i did just realize that.... i get extra listener remove.... from the source....

action=listener_remove&server=radio.xxxxx.com&port=8000&client=3473&mount=/stream&user=source&pass=xxxxxxxxxxx&ip=201.157.xxx.xxx&duration=0

afther some

action=stream_auth&mount=/stream&ip=201.157.xxx.xxx&server=radio.xxxxxx.com&port=8000&user=source&pass=xxxxxxi&admin=1

but not always

karlheyes commented 12 years ago

The duration of 0 may be of interest here, what can you tell me about client 3473 (or any others in a similar position) what reason led them to disconnect immediately?

The second URL is an admin request wanting verifying, probably /admin/metadata

karl.

Allmoz commented 12 years ago

well... je is the source.... and the id it should be the one of that action=stream_auth cose there is no listener_add whit that id, and it kinds of... fit i will try to contact that source to see if she is using edcast / shoutcast or other source

Allmoz commented 12 years ago

it has happend 14 times, of.... 46,017 calls to the auth i will clean al log again.... i will try to log it a litle ... cleaner

Allmoz commented 12 years ago

24 listener_remove whit source logging data on a laggy (20 reconnections, over 3g internet)

the client id "12740" seems to be in order whit normal listener_add client id..

the last id before that is added 40 second before "12736" then a admin=1 source_auth, after that "12736" is removed, then a normal source_auth ( same source ). and then the "12740" listener_remove whit the user pass and ip of the source

happens especially on laggy source connections

i will keep an eye if it happens too whit 619ae4a7f6

Allmoz commented 12 years ago

ok... the extra listener_remove .. come as response of a stream_auth ( whitout admin=1 , but theres is one near) in the same second.... nearly before... a mount_remove....

so.. it seems to be related to metadata update, and disconnection by no data from the source....

beside that, no listener_remove missing yet

karlheyes commented 12 years ago

A typical non-ogg metadata update does need to authenticate as the update is based on a GET request on /admin/metadata, you should be able to see it on the access log. The &admin=1 is there just so that you know that it's authenticating from /admin requests and not SOURCE.

karl.

Allmoz commented 12 years ago

hum.... action=listener_remove&server=radio.xxxxxxx.com&port=8000&client=4894&mount=/stream&user=source&pass=xxxxxxx&ip=186.66.53.xxx&duration=0&sent=59

all of them are &sent=59

karlheyes commented 12 years ago

sent is supposed to be the bytes sent to that client. It could be down to a stalled stream but the size wouldn't indicate that. I think it may be down to a 64bit issue. Mines ok and that was ok for the prople reporting it, but if you are on a 32bit setup then that may be the cause.

Fix committed for that, but should be no impact otherwise. How are we about the reliability of these being called now?

karl.

Allmoz commented 12 years ago

the send=59 of the wierd listener_remove from the source, is cose normal conecctions to a non sourced stream are send=105.

still missing some ... listener_removes of listener whit poor internet conections, so i was thinking what happens if the listener loose the connection before it gets any response from the server ? also, it was reported that 2 times it asked for user/password. im looking for that first...kind of wield cose they are in the same server.... whith a lighttpd. i recoded to erease all sql updates, to make more responsive... kind of learning..., so DuncanFairley should tell if he still have listener_remove missing

ho, and updating asap

DuncanFairley commented 12 years ago

I'm running Icecast 2.3.2-kh32 and listener_remove has missed a couple of clients with it. I'll update to 2.3.3-kh1 if there's anything in that release that might have an effect on this?

karlheyes commented 12 years ago

The default timeout is 5 seconds which if you are taking that long to respond means you already have problems. If you get to a point where the timeout triggers as the response is returned then icecast will treat it as a failure. The timeout you impose on the url script should be at most, less than the timeout in icecast.

kh33 fixed an issue, 2.3.3-kh1 didn't have anything directly fixing things.

karl.

DuncanFairley commented 12 years ago

I updated to 2.3.3-h1 last night and flushed the tables. I also put in a bunch more logging into my auth script just to make sure it wasn't on my end. This morning, Icecast reports 17 listeners to one of my stations, and my database reports 18. I couldn't find anything fishy in my logs. I checked the icecast logs, and bam: [2012-06-26 05:39:24] WARN auth_url/url_remove_listener auth to server http://myserver.com/scripts/icecast_auth.php failed with Connection time-out

myserver.com being the same server that hosts the icecast instance. Unfortunately I only have my logs set to WARN, because otherwise it would be way, way too chatty to find anything.

The webserver is apache, and isn't resource limited or anything: the time-out happened in one of my periods with the least amount of traffic. Apache isn't reporting anything in its logs either. Thoughts?

karlheyes commented 12 years ago

Can't do much about that situation as I do not keep the remove entries around indefinitely. The connect failure will be at the TCP level, unlikely that apache is not listening on the socket (maybe restarted/reloaded). Could be firewall changes or routing. Not likely to be name lookup for connection timeout or a slow response from the script either.

karl.

DuncanFairley commented 12 years ago

Apache wasn't reloaded/restarted at that time. Routing is accept-all. Apache is handling requests almost every second - if they don't get through, I'd get a nasty email from my logger. This hasn't happened.

karlheyes commented 12 years ago

It's hard to say from just this error from libcurl. It tries and fails to connect within the specified time frame, so either the TCP link failed to establish (ie apache doesn't accept link, very unlikely or network error) or maybe the clock is getting skewed. I don't know exactly how your libcurl implements the timing for this but it's possible that it gets it wrong.and times out early.

karl

karlheyes commented 12 years ago

Shall I take it that kh2 or kh3 is working well.

Allmoz commented 12 years ago

a week ago i keep geting some, (kh2) yesterday i updated (kh3) and changed de configuration of lighttpd, to activate the socket at 127.0.0.1:80 so now icecast is auth to localhost

i will look for missing listener_remove, listener records that have user/password ( it asked auth to the user), and extra listener_remove whit the source data && sent=59

DuncanFairley commented 12 years ago

I've gotten a few with kh2, I'll update to kh3 now and see how it goes.

DuncanFairley commented 12 years ago

listener_remove wasn't called once on one of my stations so far.

Edit: Sorry, 2 stations.*

Allmoz commented 12 years ago

and so far is

31920 listener_add 31947 listener_remove

83 mount_add 83 mount_remove

2820 stream_auth

27 extra listener_remove ( whit source auth information ) 0 missing listener_remove 0 listeners were asked for pass (and actually did put one, when it should never ask for auth)

so far, so good

server updated, and hardLog table, truncated

Allmoz commented 12 years ago

got some duplicate listener_remove

all simultaneus

16 couples

kind of:

duration=28&sent = 0 duration=28&sent = 77

so it looks like:

also.. the &mount= of disconnections is with a timestamp as mount=/stream?1345695320000.mp3 and is should related to fallback ... and mount with no audio data... but not sure how

that plus the 25 extra listener_remove ( whit source auth information ) make the 41 irregulars hardLogs

no listener_remove missing

Allmoz commented 11 years ago

kh(4) almost 5 days running 65049 client connections 33 source connections

no listener_remove missing no listener_remove with source info

another duplicated listener_remove event

16 listeners * 2

0 <= duration <= 30 equal for each pair

one whit sent = 0 the other whit sent = 77

this time there was no fallback stream

...and update to kh5 and truncate table

DuncanFairley commented 11 years ago

I've been running KH3 for nearly 2 months now.

20 mount points. 20 times listener_remove wasn't called, so far, so 20 listening sessions were left in my database. Seems to happen randomly on the different mount points. Nothing of note in the logs.

DuncanFairley commented 11 years ago

Today I kill -HUPed the server, and this happened: http://pastebin.com/uHb38pgB

Somewhat interestingly, 15 listeners didn't have listener_remove called - not 6 (as seen in the log).

Edit: I would have had around 250 listeners on at that time. I ran apache benchmark with 300 concurrent connections, on a PHP script doing the same kind of operation, and it completed all 300 in 1.2 seconds, just fyi.

karlheyes commented 11 years ago

I cannot comment on why the 5 second timeout triggered, it will be either apache or the whatever the php does, having a stall (you should be able to match up attempts with the apache access log). If the timeout triggers, the URL auth is disabled for about 60 seconds, so any further URL requests on that handler are skipped, which would show as missing overall. In theory the clients could get rescheduled for auth again, for retrying after the 60 seconds, but what ever is done is only trying to mask over the failure of the script (or attempt to run the script).

By default, icecast KH will allow 3 parallel url requests per mount auth (handlers option to change it) and so far I'vr not had any reason to suspect libcurl stalling, unless you have a slow DNS or a routing issue. All other stalls I've seen have been down to something in the script.

karl.

DuncanFairley commented 11 years ago

I couldn't find anything in Apache's error log that would indicate any kind of failure with the PHP script. I could understand if there was a problem with my listener_remove script, but I'm guessing 'url_stream_end' refers to mount_remove. My mount_remove script is super basic, so I don't blame the PHP. So I suppose it could be apache itself. I'll try running the auth scripts through nginx to rule apache out.

The machine itself is pretty fast, and is mostly a vanilla Ubuntu 10.04 LTS install. It doesn't ever come close to saturating its CPU or RAM. Would using the machine's loopback address mitigate possible DNS issues?

karlheyes commented 11 years ago

using an IP address will remove any DNS lookup issue, using 127.0.0.1 is even better of course.

the listener_remove looks to of triggered a timeout in what I saw so that would stop any listener based URLs being processed, I think the mount urls are still attempted in such cases, as those are not as frequent. The apaches logs do not usually indicate a failure for such things, remember it may not be a failure, just something running more slowly than normal. With a timeout of 5 seconds, you want to make sure that the script will take no more than say 2 seconds to complete no matter what happens, the 5 second timeout trigger is something that should never happen.

you could try the apache test with 300 with no more than 3-5 at any one time.

karl.

DuncanFairley commented 11 years ago

Okay, so I switched to 127.0.0.1 and reloaded config. I then restarted the server and I received the same timeout issue. (Logs have been sent to your email)

I then placed the listener_remove section from my auth script into a separate PHP file, and wrote it to emulate a normal listener disconnecting. (The script basically does a SELECT and then an UPDATE query.)

I then did: ab -n 300 -c 3 http://localhost/test.php Time taken for tests: 0.092 seconds

I have MySQL set up to log queries that take longer than 2 seconds, and it hasn't logged any.

I think tomorrow I will make the script time itself and log itself to a file, just to be sure. Any other ideas on what I can do to narrow down the cause?