karlheyes / icecast-kh

KH branch of icecast
GNU General Public License v2.0
298 stars 107 forks source link

Timeout followed by sigterm - when stopping Icecast using url_auth mount_remove #386

Closed glitch59 closed 1 year ago

glitch59 commented 1 year ago

Hi Karl,

Been testing the latest version and I have found another issue. Not critical, but it does needs some attention. Here we go:

Stopping Icecast takes long time and ends with killing the icecast process, when sources, that are connected during the shutdown, are using url_auth with the mount_remove section added.

The following data is logged in the error log, while stopping the server:

[2023-01-18  16:14:07] WARN auth_url/url_stream_end auth to server http://endpoint.com/mount/remove/39262 (/mount1.ogg) failed with Operation timed out after 5001 milliseconds with 0 bytes received
[2023-01-18  16:14:07] WARN auth_url/url_stream_end auth to server http://endpoint.com/mount/remove/40163 (/mount2.ogg) failed with Operation timed out after 5001 milliseconds with 0 bytes received
[2023-01-18  16:14:07] WARN auth_url/url_stream_end auth to server http://endpoint.com/mount/remove/8522 (/mount3.ogg) failed with Operation timed out after 5001 milliseconds with 0 bytes received

The endpoint.com URLs exist and are called when a mountpoint is removed (mount_remove). They are reachable and also function normally when called during normal operation. But when stopping the server, they apparently seem unreachable.

The server is stopped using: root@server:/tmp# service icecast stop

Additional info:

root@server:/tmp# service icecast status
● icecast.service - Icecast server
     Loaded: loaded (/lib/systemd/system/icecast.service; enabled; vendor preset: enabled)
     Active: failed (Result: timeout) since Wed 2023-01-18 16:15:32 CET; 2s ago
    Process: 4191711 ExecStart=/usr/local/bin/icecast -b -c /usr/local/etc/icecast.xml (code=exited, status=0/SUCCESS)
    Process: 4192599 ExecStop=/bin/kill $MAINPID (code=exited, status=0/SUCCESS)
   Main PID: 4191712 (code=killed, signal=KILL)
        CPU: 5.961s

Jan 18 16:06:15 theserver.com icecast[4191712]: Changed supplementary groups based on user: icecast2.
Jan 18 16:06:15 theserver.com icecast[4191712]: Changed groupid to 5001.
Jan 18 16:06:15 theserver.com icecast[4191712]: Changed userid to 5001.
Jan 18 16:14:02 theserver.com systemd[1]: Stopping Icecast server...
Jan 18 16:15:32 theserver.com systemd[1]: icecast.service: State 'stop-sigterm' timed out. Killing.
Jan 18 16:15:32 theserver.com systemd[1]: icecast.service: Killing process 4191712 (icecast) with signal SIGKILL.
Jan 18 16:15:32 theserver.com systemd[1]: icecast.service: Main process exited, code=killed, status=9/KILL
Jan 18 16:15:32 theserver.com systemd[1]: icecast.service: Failed with result 'timeout'.
Jan 18 16:15:32 theserver.com systemd[1]: Stopped Icecast server.
Jan 18 16:15:32 theserver.com systemd[1]: icecast.service: Consumed 5.961s CPU time.

Takes at least 90 seconds before server is killed as a result of a timeout.

Best regards, Sven

karlheyes commented 1 year ago

hmm, how many sources are connected? Are you using chroot ? Will check for any lock issues in that code path

karl

glitch59 commented 1 year ago

In this case there where 3 sources connected. Not using chroot.

XML code snippet:

<security>
    <chroot>0</chroot>
    <changeowner>
        <user>icecast2</user>
        <group>icecast</group>
    </changeowner>
</security>

<mount type="normal">
    <mount-name>/mount22.ogg</mount-name>
    <stream-name>MyFancyName</stream-name>
    <stream-url>http://endpoint.com/profile/mount22</stream-url>
    <password>hackme</password>
    <dump-file>/some/dir/dump.ogg</dump-file>
    <authentication type="url">
        <option name="allow_duplicate_users" value="0"/>
        <option name="realm" value="MyRealmName"/>
        <option name="mount_add" value="http://endpoint.com/mount/add/22"/>
        <option name="mount_remove" value="http://endpoint.com/mount/remove/22"/>
        <option name="listener_add" value="http://endpoint.com/listener/add/22"/>
        <option name="listener_remove" value="http://endpoint.com/listener/remove/22"/>
    </authentication>
</mount>
karlheyes commented 1 year ago

can you try a one-liner to see if the stalling point is where I think it is auth_url.c :178 to read

if (bytes <= 1 || client == NULL || client->respcode)

rebuild, retry. I suspect it is connecting but the 5 seconds is the overall request timeout.

karl

glitch59 commented 1 year ago

Just tested this locally and with auth_url.c :178 replaced as you suggested, the issue looks resolved.

Will test it in production tomorrow, when nobody is connected.

Thanks for the help! Sven

glitch59 commented 1 year ago

No more timeouts, but it does log the following error:

[2023-01-18  18:32:18] INFO main/server_process Shutting down
[2023-01-18  18:32:22] WARN auth_url/url_stream_end auth to server http://endpoint.com/mount/remove/1 (/mount1.ogg) failed with Operation timed out after 5001 milliseconds with 0 bytes received
[2023-01-18  18:32:22] INFO auth/auth_shutdown Auth shutdown complete
karlheyes commented 1 year ago

what does that script produce back to icecast, is there a body in the response etc. The reason I ask is that the timeout is actually from libcurl, 5 seconds being the overall timeout, connection timeout is 3 seconds and that change you did essentially returns very quickly any headers being passed. So something else is delaying the response somewhere.

karl

glitch59 commented 1 year ago

Thanks for pointing this out.

Turns out the mount_remove script makes a request to the Icecast server before processing the disconnect. But since the server is shutting down, it is very likely that there is no reaction. So that results in the timeout.

I will add some extra validations to the script to see if that solves the issue.

glitch59 commented 1 year ago

Improvements to the mount_remove script solved the timeout problem. This issue is resolved and can be closed. Thanks for the support! Sven.