planetteamspeak / ts3phpframework

Modern use-at-will framework that provides individual components to manage TeamSpeak 3 Server instances
https://www.planetteamspeak.com
GNU General Public License v3.0
209 stars 59 forks source link

TransportException: connection to server 'myServer.com:10011' lost #204

Open MatthiasHeinz opened 1 year ago

MatthiasHeinz commented 1 year ago

When calling

$tsServer = // some object containing the server config
$options = array("host" => $tsServer->getHost(), "port" => $tsServer->getQueryPort(), "timeout" => 10, "blocking" => true);
$object = new ServerQuery($options);

the ServerQuery::__destruct() and thus this->request("quit") is being called (c.f. https://github.com/planetteamspeak/ts3phpframework/blob/dev/src/Adapter/ServerQuery.php#L106).

This line used to catch Exception instead of AdapterException (c.f. https://github.com/planetteamspeak/ts3phpframework/blob/035d24265e3604460124ccdad9358ae6c5293b25/libraries/TeamSpeak3/Adapter/ServerQuery.php#L96) which used to catch the TransportException thrown in readLine(...) (c.f. https://github.com/planetteamspeak/ts3phpframework/blob/dev/src/Transport/TCP.php#L162), when calling this->request("quit").

For that reason, the TransportException is thrown and not caught anymore.

MatthiasHeinz commented 1 year ago

2 possible quick resolutions come to mind:

  1. Revert to catching Exception instead of AdapterException in ServerQuery::__destruct().
  2. Add an additional catch-clause in ServerQuery::__destruct() for TransportException.
MatthiasHeinz commented 1 year ago

PR #195 might also resolve this issue.

Sebbo94BY commented 1 year ago

Nice analysis!

Adding a catch (TransportException) { return; } should also solve the issue, yeah.

I would need to double-check this, but it makes sense in theory.

But great. Then we have now two options to solve this issue:

Do you want to create a PR for your mentioned solution @MatthiasHeinz or shall I?

Then we only need to decide, which one the better is. 😅

I'm still waiting for feedback from @ronindesign regarding this (and all other open PRs). 😢

If nobody reviews my PR soon, I can also simply merge them and then everyone can at least test the DEV branch.

ronindesign commented 1 year ago

Appreciate the patience, #195 has been merged. Please reopen this if not resolved!

MatthiasHeinz commented 1 year ago

I'll probably test this next week and give you guys a nudge, if the issue still persists by then; because I'm not allowed to reopen this issue.

MatthiasHeinz commented 1 year ago

Unfortunately, #195 does not seem to resolve the issue entirely; instead the CPU spins in an endless loop.

When calling ServerQuery::__destruct() and thus this->request("quit"), the following loop in ServerQuery.php probably does not terminate anymore, which previously was broken out of by the thrown TransportException.

do {
    $str = $this->getTransport()->readLine();
    $rpl[] = $str;
} while ($str->section(TeamSpeak3::SEPARATOR_CELL) != TeamSpeak3::ERROR);

Now though, $this->getTransport()->readLine() terminates gracefully and thus does not break out of this loop.

Beware: The current issue probably concerns more than just the previously mentioned use-case!

MatthiasHeinz commented 1 year ago

@ronindesign Would you mind to reopen this issue based on my previous post?

Sebbo94BY commented 1 year ago

@MatthiasHeinz I've created a pull request to hopefully fix this issue: #206

Can you please let me know, if this solves the issue for you? I'll also try to reproduce and test this change on my side.

MatthiasHeinz commented 1 year ago

In order to recreate the endless-loop, I simply had to call this custom function (- obviously, you might want to adjust the hostand port):

public function getUserCount() {
    $options = array("host" => "localhost", "port" => "10011", "timeout" => 10, "blocking" => true);
    $object = new ServerQuery($options);
    $node = $object->getHost();
    $node->login($tsServer->getQueryUser(), $tsServer->getQueryPass());
    $node->setExcludeQueryClients(true);
    $node = $node->serverGetByPort($tsServer->getPort());
    assert($node instanceof Server);
    return $node->clientCount();
}

As far as the reasoning goes, when the getUserCount() function returns, the ServerQuery::__destruct() is being called, because the $object variable gets disposed of. Thus triggering the chain of events I've stated in my previous post.

Hope, all of this information helps in your testing efforts.

I'll be testing the fix probably next week, if it's merged by then.

Sebbo94BY commented 1 year ago

I believe we have two problems here:

  1. Loops do not properly break / exit, when the remote server connection is gone: https://github.com/planetteamspeak/ts3phpframework/pull/206/commits/5d86f29df787dedcb18a97d549dd90df324c253b
  2. Running in non-blocking mode (blocking=0) should not disconnect at all except the project code wants it explicitly: https://github.com/planetteamspeak/ts3phpframework/pull/206/commits/19e2e7e3e8f7d9a8bd589618243650f966cb3e29

I will test this further the next days.

Sebbo94BY commented 1 year ago

I was able to reproduce the high CPU utilization. I was also able to fix it with the above merged pull request #206 .

Please let me know, if this also solved the issue for you @MatthiasHeinz . :)

MatthiasHeinz commented 1 year ago

Using $options with "blocking" => false yields an expected CPU utilization. On the other hand using $options with "blocking" => true still yields a blasting CPU fan.

I'm honestly not quite sure, what value I should ideally set blocking to. I just want a short request to the server reading the current user-count. I'm fine with changing the value from trueto false on my end, if that's what I should use anyways.

Just wanted to let you know, that there's still some excessive CPU utilization.

Sebbo94BY commented 1 year ago

@MatthiasHeinz

I'm honestly not quite sure, what value I should ideally set blocking to.

This depends on the use-case, so your PHP script and its purpose.

By default, this option is set to true, which is the standard for a short running PHP script, which connects to your TeamSpeak server, does something (e. g. fetching data) and disconnects afterwards again as the PHP script also exits then.

However, if you build a bot for a TeamSpeak server, this is usually a long running PHP script, which should stay connected 24x7 on your TeamSpeak server for regular tasks (e. g. fetching data, moving clients, granting server groups, etc.). Repeatingly connecting and disconnecting would result in flood messages on your TeamSpeak server (spam the server chat about regular reconnects of this bot). In this case you set the option to false and use the wait() function as described here: https://docs.planetteamspeak.com/ts3/php/framework/index.html#example12

And then your code logic needs to handle the proper disconnect. Otherwise, your client will time out when the PHP script exits.

I just want a short request to the server reading the current user-count.

This can be archived with both explained scenarios above.

Depending on how often you want to fetch this data it maybe makes sense to build a bot for it to stay connected in order to avoid spaming the server chat about the regular reconnects of this client / PHP script.

Just wanted to let you know, that there's still some excessive CPU utilization.

I've tested a few things as normal PHP script and bot PHP script, but I could not reproduce the high CPU utilization anymore after I've applied the above recent changes / fixes.

This can be still caused by your personal code, which I don't know in detail. But your code excerpt above seems to always establish a new connection, so I guess it also closes it all the time. Not sure, what the rest of your code does all the time.

If you're having a class with functions, I would recommend you to create one main function, where you setup your connection and establish it and afterwards you call your respective functions / code, which should do the actual things. AFTER calling all necessary functions, tell your code to disconnect and exit the script.

MatthiasHeinz commented 1 year ago

I'm still trying to fiddle with the previous code snippet.

To debug the underlying issue, I did add a couple of error_log statements (my debugger currently isn't hooked up properly...):

// ServerQuery.php
    public function __destruct()
    {
        error_log("ServerQuery::__destruct 0");
        // do not disconnect, when acting as bot in non-blocking mode
        if (! $this->getTransport()->getConfig("blocking")) {
            return;
        }
        error_log("ServerQuery::__destruct 1");
        if ($this->getTransport() instanceof Transport && $this->transport->isConnected()) {
            try {
                error_log("ServerQuery::__destruct 1.1");
                $this->request("quit");
                error_log("ServerQuery::__destruct 1.2");
            } catch (AdapterException) {
                error_log("ServerQuery::__destruct 1.3");
                return;
            }
        }
        error_log("ServerQuery::__destruct 2");
    }

// Transport.php
    public function __destruct()
    {
        error_log("Transport::__destruct");
        if ($this->adapter instanceof Adapter) {
            $this->adapter->__destruct();
        }

        $this->disconnect();
    }

When running the script once, the apache log yields:

[Tue Sep 05 21:09:03.188126 2023] [php:notice] [pid 9264:tid 1860] [client 127.0.0.1:65192] ServerQuery::__destruct 0, referer: http://localhost/src/public/
[Tue Sep 05 21:09:03.188126 2023] [php:notice] [pid 9264:tid 1860] [client 127.0.0.1:65192] ServerQuery::__destruct 1, referer: http://localhost/src/public/
[Tue Sep 05 21:09:03.188126 2023] [php:notice] [pid 9264:tid 1860] [client 127.0.0.1:65192] ServerQuery::__destruct 1.1, referer: http://localhost/src/public/
[Tue Sep 05 21:09:03.212125 2023] [php:notice] [pid 9264:tid 1860] [client 127.0.0.1:65192] ServerQuery::__destruct 1.2, referer: http://localhost/src/public/
[Tue Sep 05 21:09:03.212125 2023] [php:notice] [pid 9264:tid 1860] [client 127.0.0.1:65192] ServerQuery::__destruct 2, referer: http://localhost/src/public/
[Tue Sep 05 21:09:03.212125 2023] [php:notice] [pid 9264:tid 1860] [client 127.0.0.1:65192] Transport::__destruct, referer: http://localhost/src/public/
[Tue Sep 05 21:09:03.212125 2023] [php:notice] [pid 9264:tid 1860] [client 127.0.0.1:65192] ServerQuery::__destruct 0, referer: http://localhost/src/public/
[Tue Sep 05 21:09:03.212125 2023] [php:notice] [pid 9264:tid 1860] [client 127.0.0.1:65192] ServerQuery::__destruct 1, referer: http://localhost/src/public/
[Tue Sep 05 21:09:03.213126 2023] [php:notice] [pid 9264:tid 1860] [client 127.0.0.1:65192] ServerQuery::__destruct 1.1, referer: http://localhost/src/public/

Note, that the second time ServerQuery::__destruct is called, it does not log the checkpoint "ServerQuery::__destruct 2" and thus there has to be an endless loop!

Combined with the fact, that prior to the second call to ServerQuery::__destruct, (according to the logs) we're calling Transport::__destruct, I'm assuming, that the ServerQuery::destruct triggers a Transport::destruct when being destroyed. But ServerQuery::__destruct also calls $this->request("quit");, which in turn tries to gracefully terminate the connection, thus I'd assume there's some misbehavior.

Hopefully this information helps you to track the underlying issue.

MatthiasHeinz commented 11 months ago

@Sebbo94BY Did you have any chance to take an other look at the endless loop issue? Did you manage to reproduce?

Sebbo94BY commented 11 months ago

Hi @MatthiasHeinz, I had a quick look at it in the past, but since I was on vacation and am currently busy with a few other topics, I had not the time and head to check this further yet.

But I'll definitley take a deeper look at it within the next weeks.

MatthiasHeinz commented 6 months ago

@Sebbo94BY I'm still interested to see a fix for the issue, if possible. Hopefully, your schedule has cleared a bit in the meantime. ;-)