jykob / TSBot

Asynchronous framework to build TeamSpeak 3 Server Query bots
https://tsbot.readthedocs.io
MIT License
6 stars 3 forks source link

Timeout error. #37

Closed deprale closed 4 months ago

deprale commented 4 months ago
Task exception was never retrieved
future: <Task finished name='Task-107' coro=<TSBot._send() done, defined at /usr/local/lib/python                                              3.10/dist-packages/tsbot/bot.py:243> exception=TimeoutError()>
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
    return fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tsbot/bot.py", line 277, in _send
    server_response = await asyncio.wait_for(asyncio.shield(self._response), 2.0)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError

Randomly happens after a certain time has passed when bot is just idling on the server. I'm still using 0.18.0 but I see no changes in the functions mentioned in the error log.

jykob commented 4 months ago

Do you happen to have any blocking code that could run at the same time as the query is being sent?

For example:

@bot.on("clientmoved")
async def test_1(bot: TSBot, ctx: TSCtx):
    print("senging")
    # ignore the direct call to _send(), this makes it easier to reproduce the issue
    await bot._send(query("sendtextmessage").params(targetmode=3, msg="asd").compile())

@bot.on("clientmoved")
async def test_2(bot: TSBot, ctx: TSCtx):
    print("sleeping")
    time.sleep(5)  # time.sleep is a blocking call
    print("done")

In this example when a client moves, test_1 and test_2 are called concurrently. test_1 sends a query and awaits it. meanwhile test_2 calls a blocking function that wont allow the event loop to run. When the blocking function returns, the event loop starts running again. But now time has passed and inside the _send() call await asyncio.wait_for sees that it has passed its 2.0 seconds timeout period and raises CancelledError.

To avoid this, all blocking functions should run in a separate thread (via asyncio.to_thread()) or have async alternative version of such calls.

jykob commented 4 months ago

Scratch above. Did some more digging:

DEBUG:tsbot.default_plugins.keepalive:Sending keep-alive
DEBUG:tsbot.bot:Sending query: 'version'
DEBUG:asyncssh:[conn=0] Requesting key exchange
DEBUG:tsbot.events.handler:Got event: TSEvent(event='send', ctx={'query': 'version'})       
DEBUG:asyncssh:[conn=0] Received key exchange request
DEBUG:asyncssh:[conn=0] Beginning key exchange
INFO:asyncssh:[conn=0] Connection lost
INFO:asyncssh:[conn=0, chan=0] Closing channel due to connection close
INFO:asyncssh:[conn=0, chan=0] Channel closed: Connection lost

It seems there is a condition where the KeepAlive plugin sends a keep-alive query to the server while the ssh key exchange is going on. This would sever the connection and since no connection, the bot would close.

Need to do futher investigation on this.

jykob commented 4 months ago

Further investigation: Inside connection.py, configuring low key exchange period when creating the ssh connection helps to reproduce this bug.

self._connection = await asyncssh.connect(
    self.address,
    port=self.port,
    username=self.username,
    password=self.password,
    known_hosts=None,
    rekey_seconds=10,  # Low key exchange
)

This forces frequent key exchanges by the client. When the bot tries to send a query, the ssh client tries to initiate KEX and fails, killing the connection. I dont think this use to be an issue few years ago, maybe something changed upstream (AsyncSSH)

jykob commented 4 months ago

Update:

After testing with other ssh clients, it seems like TeamSpeak SSH implementation doesn't handle rekeying the connection very well.

The problem with AsyncSSH is that the default rekey_seconds is set 1 hour. After an hour, key exchange would happen and kill the connection when sending data.

Solutions:

Option 1:

self._connection = await asyncssh.connect(
    self.address,
    port=self.port,
    username=self.username,
    password=self.password,
    known_hosts=None,
    rekey_seconds=False,  # Don't rekey after a given time
    rekey_bytes="1000g",  # High number, rekey after 1000GB of data sent
)

I dont't think there is a way to disable rekeying after the bytelimit is hit. This solution is not desirable, because of the bytes limit. it might not hit the limit soon but at somepoint.

Option 2:

Subclass AsyncSSH.SSHConnection to patch out rekeying.

deprale commented 4 months ago

Truly seems like a ghetto fix, but nonetheless can also just use systemd to restart the script I guess for now, along with the 1000g rekeying.

Maybe it's worthy to report on TS3 Forums?

jykob commented 4 months ago

This seems like a TeamSpeak bug to me. I will try to ask around in the TeamSpeak community forums once i get to it. AsyncSSH (or any other client i've tried) is not the problem here since it's complying with SSH standards. (rfc4253#section-9, rekey after 1h, 1GB data)

jykob commented 4 months ago

Downgrading to AsyncSSH 2.14.0 seems to fix the issue with rekeying.

pip install asyncssh==2.14.0

Going to push a patch pinning asyncssh version.

AsyncSSH versions affected:

I havent yet tried to find why those versions causes issues. Interestingly enough other SSH clients also fail rekeying with the Teamspeak server. (PuTTy, Bitvise SSH client), although OpenSSH client works great.