nickoala / telepot

Python framework for Telegram Bot API
MIT License

async ChatHandler exceeds aiohttp.ProxyConnector limit #103

Open aderbenev opened 8 years ago

aderbenev commented 8 years ago

After configuring a proxy as described in #85, I ran into an issue with the async DelegatorBot skeleton. With the proxy connector limit set to 10:

telepot.async.api._pools = {
    'default': aiohttp.ProxyConnector(proxy=address, limit=10)
}

the bot answers only 5 requests and then gets stuck, throwing exceptions on some internal timeout. To test this, I sent a message, let the ChatHandler expire, and then sent another one. I also added some printouts to see what's happening:

$ python test.py
Listening ...
Counter opened.
Counter processing message.
Counter closed.
Counter opened.
Counter processing message.
Counter closed.
Counter opened.
Counter processing message.
Counter closed.
Counter opened.
Counter processing message.
Counter closed.
Counter opened.
Counter processing message.
Counter closed.
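For reference, the handler under test was roughly the following — a hedged reconstruction based on telepot's async counter skeleton of that era, with print statements added. Helper names such as per_chat_id, create_open and message_loop come from that skeleton and may differ slightly between telepot versions; the token and proxy address are placeholders, not the actual test script.

import sys
import asyncio
import aiohttp
import telepot
import telepot.async
from telepot.delegate import per_chat_id
from telepot.async.delegate import create_open

class Counter(telepot.helper.ChatHandler):
    def __init__(self, seed_tuple, timeout):
        super(Counter, self).__init__(seed_tuple, timeout)
        self._count = 0
        print('Counter opened.')

    @asyncio.coroutine
    def on_chat_message(self, msg):
        print('Counter processing message.')
        self._count += 1
        yield from self.sender.sendMessage(self._count)

    def on_close(self, exception):
        print('Counter closed.')

TOKEN = sys.argv[1]                    # bot token, passed on the command line
address = 'http://proxy.example:3128'  # placeholder proxy address

telepot.async.api._pools = {
    'default': aiohttp.ProxyConnector(proxy=address, limit=10)
}

bot = telepot.async.DelegatorBot(TOKEN, [
    (per_chat_id(), create_open(Counter, timeout=10)),
])

loop = asyncio.get_event_loop()
loop.create_task(bot.message_loop())
print('Listening ...')
loop.run_forever()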

After answering five messages, when the next one should be received from the client, nothing happens on the bot side for a while (approx. 48 seconds), and then exceptions pop up:

Traceback (most recent call last):
  File "/home/user/src/tbot/controlsBot/venv/lib/python3.5/site-packages/telepot/async/api.py", line 107, in request
    async with fn(*args, **kwargs) as r:
  File "/home/user/src/tbot/controlsBot/venv/lib/python3.5/site-packages/aiohttp/client.py", line 539, in __aenter__
    self._resp = yield from self._coro
  File "/home/user/src/tbot/controlsBot/venv/lib/python3.5/site-packages/aiohttp/client.py", line 183, in _request
    conn = yield from self._connector.connect(req)
  File "/home/user/src/tbot/controlsBot/venv/lib/python3.5/site-packages/aiohttp/connector.py", line 300, in connect
    yield from fut
  File "/home/user/src/python3/lib/python3.5/asyncio/futures.py", line 358, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/home/user/src/python3/lib/python3.5/asyncio/tasks.py", line 290, in _wakeup
    future.result()
  File "/home/user/src/python3/lib/python3.5/asyncio/futures.py", line 266, in result
    raise CancelledError
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/src/tbot/controlsBot/venv/lib/python3.5/site-packages/telepot/async/__init__.py", line 250, in get_from_telegram_server
    result = await self.getUpdates(offset=offset, timeout=20)
  File "/home/user/src/tbot/controlsBot/venv/lib/python3.5/site-packages/telepot/async/__init__.py", line 189, in getUpdates
    return await self._api_request('getUpdates', _rectify(p))
  File "/home/user/src/tbot/controlsBot/venv/lib/python3.5/site-packages/telepot/async/__init__.py", line 39, in _api_request
    return await api.request((self._token, method, params, files), **kwargs)
  File "/home/user/src/tbot/controlsBot/venv/lib/python3.5/site-packages/telepot/async/api.py", line 108, in request
    return await _parse(r)
  File "/home/user/src/tbot/controlsBot/venv/lib/python3.5/site-packages/aiohttp/helpers.py", line 488, in __exit__
    raise asyncio.TimeoutError
concurrent.futures._base.TimeoutError

From then on, those two tracebacks keep reappearing at the same timeout interval, while no messages are processed and no Counter methods are called. The issue also occurs even if I don't wait for the ChatHandler timeout to expire.

Since I had some faith in the skeleton code, I opted to experiment with the configuration of my proxy connector:

telepot.async.api._pools = {
    'default': aiohttp.ProxyConnector(proxy=address, limit=None)
}

It seems that removing the limit on ProxyConnector (the default behavior) fixes the issue. Although that looks like a workable practical solution, the investigation above shows that even though ChatHandler's on_close is called (and the object is presumably deleted), this does not close the corresponding ProxyConnector connection(s). Even with only one ChatHandler created, five calls to on_chat_message exhaust the ProxyConnector's limit=10, which suggests that ProxyConnector connections are neither closed nor reused as messages are received and sent.

Also, if the bot sends nothing in return (commenting out yield from self.sender.sendMessage(self._count) in the on_chat_message method), it can receive 10 messages. That means one ProxyConnector connection is consumed per receive or send operation, so with the original skeleton the limit allows 10/2 = 5 messages before the issue occurs.

I don't know much about the internal logic of the bot's connection handling, but I'd assume connections should be either closed or reused. As it stands, any ProxyConnector connection limit (other than None) will eventually be exhausted, leaving the bot stuck.

aderbenev commented 8 years ago

Rummaged through the aiohttp docs a little:

Note ProxyConnector in opposite to all other connectors doesn’t support keep-alives by default (force_close is True).

Looks like configuring the connector to keep connections alive works:

telepot.async.api._pools = {
    'default': aiohttp.ProxyConnector(proxy=address, force_close=False, limit=10)
}

Meanwhile:

 force_close

    Read-only property, True if connector should ultimately close connections on releasing.

Looks like with force_close=False the connector won't automatically close connections. Should the bot close them explicitly?
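For illustration, explicitly releasing the pooled connections could look roughly like the sketch below, built on the _pools dict configured above; whether and where telepot itself should do this is exactly the open question. close() is awaited here to match the fix shown later in this thread — on some aiohttp versions it is a plain, non-awaitable method.

import telepot.async.api

async def close_pools():
    # Close every pooled connector explicitly, e.g. on bot shutdown.
    for connector in telepot.async.api._pools.values():
        await connector.close()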

nickoala commented 8 years ago

@aderbenev, thanks for laying out the whole story. Fixing the underlying HTTP library's keep-alive issues is my next priority, both for aiohttp and urllib3. I need some time to do my own investigation. Will come back to you when I have more meaningful things to say.

aderbenev commented 8 years ago

I have overcome the issue described above by setting the force_close parameter to False while leaving limit=10 on aiohttp.ProxyConnector. Unfortunately, I cannot reliably test the bot with multiple simultaneous client connections.

While troubleshooting the related connectivity issues in aiohttp, I stumbled upon the reconnect-on-network-failure problem described here. The way it affected telepot for me: in the request function I had to add handling for asyncio.TimeoutError:

In async/api.py (now aio/api.py):

async def request(req, **user_kw):
    global _pools
    fn, args, kwargs, timeout = _transform(req, **user_kw)

    if timeout is None:
        async with fn(*args, **kwargs) as r:
            return await _parse(r)
    else:
        try:
            with aiohttp.Timeout(timeout):
                async with fn(*args, **kwargs) as r:
                    return await _parse(r)
        except asyncio.TimeoutError:
            # aiohttp request timeout. Connection pool is flushed.
            for key in _pools:
                await _pools[key].close()
            _pools = None
            raise

That is, on timeout I explicitly close all connections and later construct a new ProxyConnector object in the _transform function as needed. If I don't do this, any sporadic connectivity issue "breaks" the connector, and all further requests never complete even after the network comes back up, yielding only an endless exception loop.
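The rebuilding side of that fix, which happens in _transform and is not shown above, might look roughly like this — a hedged sketch of the idea, with _ensure_pools and address as illustrative names:

def _ensure_pools(address):
    # If the pool was dropped after a timeout, recreate the ProxyConnector
    # before the next request goes out.
    global _pools
    if _pools is None:
        _pools = {
            'default': aiohttp.ProxyConnector(proxy=address,
                                              force_close=False, limit=10)
        }
    return _pools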

nickoala commented 8 years ago

@aderbenev, first I want to thank you for all the troubleshooting. It exposes a bunch of issues I have not considered.

Before I go any further, I want to be certain about the necessity of your fix (on timeout, explicitly closing all connections and later constructing a new ProxyConnector, so that a sporadic connectivity issue cannot break the connector irreversibly). Is this fix needed only for ProxyConnector, or for all types of connectors?

Your comments make me think that maybe I have assumed too much intelligence in aiohttp's connection pooling (or maybe it's my ignorance). But I will listen to your answer before saying any more.

aderbenev commented 8 years ago

@nickoala, sadly I can't test other scenarios (e.g. different connectors) comprehensively, since my setup runs behind a proxy I cannot bypass; using another connector would not let the bot connect at all. Also, I don't know at which point the connection breaks in my case, and deliberately recreating this particular issue is not simple.

Alas, judging by the scary exception trace, my best bet was to assume that once the connection is terminated prematurely, the connector is broken for good.

nickoala commented 8 years ago

@aderbenev, thank you.

Before telepot, I had not used any library's connection-pooling facilities. But when I think about connection pooling, I envision these capabilities (and assume any library that supports connection pooling has them):

It creates/re-uses/fixes connections as needed, transparently, within the connection limit. If a connection is idle for some (specified) time, it is closed. If a connection is broken, a new connection is tried.

As a user, all I care about is making HTTP requests efficiently. I don't want to manage connections. Apparently, it is not as ideal as I imagined.

Your situation has another dimension: is this only a property of ProxyConnector, or of all types of connectors? You don't know, and I don't know. I said the following in another post; I'll say it again:


Given the instability after implementing connection pooling (on both urllib3 and aiohttp) and the sporadic nature of those problems (hard or impossible to reproduce), I am inclined to dump connection pooling. I may still keep the logic in telepot.api and telepot.aio.api, but configure it to use an independent session for each call (like the "old days"). I'd rather it be dumb/slow/inefficient than unstable. Anyone interested in connection pooling may configure it to do so, and bear the consequences.

I don't want to be bogged down by these issues while I have other improvements to make.
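For concreteness, "an independent session for each call" might look roughly like the sketch below. It uses present-day aiohttp idioms (async with ClientSession, a per-request proxy= argument) rather than the ProxyConnector-era API discussed above, and it illustrates the idea rather than telepot's actual code.

import aiohttp

async def api_call(url, data, proxy=None):
    # No pooling: a fresh session (and connector) per API call. Nothing is kept
    # alive between calls, so nothing can be exhausted or left half-broken,
    # at the cost of opening a new connection every time.
    async with aiohttp.ClientSession() as session:
        async with session.post(url, data=data, proxy=proxy) as resp:
            return await resp.json()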


Basically, I just want to move on. Callback query handling is yet to be completed. I want to move on and finish that first .....