Open aderbenev opened 8 years ago
Rummaged through the aiohttp
docs a little:
Note ProxyConnector in opposite to all other connectors doesn’t support keep-alives by default (force_close is True).
Looks like configuring the connector to keep connections works:
telepot.async.api._pools = {
'default': aiohttp.ProxyConnector(proxy=address, force_close=False, limit=10)
}
Meanwhile:
force_close
Read-only property, True if connector should ultimately close connections on releasing.
Looks like when force_close=False
the connector won't automatically close connections. Should they be closed by the bot explicitly?
@aderbenev, thanks for laying out the whole story. Fixing the underlying HTTP library's keep-alive issues is my next priority, both for aiohttp
and urllib3
. I need some time to do my own investigation. Will come back to you when I have more meaningful things to say.
I have overcome the issue described by setting the force_close
parameter to False
while leaving limit=10
for aiohttp.ProxyConnector
. Unfortunately, I can not reliably test the bot function with multiple simultaneous client connections.
While troubleshooting the related connectivity issues for aiohttp, I stumbled upon reconnect problem on network failure described here. The way it affected telepot for me, in request function I had to add some handling for asyncio.TimeoutError
:
In async/api.py (now aio/api.py):
async def request(req, **user_kw):
global _pools
fn, args, kwargs, timeout = _transform(req, **user_kw)
if timeout is None:
async with fn(*args, **kwargs) as r:
return await _parse(r)
else:
try:
with aiohttp.Timeout(timeout):
async with fn(*args, **kwargs) as r:
return await _parse(r)
except asyncio.TimeoutError:
# aiohttp request timeout. Connection pool is flushed.
for key in _pools:
await _pools[key].close()
_pools = None
raise
That is, on timeout I explicitly close all connections and later construct a new ProxyConnector object in _transform
function as needed. If I don't do this, any sporadic connectivity issue "breaks" the connector and all further requests never complete even when network comes back up, yielding only an endless exception loop.
@aderbenev, first I want to thank you for all the troubleshooting. It exposes a bunch of issues I have not considered.
Before I go any further, I want to be certain about the necessity of your fix (on timeout, explicitly closing all connections and later construct a new ProxyConnector to avoid sporadic connectivity issues break the connector irreversibly). Is this fix only needed for ProxyConnector, or all types of connectors?
Your comments make me think that maybe I have assumed too much intelligence in aiohttp
's connection pooling (or maybe it's my ignorance). But I will listen to your answer before saying any more.
@nickoala, sadly I can't provide a comprehensive suite for other scenarios (e.g. different connectors) since my setup is running behind the proxy I cannot bypass. That is, using another connector would not let the bot to connect at all. Also, I don't know at which point the connection is broken in my case, and recreating this particular issue deliberately is not simple.
Alas, judging by the scary exception trace my best bet was to assume that if the connection is terminated prematurely, the connector becomes broken for good.
@aderbenev, thank you.
Before telepot, I have not used any library's connection pooling facilities. But when I think about connection pooling, I envision these functionalities (and assume any libraries that support connection pooling to have them):
As a user, all I care about is making HTTP requests efficiently. I don't want to manage connections. Apparently, it is not as ideal as I imagine.
Your situation has another dimension: Is it only a property of ProxyConnector
, or all types of connectors? You don't know, and I don't know. I said the following in another post. I'd say it again:
Given the unstability after implementing connection pooling (on both urllib3
and aiohttp
) and the sporadic nature of those problems (unable or hard to reproduce), I am inclined to dump connection pooling. I may still keep the logic in telepot.api
and telepot.aio.api
, but configure it to use an independent session for each call (like the "old days"). I'd rather it be dumb/slow/inefficient than unstable. Anyone interested in connection pooling may configure it to do so, and bear the consequences.
I don't want to be bogged down by these issues, while having other improvements to make.
Basically, I just want to move on. Callback query handling is yet to complete. I want to move on and complete that first .....
After configuring a proxy as described in #85, using the async DelegatorBot skeleton resulted in an issue. Given the proxy connector limit configuration of 10:
the bot only answers 5 requests and then becomes stuck throwing exceptions on some internal timeout. To test the issue, I was sending a message and allowed ChatHandler to expire before sending another one. I also added some printout to see what's happening:
After answering to five messages and when the next one is supposed to be received from the client, nothing happens on the bot side for some time (approx. 48 seconds), and then exceptions pop up:
Then, those two tracebacks keep appearing on the same timeout while no messages are processed and no Counter methods are called. The issue also persists even if I don't wait for ChatHandler object timeout to expire.
Since I had some faith in the skeleton code, I opted to play some with configuration of my proxy connectors:
It seems that removing the limit on ProxyConnector (default behavior) fixes the issue. Although it looks like a working practical solution, the original problem investigation shows that even though ChatHandler on_close is called (and the object is presumably deleted), this does not close the corresponding ProxyConnector connection(s). Even when only one ChatHandler is created, five calls to
on_chat_message
exhaust the ProxyConnectorlimit=10
, which probably indicates that ProxyConnector connections are not closed nor reused every time messages are received or sent.Also, making the bot not send anything in return (by commenting out
yield from self.sender.sendMessage(self._count)
inon_chat_message
method) allows the bot to receive 10 messages. It means that one ProxyConnector connection is needed for the bot for each receive or send operation, and because of that with the original skeleton version the limit is 10/2 = 5 messages before the issue occurs.I don't know much about internal logic of the bot connection handling, but I'd assume that connections should be either closed or reused. For now, any ProxyConnector connection limit (except
None
) will be eventually exhausted, causing the bot to stuck.