Closed: geraldog closed this 9 months ago
After fixing the pool drain I noticed that a particularly fast crawler of mine (courtesy of aiosonic :) was raising OSError about too many open files. Strange, I thought, since I run `ulimit -n 16000` in that terminal and have a custom security.conf.
It turns out some bad SSL servers leave the connection half-open, and the only way to close the zombie socket is to abort the transport. I tested this and it solves the problem, so I'll push the relevant commit with the fix in an instant.
Note that for Python 3.11 it requires this fix: https://github.com/python/cpython/pull/104485
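The gist of it, as a minimal sketch (not the literal commit; the helper name and timeout are mine):

```python
import asyncio

async def close_or_abort(writer: asyncio.StreamWriter, timeout: float = 2.0) -> None:
    """Close a connection, aborting the transport if the peer never finishes the shutdown."""
    writer.close()
    try:
        # A misbehaving SSL server can leave the connection half-open,
        # so don't wait forever for the closing handshake.
        await asyncio.wait_for(writer.wait_closed(), timeout)
    except (asyncio.TimeoutError, ConnectionError, OSError):
        # abort() drops the socket immediately and frees the file descriptor.
        writer.transport.abort()
```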
Also, there's a memory leak in Python 3.11+ involving, again, SSL; see https://github.com/python/cpython/issues/109534
The memory leak itself is not relevant to the commit, I just thought I'd share since it particularly limits long-winded crawls.
@sonic182 I think we can create a Future object that can be awaited in `__aexit__` if `self.blocked`, without the ugly `while True:` loop.
Let's call this Future object the Connection waiter. A Future is just a promise of a future result that can be awaited, and awaiting it yields to other tasks until it is done, without the need for the equally ugly `asyncio.sleep(hardcoded_timeout)`.
The beauty of this Future object is that its `set_result()` method can be called from synchronous code paths.
So we await this waiter when the Connection is `self.blocked`, and either way we get the chance to catch CancelledError when RequestTimeout() comes.
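Something along these lines, as a rough sketch (the `blocked` flag is from the existing code; the other attribute and method names are mine):

```python
import asyncio

class Connection:
    def __init__(self) -> None:
        self.blocked = False
        self._waiter: asyncio.Future | None = None

    def block(self) -> None:
        # Called while the response body hasn't been fully read yet.
        self.blocked = True
        self._waiter = asyncio.get_running_loop().create_future()

    def release(self) -> None:
        # Synchronous code path: resolving the Future wakes up __aexit__.
        self.blocked = False
        if self._waiter is not None and not self._waiter.done():
            self._waiter.set_result(None)

    async def __aexit__(self, exc_type, exc, tb) -> None:
        if self.blocked and self._waiter is not None:
            # Yields to other tasks until release() is called; the
            # CancelledError from a RequestTimeout can be caught right here.
            await self._waiter
```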
I have a test crawl running right now that I'd rather not interrupt, but I think I can hack up some code for tomorrow. What do you think?
I'm working on this small PR https://github.com/sonic182/aiosonic/pull/454 to move socket read/write into the same connection class (this will be needed later, when Proxy support is fully implemented).
Maybe the try/finally inside the HTTPResponse class fixes your crawler, by ensuring the connection gets released after reading the response data or hitting an error with it?
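The pattern is roughly this (simplified sketch, not the literal code in #454; `_read_body` and `release` are placeholders):

```python
class HTTPResponse:
    def __init__(self, connection):
        self.connection = connection

    async def read(self) -> bytes:
        try:
            return await self._read_body()  # placeholder for the actual body reading
        finally:
            # The connection goes back to the pool whether reading succeeds
            # or raises, so the pool never loses a slot to a failed read.
            await self.connection.release()
```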
I'll definitely give it a try, thanks!
EDIT: I see it's merged already, so I'll definitely be giving it a try today still.
Unfortunately I see `ConnectionPoolAcquireTimeout` very quickly even with latest master, which includes #454 :(
You may need to use semaphores to limit the concurrency in your crawler. I think you're starting a lot of calls very quickly and the pool size is not big enough for them; you may need to extend timeouts too.
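Something like this, for illustration (the client calls follow typical aiosonic usage; the numbers are arbitrary):

```python
import asyncio
import aiosonic

async def fetch(client, sem, url):
    async with sem:  # at most 100 requests in flight at any moment
        response = await client.get(url)
        return url, response.status_code

async def crawl(urls):
    client = aiosonic.HTTPClient()
    sem = asyncio.Semaphore(100)  # keep this well below the pool size
    return await asyncio.gather(*(fetch(client, sem, u) for u in urls))
```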
Please re-read the first message in this pull request and you'll realize the problem is not as you mentioned.
EDIT: By the way, my pool size is 16000 clients, which I allow via a custom limits.conf and `ulimit -n 16000`. The crawler never creates more than 1000 tasks, limited by checking `len(asyncio.all_tasks())`.
How could my pool be exhausting itself? And why does it not exhaust with the changes introduced in this PR?
@sonic182 taking into account that #454 is now in effect, and my promise to rework the logic around a Future to await instead of using asyncio.sleep, I'm closing this PR and opening another one. Hope you don't mind.
As I stated previously, I have serious problems with the pool draining very quickly. I only tested with CyclicQueuePool.
I erroneously and naively tried to fix this with #452.
It turns out the problem wasn't just a one-liner. The problem starts with create_task in `__del__` inside the HttpResponse class. That task has no guarantee of executing, and as soon as `__del__` returns the task is most likely garbage collected instantly.
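In simplified form, the pitfall looks like this (not the actual aiosonic code):

```python
import asyncio

class HttpResponse:
    def __init__(self, connection):
        self.connection = connection

    def __del__(self):
        # Nothing keeps a strong reference to this task; the event loop only
        # holds weak references to tasks, so it can be garbage collected
        # before it ever runs and the connection is never released.
        asyncio.create_task(self.connection.release())
```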
So I discovered I could wait inside the Connection class's `__aexit__` method. This alleviated the problem, but there was still fast pool drain.
I tried a little more and it became obvious that the wait_for call around the _do_request helper function was cancelling the Connection class's `__aexit__`. But I was able to catch CancelledError there, which guarantees we can close the socket by calling close() on the StreamWriter and finally put the Connection back into the CyclicQueuePool's queue.
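Roughly like this (a sketch; the `writer`, `pool`, and helper names are mine, not aiosonic's exact attributes):

```python
import asyncio

class Connection:
    def __init__(self, pool, writer: asyncio.StreamWriter) -> None:
        self.pool = pool
        self.writer = writer

    async def __aexit__(self, exc_type, exc, tb) -> None:
        try:
            await self._wait_until_unblocked()  # placeholder for the waiter logic
        except asyncio.CancelledError:
            # wait_for() around _do_request cancels __aexit__ on RequestTimeout;
            # re-raise, but still run the cleanup below.
            raise
        finally:
            self.writer.close()            # make sure the socket gets closed
            await self.pool.release(self)  # and the slot goes back to the cyclic queue
```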
There was still pool drain, so I fixed two more situations inside the Connector class where exceptions happen and the Connection was not put back into the queue.
Now my crawler is running stably without pool drain! Thank you for the work!