tornadoweb / tornado

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
http://www.tornadoweb.org/
Apache License 2.0
21.76k stars 5.51k forks source link

Unbounded growth in number of sockets when using AsyncHTTPClient #3424

Closed e-carlin closed 1 month ago

e-carlin commented 1 month ago

The below minimal example reproduces the problem. If you run while true; do ss | wc -l; sleep 1; done while the python program is running you'll see unbounded growth in the number of sockets. If you let it run for long enough you will run out of file descriptors.

import asyncio
import time
import tornado.httpclient

class Client:
    def __init__(self):
        # force_instance=True|False doesn't change behavior
        self._client = tornado.httpclient.AsyncHTTPClient(force_instance=True)

    async def call(self):
        r = await self._client.fetch("https://google.com")
        self._client.close()  # Close or not: doesn't change behavior

while True:
    asyncio.run(Client().call())
    time.sleep(1)

Any insight on what might be causing this?

I realize that running asyncio.run() repeatedly is a bit unconventional. Our real system doesn't do this exactly but it calls asyncio.run as a means to bridge between a non-async application and an async library. Also, I understand that there is some "magic" behind AsyncHTTPClient being a singleton. I'm curious whether that behavior is causing the unbounded growth. I took a stab at debugging down in AsyncHTTPClient but didn't get very far.

Thanks!

yelinaung commented 1 month ago

Is there any reason to do asyncio.run(..) in the while loop every 1 sec ? Because if you do that, this is effectively creating a new ioloop each time, and subsequently creating new AsyncHTTPClient instances (the "singleton cache" is based on per each ioloop) That's how I understand how it works in Tornado.

So, if we tweak the script to something like this,

class Client:
    def __init__(self):
        self._client = tornado.httpclient.AsyncHTTPClient()

    async def call(self):
        r = await self._client.fetch("https://www.google.com")
        print(f"Client Result - {r.code}")
        # we should not close it
        # self._client.close()

async def main():
    client = Client()
    while True:
        await client.call()
        time.sleep(1)  # better to use asyncio.sleep instead

asyncio.run(main())

this will not grow the sockets.

e-carlin commented 1 month ago

Thanks @yelinaung. Answers below:

Is there any reason to do asyncio.run(..) in the while loop every 1 sec ?

This is to simulate real life. For the code I shared I just wanted to reproduce the behavior I'm seeing in as few lines as possible.

In real life: based on user interaction the ioloop is created, a request is made, and the response is returned to the user. The reason we have to do this is our app is not async. But, it is calling an async library which makes the request. So, we need to bridge between the two worlds with asyncio.run

So, if we tweak the script to something like this...this will not grow the sockets.

Thanks. There are quite a few ways to tweak the code to eliminate the problem. This is how we've solved it for the time being.

My point in opening the issue is that it is a foot gun (one we shot ourselves with) that the code I shared above causes unbounded growth in the number of sockets. IMHO it is not obvious just by reading the code to tell there is a resource leak. People probably won't find out until (like us) they run out of file descriptors.

bdarnell commented 1 month ago

The issue here is that when you give asyncio.run a complex expression like asyncio.run(Client().call()), the call method runs inside asyncio.run, while the Client() constructor runs before it (this is normal python function call behavior but it's confusing with the way asyncio.run wants to act like a container or wrapper). This means that the two functions see different event loops. Constructing the AsyncHTTPClient inside Client() implicitly initializes a separate event loop that will immediately be overwritten by asyncio.run() without being cleaned up, hence the leak (this event loop confusion can lead to worse problems than a file descriptor leak, although we don't see any in this example).

The fix, then, is not about holding references (or not) to AsyncHTTPClient, but about whether it's created in a constructor that runs outside asyncio.run or if it's initialized lazily inside the proper event loop context (this is why doing anything but one big asyncio.run call at the top of the program is risky). A simple change to your example that will work is to move the Client().call() expression into a wrapper function:

while True:
    async def wrapper():
        await Client().call()
    asyncio.run(wrapper())
    time.sleep(1)

This moves the constructor into the asyncio.run context so it gets the right event loop (applying this to your full application might be more complicated).

As you say, this is a non-obvious way to introduce subtle errors. Work is underway to change the behavior so that this will fail noisily instead of quietly, but it has to go through a long deprecation cycle because it's backwards-incompatible with patterns that were used and recommended in the early years of asyncio (prior to the introduction of asyncio.run). If you enable deprecation warnings you'll get a warning whenever you touch anything asyncio-related outside of the asyncio.run context.

bdarnell commented 1 month ago

This deprecation process started in Python 3.10 but the DeprecationWarning won't be converted to an error by default until at least Python 3.14, according to https://github.com/python/cpython/issues/93453 (I'm not sure if there's anything more recent).

e-carlin commented 1 month ago

Ah, of course. Glaringly obvious now that you point it out. Sorry to waste your time on something not really Tornado related. Thank you for the very detailed response!