python-websockets / websockets

Library for building WebSocket servers and clients in Python
https://websockets.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
5.22k stars 518 forks source link

Connecting to BitMEX realtime api, each connection creates a thread which is never released #353

Closed andersea closed 6 years ago

andersea commented 6 years ago

Hi, I am writing a connector for the BitMEX crypto exchange api.

Each time I connect with websockets.client.connect a new thread is created. The thread seems to stay up even if the connection is closed.

For example this code:

import asyncio
import websockets

async def main():
    async with websockets.client.connect('wss://www.bitmex.com/realtime') as ws:
        print(await ws.recv())
    await asyncio.sleep(5)
    async with websockets.client.connect('wss://www.bitmex.com/realtime') as ws:
        print(await ws.recv())
    await asyncio.sleep(5)
    async with websockets.client.connect('wss://www.bitmex.com/realtime') as ws:
        print(await ws.recv())
    await asyncio.sleep(5)
    async with websockets.client.connect('wss://www.bitmex.com/realtime') as ws:
        print(await ws.recv())
    await asyncio.sleep(30)

asyncio.get_event_loop().run_until_complete(main())

This code creates four threads that stay up forever.

The way I found out about this is that in some cases the websocket will close spontaneously, due to network or server errors, in these cases the recommended approach is to simply open a new connection and resubscribe to the channels that was subscribed on the new connection. But even though the connection is closed, the thread remains and there doesn't seem to be a way that I can see to clean it up.

After a day of operation I end up with 30 stale threads that just hangs around doing nothing.

Is this intended? Is there any way to clean up threads from closed connections?

aaugustin commented 6 years ago

I'm not sure what you're talking about because websockets doesn't use threads.

What's a "stale thread" and how are you seeing them?

andersea commented 6 years ago

They are visible in the python debugger in VSCode.

Here is a screenshot: billede

As you can see in the call stack box on the bottom left, each started connection initiates a thread. The thread remains even though you close the connection.

I am sure websockets aren't creating them, they are created by some code in the event loop.

It is possible this is just a debugger issue. I can't rule that out. Or maybe I am just not understanding what the debugger is telling me. The debugger isn't really too fond of async/await code.

When I had my production code running for a day, and I had had a whole bunch of disconnects, I had a lot of threads allocated.

aaugustin commented 6 years ago

If you're talking of actual OS threads, I'm positive that websockets isn't creating them.

If these are actually asyncio tasks, websockets might be leaking them, which would be a bug.

websockets creates two asyncio tasks for each connection, as explained here: http://websockets.readthedocs.io/en/stable/design.html#coroutines

I'm not sure what VS Code is telling you here :-/

andersea commented 6 years ago

The event loop I am using is UnixSelectorEventLoop.

The thread is created on line 719 in class BaseEventLoop(AbstractEventLoop) in the python source code file base_events.py

The code looks like this:

        f1 = _ensure_resolved((host, port), family=family,
                              type=socket.SOCK_STREAM, proto=proto,
                              flags=flags, loop=self)
        fs = [f1]

If you put a breakpoint on the f1 = _ensure_resolved line, there is no thread. When you continue to the next line the thread appears.

I am fairly certain the threads are real os threads. This is the thread list of the debugging process. One of the threads is the debugger. The second is the main application thread and the last four are the socket threads.

andersa@Kubuntu-N73JF:~/python/wstest$ ps -T -p 6103
  PID  SPID TTY          TIME CMD
 6103  6103 ?        00:00:02 python
 6103  6104 ?        00:00:00 python
 6103  6128 ?        00:00:00 python
 6103  6132 ?        00:00:00 python
 6103  6134 ?        00:00:00 python
 6103  6135 ?        00:00:00 python
andersa@Kubuntu-N73JF:~/python/wstest$

I think the threads are needed. They are used by the event loop to monitor the raw socket, but I also think they ought to be cleaned up somehow. I don't know enough about socket programming to be able to tell you how.

aaugustin commented 6 years ago

Okay, the threads are created by asyncio... Interesting.

Perhaps you could try putting together a minimal reproduction example, without websockets, only asyncio — call create_connection then close the connection and show there's a thread hanging around — and report that bug in the Python bug tracker?

andersea commented 6 years ago

Ok. I'll try that. Thanks.

RemiCardona commented 6 years ago

getaddrinfo(3) is a blocking syscall so python does https://github.com/python/cpython/blob/3.6/Lib/asyncio/base_events.py#L675 (same thing since 3.4, haven't looked at 3.7 but I would assume it's the same). So the threads should belong to the loop's executor. HTH

aaugustin commented 6 years ago

https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor says the number of threads should never go beyond 5 x CPU count so there's no unlimited resource leak.

andersea commented 6 years ago

Thats plausible, although it seems I am accumulating slightly more threads than this. This might be explainable by other libraries I am using.

Thanks for the help.