kyuupichan / aiorpcX

Generic async RPC implementation, including JSON-RPC
MIT License
27 stars 23 forks source link

Task cancellation of SOCKS sessions kills whole event loop #8

Closed SomberNight closed 6 years ago

SomberNight commented 6 years ago

So my example is on Windows, as I am having a hard time making a minimal example on Linux... but there is also a problem on Linux, even if not exactly the same.

code:

import asyncio
import ssl
import threading

import aiorpcx
from aiorpcx import ClientSession, TaskGroup

async def open_session():
    cert_path = r"C:\Users\User\AppData\Roaming\Electrum\testnet\certs\testnet.qtornado.com"
    sslc = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cert_path)
    sslc.check_hostname = 0
    proxy = aiorpcx.socks.SOCKSProxy(("127.0.0.1", 9150), aiorpcx.socks.SOCKS5, None)
    s = ClientSession("testnet.qtornado.com", 51002, ssl=sslc, proxy=proxy)
    async with s as session:
        ver = await session.send_request('server.version', ["test", "1.4"])
        print(ver)

async def kill_it():
    await asyncio.sleep(0.1)
    raise Exception("bye")

async def main():
    async with TaskGroup() as group:
        await group.spawn(open_session)
        await group.spawn(kill_it)

loop = asyncio.get_event_loop()
t=threading.Thread(target=loop.run_until_complete, args=(main(),))
t.start()

result:

>>> Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 455, in run_until_complete
    self.run_forever()
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 422, in run_forever
    self._run_once()
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 1398, in _run_once
    event_list = self._selector.select(timeout)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\selectors.py", line 323, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\selectors.py", line 314, in _select
    r, w, x = select.select(r, w, w, timeout)
OSError: [WinError 10038] An operation was attempted on something that is not a socket

>>>
>>> loop
<_WindowsSelectorEventLoop running=False closed=False debug=False>
>>>

Now, this example is not enough to reproduce it on Linux; but there the loop does NOT die, but the following trace appears:

Exception in callback BaseSelectorEventLoop._sock_recv(<Future cancelled>, True, <socket.socke...2049, proto=0>, 5)
handle: <Handle BaseSelectorEventLoop._sock_recv(<Future cancelled>, True, <socket.socke...2049, proto=0>, 5)>
Traceback (most recent call last):
  File "/usr/lib/python3.6/asyncio/events.py", line 127, in _run
    self._callback(*self._args)
  File "/usr/lib/python3.6/asyncio/selector_events.py", line 378, in _sock_recv
    self.remove_reader(fd)
  File "/usr/lib/python3.6/asyncio/selector_events.py", line 342, in remove_reader
    return self._remove_reader(fd)
  File "/usr/lib/python3.6/asyncio/selector_events.py", line 279, in _remove_reader
    key = self._selector.get_key(fd)
  File "/usr/lib/python3.6/selectors.py", line 189, in get_key
    return mapping[fileobj]
  File "/usr/lib/python3.6/selectors.py", line 70, in __getitem__
    fd = self._selector._fileobj_lookup(fileobj)
  File "/usr/lib/python3.6/selectors.py", line 224, in _fileobj_lookup
    return _fileobj_to_fd(fileobj)
  File "/usr/lib/python3.6/selectors.py", line 41, in _fileobj_to_fd
    raise ValueError("Invalid file descriptor: {}".format(fd))
ValueError: Invalid file descriptor: -1

Tested on both python 3.6.6 and 3.7.0

SomberNight commented 6 years ago

I've pinpointed the cause. The CancelledError is raised here: https://github.com/kyuupichan/aiorpcX/blob/d4c4cc39876b755c6ff39638604380bcb5b4a213/aiorpcx/socks.py#L301 which will result in the explicit close of the socket at https://github.com/kyuupichan/aiorpcX/blob/d4c4cc39876b755c6ff39638604380bcb5b4a213/aiorpcx/socks.py#L322-L323 ^ If I comment out sock.close(), the event loop won't die, and it seems to "work".

kyuupichan commented 6 years ago

Thanks. That's obviously not the answer though....

kyuupichan commented 6 years ago

I'm inclined to see this as a bug in asyncio no?

kyuupichan commented 6 years ago

My intial impression is it seems to be to be another problem caused by use of futures and callbacks (because they occur delayed) rather than pure awaiting in the internals of asyncio. In particular, between the cancellation and asyncio's handling of its callback effects, aiorpcX gets control and closes the socket, which is entirely reasonable. asyncio then continues to handle the cancellation in a callback, but assumes the socket is still open.

I'll try and create a simple case and, if still a problem, submit a bug report in Python.

kyuupichan commented 6 years ago

I'm struggling to reproduce more simply. Why the thread in your code? Does it happen without it?

kyuupichan commented 6 years ago

I tried this but it's fine:

import asyncio
import socket

class MyProtocol(asyncio.Protocol):

    def connection_made(self, transport):
        transport.write(b'123')   # just in case a write is needed

port = 6666

async def connect_and_recv(loop, sock):
    try:
        await loop.sock_connect(sock, ('127.0.0.1', port))
        while True:
            await loop.sock_recv(sock, 20)
    except asyncio.CancelledError:
        print("Cancelled")
        sock.close()

async def main(loop):
    server = await loop.create_server(MyProtocol, '127.0.0.1', port)
    sock = socket.socket()
    sock.setblocking(False)
    task = loop.create_task(connect_and_recv(loop, sock))
    await asyncio.sleep(0.1)
    task.cancel()
    await asyncio.sleep(0.1)

loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))
kyuupichan commented 6 years ago

https://github.com/python/cpython/blob/master/Lib/asyncio/selector_events.py#L378-L380 needs to happen to trigger it. Not sure what might cause that or how to cause it in the testcase. I suspect if the testcase can trigger those lines, then it would reproduce

SomberNight commented 6 years ago

Why the thread in your code? Does it happen without it?

Yes, actually it's redundant.

Your example reproduces it for me on Windows (10.0.16299.665). It enters the block you suspected.

I've added a traceback print there. Trace on py3.6.6 (same block is entered on 3.7.0):

>>> loop.run_until_complete(main(loop))
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\asyncio\selector_events.py", line 381, in _sock_recv
    data = sock.recv(n)
BlockingIOError: [WinError 10035] A non-blocking socket operation could not be completed immediately
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\asyncio\selector_events.py", line 381, in _sock_recv
    data = sock.recv(n)
BlockingIOError: [WinError 10035] A non-blocking socket operation could not be completed immediately
Cancelled
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 455, in run_until_complete
    self.run_forever()
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 422, in run_forever
    self._run_once()
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 1398, in _run_once
    event_list = self._selector.select(timeout)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\selectors.py", line 323, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\selectors.py", line 314, in _select
    r, w, x = select.select(r, w, w, timeout)
OSError: [WinError 10038] An operation was attempted on something that is not a socket

but this example does not trigger it on Linux (just like the original one did not). I can try to come up with an example for Linux but as I said, something different happens there (only the trace is printed, loop does not die).

kyuupichan commented 6 years ago

Windows and Linux have different event loops because the underlying O/S handles "select"ing differently. Anyway, it's good to know my code does it on Windows; that's enough to file a bug report. It's clear looking at the asyncio code what the issue is, just not how to trigger it. The remove_reader is delayed and in the meantime the socket can be closed (in which case the original fd is invalid, and socket._fileno is set to -1)

kyuupichan commented 6 years ago

Meanwhile I think the best solution is your solution - to not close the socket, and let the garbage collector do it, lame though that feels.

SomberNight commented 6 years ago

This seems related: https://bugs.python.org/issue30064 which actually reproduces the OSError on Windows, and the ValueError on Linux.

kyuupichan commented 6 years ago

https://bugs.python.org/issue34795 FWIW

kyuupichan commented 6 years ago

@SomberNight Can you try with uvloop and setting EVENT_LOOP_POLICY to uvloop? In the issue I filed above they're asking; unfortunately I'm not in a state to reproduce your issue at all at present.

SomberNight commented 6 years ago

I can't reproduce the issue with uvloop on linux. (looks like uvloop does not work on windows atm)

kyuupichan commented 6 years ago

awesome, thanks!