python-trio / trio-asyncio

a re-implementation of the asyncio mainloop on top of Trio

Limited queue length causes internal errors with aiobotocore #136

Open DRMacIver opened 6 months ago

DRMacIver commented 6 months ago

While working on some new code, I forgot to apply the workaround for #130 when creating a very large number of parallel tasks with aiobotocore.

Here's a simplified example triggering it:

import trio_asyncio
from aiobotocore.session import get_session
from botocore.config import Config as BotoConfig
from trio_asyncio import aio_as_trio
import trio

# Replace with some S3 bucket and key. I didn't have a good public one to reference, sorry.
MY_BUCKET = '...'
MY_KEY = '...'
MY_REGION = '...'
# Local path that each task writes the downloaded object to.
MY_TARGET = '...'

async def main():
    session = get_session()
    config = BotoConfig(retries={'max_attempts': 20})
    async with aio_as_trio(session.create_client('s3', region_name=MY_REGION, config=config)) as client:
        async with trio.open_nursery() as nursery:
            # Spawn a very large number of parallel downloads.
            for _ in range(10000):
                # Using start_soon as a decorator schedules the task right away.
                @nursery.start_soon
                async def download_file_from_s3():
                    response = await aio_as_trio(client.get_object(
                        Bucket=MY_BUCKET,
                        Key=MY_KEY,
                    ))

                    async with await trio.open_file(MY_TARGET, 'wb') as o:
                        body = response['Body']
                        async with aio_as_trio(body):
                            # Read the body in 1 MB chunks until it is exhausted.
                            while True:
                                chunk = await aio_as_trio(body.read(10 ** 6))
                                if not chunk:
                                    break
                                await o.write(chunk)

if __name__ == '__main__':
    trio_asyncio.run(main)

As well as the initial exception from #130 (which is expected), this produces a number of other internal errors. In particular:

AssertionError:
Exception ignored in: <coroutine object Runner.init at 0x7f143b608760>
Traceback (most recent call last):
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 1909, in init
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 958, in __aexit__
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 1101, in _nested_child_finished
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 1080, in _add_exc
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_ki.py", line 181, in wrapper
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 796, in cancel
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 453, in recalculate
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 1439, in _attempt_delivery_of_any_pending_cancel
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 1421, in _attempt_abort
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_io_epoll.py", line 306, in abort
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_io_epoll.py", line 275, in _update_registrations
ValueError: I/O operation on closed epoll object
Exception ignored in: <function Nursery.__del__ at 0x7f143bfadc60>
Traceback (most recent call last):
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 1266, in __del__
AssertionError:
Exception ignored in: <coroutine object run.<locals>._run_task at 0x7f143a8c2c50>
Traceback (most recent call last):
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio_asyncio/_loop.py", line 527, in _run_task
  File "/usr/lib64/python3.11/contextlib.py", line 222, in __aexit__
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio_asyncio/_loop.py", line 453, in open_loop
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 958, in __aexit__
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 1101, in _nested_child_finished
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 1080, in _add_exc
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_ki.py", line 181, in wrapper
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 796, in cancel
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 446, in recalculate
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 825, in cancel_called
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_generated_run.py", line 79, in current_time
RuntimeError: must be called from async context
Task was destroyed but it is pending!
task: <Task cancelling name='Task-8630' coro=<AioBaseClient._make_api_call() done, defined at /home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/aiobotocore/client.py:324> wait_for=<Future cancelled> cb=[run_aio_future.<locals>.done_cb() at /home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio_asyncio/_util.py:28]>

Then, once a very large number of errors like these have been printed, we start seeing:

Exception in default exception handler
Traceback (most recent call last):
  File "/usr/lib64/python3.11/asyncio/base_events.py", line 1797, in call_exception_handler
    self.default_exception_handler(context)
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio_asyncio/_async.py", line 50, in default_exception_handler
    self._nursery.start_soon(propagate_asyncio_error)
  File "/home/ec2-user/.local/share/virtualenvs/my-project/lib/python3.11/site-packages/trio/_core/_run.py", line 1191, in start_soon
    GLOBAL_RUN_CONTEXT.runner.spawn_impl(async_fn, args, self, name)
    ^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'RunContext' object has no attribute 'runner'

Presumably some underlying root problem is causing all of these failures together, but I'm not sure what it is. The GLOBAL_RUN_CONTEXT.runner bit at the end is super suspicious - the only place I can find that can delete that attribute is here, but as far as I know this code isn't ever forking.
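
For what it's worth, a missing attribute of this kind does not necessarily require a fork. The sketch below is not a diagnosis of this issue: it assumes the run context behaves like a thread-local object, and the `RunContext` class here is a hypothetical stand-in written for illustration, not Trio's own code. It only shows that an attribute set on a thread-local in one thread is absent when read from another thread.

```python
import threading


class RunContext(threading.local):
    """Hypothetical stand-in for a thread-local run context (not Trio's class)."""


GLOBAL_RUN_CONTEXT = RunContext()


def read_runner_from_other_thread() -> str:
    """Set `runner` in the calling thread, then try to read it from a new thread."""
    GLOBAL_RUN_CONTEXT.runner = "runner for the calling thread"
    outcome = []

    def worker():
        try:
            outcome.append(GLOBAL_RUN_CONTEXT.runner)
        except AttributeError as exc:
            # Thread-local attributes are only visible in the thread that set them.
            outcome.append(f"AttributeError: {exc}")

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome[0]


message = read_runner_from_other_thread()
print(message)
```

If something similar is happening here, the attribute would be missing simply because the code reading it is running outside the thread (or the lifetime) of the Trio run that set it, but that is speculation on my part.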

oremanj commented 5 months ago

I found this very difficult to track down, so I worked around it by making the default queue length unlimited. I'm leaving the issue open as a pointer to problems that arise with a limited queue length, although that is no longer a likely configuration.
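
As a generic illustration of why a length limit matters at this scale, here is a minimal sketch using the standard library's `asyncio.Queue` rather than trio-asyncio's internal queue (whose details are not shown in this thread). The limit of 1000 is an arbitrary assumption; 10000 mirrors the task count in the report. A bounded queue rejects non-blocking puts once it is full, while an unlimited one accepts them all.

```python
import asyncio


def fill(queue: asyncio.Queue, n_items: int) -> int:
    """Try to enqueue n_items without waiting; return how many were rejected."""
    rejected = 0
    for i in range(n_items):
        try:
            queue.put_nowait(i)
        except asyncio.QueueFull:
            # A bounded queue refuses new items once maxsize is reached.
            rejected += 1
    return rejected


# 1000 is an arbitrary limit chosen for this sketch.
bounded_rejected = fill(asyncio.Queue(maxsize=1000), 10_000)
# maxsize=0 means "no limit" for asyncio.Queue.
unbounded_rejected = fill(asyncio.Queue(maxsize=0), 10_000)

print(bounded_rejected, unbounded_rejected)
```

The trade-off of an unlimited queue is that nothing pushes back on a producer that schedules work faster than it can be processed, so memory use is bounded only by the number of items enqueued.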