sonic182 / aiosonic

A very fast Python asyncio http client
https://aiosonic.readthedocs.io/en/latest/
MIT License

RuntimeError: readuntil() called while another coroutine is already waiting for incoming data #473

Closed geraldog closed 3 months ago

geraldog commented 3 months ago

Describe the bug My crawler persistently hits: RuntimeError: readuntil() called while another coroutine is already waiting for incoming data

The stack trace does not help locate the bug. The error is raised at https://github.com/python/cpython/blob/d9efa45d7457b0dfea467bb1c2d22c69056ffc73/Lib/asyncio/streams.py#L525 but that alone explains little.
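For readers unfamiliar with this error, here is a minimal sketch of how asyncio raises it, using only the standard library and no aiosonic code. It assumes a bare asyncio.StreamReader fed by hand; the demo() name and the sample response line are made up for illustration. Two coroutines wait on the same reader, and the second call fails immediately.

```python
import asyncio


async def demo() -> str:
    # A bare StreamReader with no transport; data is fed by hand below.
    reader = asyncio.StreamReader()

    # First reader: the buffer is empty, so it parks on the internal waiter.
    first = asyncio.create_task(reader.readuntil(b"\r\n"))
    await asyncio.sleep(0)  # yield once so the task starts waiting

    # Second reader on the same stream: asyncio refuses a second waiter.
    try:
        await reader.readuntil(b"\r\n")
    except RuntimeError as exc:
        message = str(exc)
    else:
        message = ""

    # Feed a line so the first reader can finish cleanly.
    reader.feed_data(b"HTTP/1.1 200 OK\r\n")
    await first
    return message


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the sketch is only that the error signals two concurrent consumers of one stream; it says nothing about which aiosonic code paths end up sharing the reader.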

After days of coding and tracing with print(), I found that even cancelling the waiter, so that streams.py does not raise the RuntimeError, is pointless. The real cause of the bug is that connection.close() is being called twice from different code paths, a concurrency mess.
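One common way to make a double close harmless is an idempotent close guard. The sketch below is an assumption for illustration only: GuardedConnection, teardown_count and close_from_many_paths are hypothetical names, not aiosonic's Connection class and not necessarily what the draft fix does.

```python
import asyncio


class GuardedConnection:
    """Hypothetical wrapper (not aiosonic's class) whose close() is idempotent."""

    def __init__(self) -> None:
        self._closed = False
        self.teardown_count = 0  # counts how often the real teardown ran

    def close(self) -> None:
        # Later callers return early instead of tearing down twice.
        if self._closed:
            return
        self._closed = True
        self.teardown_count += 1  # stand-in for closing the real writer


async def close_from_many_paths(paths: int = 3) -> int:
    conn = GuardedConnection()

    async def code_path() -> None:
        await asyncio.sleep(0)  # simulate different code paths interleaving
        conn.close()

    await asyncio.gather(*(code_path() for _ in range(paths)))
    return conn.teardown_count


if __name__ == "__main__":
    print(asyncio.run(close_from_many_paths()))
```

Because close() here is synchronous and never awaits between the check and the flag update, no lock is needed on a single event loop. An async close that awaits mid-teardown would need the flag set before the first await.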

To Reproduce Steps to reproduce the behavior:

  1. Open a crawler written on top of aiosonic
  2. Start the crawler with a reasonably high concurrency, say 300 "clients" in the pool
  3. Remember to catch your exceptions
  4. See error
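The steps above can be sketched as a small harness in plain asyncio. This is only a rough outline under stated assumptions: fetch() is a hypothetical stand-in for a real request made through aiosonic and fails on purpose for URLs ending in "/bad", and CONCURRENCY mirrors the 300 "clients" from step 2.

```python
import asyncio
from collections import Counter

CONCURRENCY = 300  # step 2: number of concurrent "clients"


async def fetch(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request made through aiosonic."""
    await asyncio.sleep(0)
    if url.endswith("/bad"):
        # Simulated failure so the tally below has something to count.
        raise RuntimeError("simulated stream read failure")
    return "ok"


async def crawl(urls: list[str]) -> Counter:
    semaphore = asyncio.Semaphore(CONCURRENCY)
    errors: Counter = Counter()

    async def worker(url: str) -> None:
        async with semaphore:  # cap the number of in-flight requests
            try:
                await fetch(url)
            except Exception as exc:  # step 3: catch and tally by type
                errors[type(exc).__name__] += 1

    await asyncio.gather(*(worker(url) for url in urls))
    return errors


if __name__ == "__main__":
    sample = ["https://a.example/", "https://b.example/bad"]
    print(asyncio.run(crawl(sample)))
```

Tallying by exception type makes it easy to see how often the RuntimeError shows up relative to the number of domains crawled.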

Expected behavior No RuntimeError from readuntil(), read(), or any other stream-reading awaitable that consumes from the StreamReader byte buffer being called while another such call is still pending.

Screenshots None

Desktop (please complete the following information): Not applicable

Smartphone (please complete the following information): Not applicable

Additional context Hi @sonic182, and sorry for the delay in filing this issue. I wanted to have a fix before discussing any of this. I have a draft of a fix and will file the PR later today. Thanks for everything!

geraldog commented 3 months ago

Draft fix is at #474

sonic182 commented 3 months ago

Fixed in 0.19.0

geraldog commented 3 months ago

Hi @sonic182

I'm test-crawling the top 1 million Cloudflare Radar domains.

In the end we alleviated the problem a lot, but after a million domains I still end up with around 20 RuntimeErrors. That is not much, roughly one every 50,000 domains, but still worth fixing in #483.