scrapy-plugins / scrapy-playwright

🎭 Playwright integration for Scrapy
BSD 3-Clause "New" or "Revised" License

Unhandled asyncio errors #221

Closed Gidgidonihah closed 1 year ago

Gidgidonihah commented 1 year ago

I'm getting a decent amount of logs about an error that happens completely outside my code. As far as I'm aware, there's nothing I can do on my side about handling this error.

Every instance I have points to the same line of scrapy_playwright/handler.py (line 606, in _log_request, per the tracebacks below).

Unfortunately I also don't have an MRE to replicate it. Based on the only location in Playwright that I've found that emits that message, it seems to be a default error message. I recently debugged a similar issue, and this is also what happens when a request for information from the page is made after the page has closed. In that case, it was a race condition between a thread that had already started and the page being closed. We were able to fix that, but it could be something similar here.
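To illustrate the kind of race I mean, here is a minimal sketch using only asyncio. `FakePage` and its methods are hypothetical stand-ins, not Playwright's API; the point is only that a lookup scheduled before the close can end up running after it.

```python
import asyncio


class FakePage:
    """Hypothetical stand-in for a page; not the Playwright API."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

    async def header_value(self, name):
        # Yield control once, as a round-trip to the browser driver would.
        await asyncio.sleep(0)
        if self.closed:
            raise RuntimeError("Target page, context or browser has been closed")
        return None


async def main():
    page = FakePage()
    # The lookup is scheduled first, but does not start running yet.
    lookup = asyncio.ensure_future(page.header_value("referer"))
    # The page is closed before the scheduled lookup gets a chance to run.
    await page.close()
    try:
        await lookup
    except RuntimeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(asyncio.run(main()))
```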

I'm not sure what additional information would be helpful, but I'm happy to provide it if I can. I haven't ever seen this error locally in basic testing; I've only seen it in large-volume logs. After some slicing and dicing of my logs, it appears to happen relatively rarely: in one evaluated time period, 200 sites caused the error some 2k times, out of 11k sites that were scanned, meaning less than 2% of scanned sites exhibited the error.

Connection closed:

2023-08-15 15:02:48 [asyncio] ERROR: Exception in callback AsyncIOEventEmitter._emit_run.<locals>.callback(<Task finishe...tion closed')>) at /venv/lib/python3.10/site-packages/pyee/asyncio.py:65
handle: <Handle AsyncIOEventEmitter._emit_run.<locals>.callback(<Task finishe...tion closed')>) at /venv/lib/python3.10/site-packages/pyee/asyncio.py:65>
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/venv/lib/python3.10/site-packages/pyee/asyncio.py", line 71, in callback
    self.emit("error", exc)
  File "/venv/lib/python3.10/site-packages/pyee/base.py", line 179, in emit
    self._emit_handle_potential_error(event, args[0] if args else None)
  File "/venv/lib/python3.10/site-packages/pyee/base.py", line 139, in _emit_handle_potential_error
    raise error
  File "/venv/lib/python3.10/site-packages/scrapy_playwright/handler.py", line 606, in _log_request
    referrer = await request.header_value("referer")
  File "/venv/lib/python3.10/site-packages/playwright/async_api/_generated.py", line 381, in header_value
    return mapping.from_maybe_impl(await self._impl_obj.header_value(name=name))
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_network.py", line 234, in header_value
    return (await self._actual_headers()).get(name)
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_network.py", line 242, in _actual_headers
    headers = await self._channel.send("rawRequestHeaders")
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 482, in wrap_api_call
    return await cb()
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 97, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Connection closed
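As I read the traceback above, the handler runs as a task, and a done-callback re-raises its exception inside the event loop, which is why the error is logged by asyncio rather than reaching my code. The sketch below reproduces that pattern with plain asyncio; `failing_handler` and `reraise` are hypothetical names, and this is not pyee's actual implementation.

```python
import asyncio


async def failing_handler():
    # Stands in for an event handler that fails after the page is gone.
    raise RuntimeError("Connection closed")


def reraise(task):
    # Re-raise the task's exception from inside a loop callback.
    exc = task.exception()
    if exc is not None:
        raise exc


async def main():
    captured = []
    loop = asyncio.get_running_loop()
    # Capture what would otherwise be logged by the "asyncio" logger.
    loop.set_exception_handler(lambda _loop, context: captured.append(context))
    task = loop.create_task(failing_handler())
    task.add_done_callback(reraise)
    # Give the task and its callback time to run.
    await asyncio.sleep(0.01)
    return captured


if __name__ == "__main__":
    for context in asyncio.run(main()):
        print(context["message"])
        print(repr(context["exception"]))
```

Nothing in `main` can catch the error with try/except, because it is raised in a callback that the loop invokes on its own.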

Similar errors

I'm also seeing different errors with the same or a similar stack trace. I didn't dive into the metrics of how often these occur. If these need to be broken into separate issues, I can do that. Or if there is something I should be doing that I'm not, even better.

Target page, context or browser has been closed:

2023-08-13 02:23:30 [asyncio] ERROR: Exception in callback AsyncIOEventEmitter._emit_run.<locals>.callback(<Task finishe...been closed')>) at /venv/lib/python3.10/site-packages/pyee/asyncio.py:65
handle: <Handle AsyncIOEventEmitter._emit_run.<locals>.callback(<Task finishe...been closed')>) at /venv/lib/python3.10/site-packages/pyee/asyncio.py:65>
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/venv/lib/python3.10/site-packages/pyee/asyncio.py", line 71, in callback
    self.emit("error", exc)
  File "/venv/lib/python3.10/site-packages/pyee/base.py", line 179, in emit
    self._emit_handle_potential_error(event, args[0] if args else None)
  File "/venv/lib/python3.10/site-packages/pyee/base.py", line 139, in _emit_handle_potential_error
    raise error
  File "/venv/lib/python3.10/site-packages/scrapy_playwright/handler.py", line 606, in _log_request
    referrer = await request.header_value("referer")
  File "/venv/lib/python3.10/site-packages/playwright/async_api/_generated.py", line 381, in header_value
    return mapping.from_maybe_impl(await self._impl_obj.header_value(name=name))
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_network.py", line 234, in header_value
    return (await self._actual_headers()).get(name)
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_network.py", line 242, in _actual_headers
    headers = await self._channel.send("rawRequestHeaders")
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 482, in wrap_api_call
    return await cb()
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 97, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Target page, context or browser has been closed

Connection closed while reading from the driver:

2023-08-09 19:20:07 [asyncio] ERROR: Exception in callback AsyncIOEventEmitter._emit_run.<locals>.callback(<Task finishe... the driver')>) at /venv/lib/python3.10/site-packages/pyee/asyncio.py:65
handle: <Handle AsyncIOEventEmitter._emit_run.<locals>.callback(<Task finishe... the driver')>) at /venv/lib/python3.10/site-packages/pyee/asyncio.py:65>
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/venv/lib/python3.10/site-packages/pyee/asyncio.py", line 71, in callback
    self.emit("error", exc)
  File "/venv/lib/python3.10/site-packages/pyee/base.py", line 179, in emit
    self._emit_handle_potential_error(event, args[0] if args else None)
  File "/venv/lib/python3.10/site-packages/pyee/base.py", line 139, in _emit_handle_potential_error
    raise error
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/venv/lib/python3.10/site-packages/pyee/asyncio.py", line 71, in callback
    self.emit("error", exc)
  File "/venv/lib/python3.10/site-packages/pyee/base.py", line 179, in emit
    self._emit_handle_potential_error(event, args[0] if args else None)
  File "/venv/lib/python3.10/site-packages/pyee/base.py", line 139, in _emit_handle_potential_error
    raise error
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/venv/lib/python3.10/site-packages/pyee/asyncio.py", line 71, in callback
    self.emit("error", exc)
  File "/venv/lib/python3.10/site-packages/pyee/base.py", line 179, in emit
    self._emit_handle_potential_error(event, args[0] if args else None)
  File "/venv/lib/python3.10/site-packages/pyee/base.py", line 139, in _emit_handle_potential_error
    raise error
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/venv/lib/python3.10/site-packages/pyee/asyncio.py", line 71, in callback
    self.emit("error", exc)
  File "/venv/lib/python3.10/site-packages/pyee/base.py", line 179, in emit
    self._emit_handle_potential_error(event, args[0] if args else None)
  File "/venv/lib/python3.10/site-packages/pyee/base.py", line 139, in _emit_handle_potential_error
    raise error
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/venv/lib/python3.10/site-packages/pyee/asyncio.py", line 71, in callback
    self.emit("error", exc)
  File "/venv/lib/python3.10/site-packages/pyee/base.py", line 179, in emit
    self._emit_handle_potential_error(event, args[0] if args else None)
  File "/venv/lib/python3.10/site-packages/pyee/base.py", line 139, in _emit_handle_potential_error
    raise error
  File "/venv/lib/python3.10/site-packages/scrapy_playwright/handler.py", line 606, in _log_request
    referrer = await request.header_value("referer")
  File "/venv/lib/python3.10/site-packages/playwright/async_api/_generated.py", line 381, in header_value
    return mapping.from_maybe_impl(await self._impl_obj.header_value(name=name))
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_network.py", line 234, in header_value
    return (await self._actual_headers()).get(name)
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_network.py", line 242, in _actual_headers
    headers = await self._channel.send("rawRequestHeaders")
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 482, in wrap_api_call
    return await cb()
  File "/venv/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 97, in inner_send
    result = next(iter(done)).result()
Exception: Connection closed while reading from the driver

Note that this is with v0.0.26, but I didn't see any changes through the current v0.0.29 that would obviously affect it.

elacuesta commented 1 year ago

Seems related to this comment. In any case, I'd say the referrer is non-essential in these cases, so it should not fail that loudly.
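A rough sketch of what failing less loudly could look like, assuming the lookup is only needed for a log line. `FakeRequest`, `ClosedError` and `safe_referrer` are hypothetical names for illustration, not the handler's actual code or the eventual fix.

```python
import asyncio
import logging

logger = logging.getLogger("example")


class ClosedError(Exception):
    """Stand-in for the error raised once the page is gone."""


class FakeRequest:
    """Hypothetical stand-in for a request object with header_value()."""

    def __init__(self, headers, closed=False):
        self._headers = headers
        self._closed = closed

    async def header_value(self, name):
        if self._closed:
            raise ClosedError("Target page, context or browser has been closed")
        return self._headers.get(name)


async def safe_referrer(request):
    """Return the referer header, or None if it cannot be read."""
    try:
        return await request.header_value("referer")
    except Exception as exc:
        # Broad on purpose: the value is only used for logging.
        logger.debug("Could not read referer: %r", exc)
        return None


async def main():
    live = FakeRequest({"referer": "https://example.org/"})
    gone = FakeRequest({}, closed=True)
    return await safe_referrer(live), await safe_referrer(gone)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The failure is still recorded at debug level, so it remains traceable without surfacing as an unhandled error in the event loop.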