scrapy-plugins / scrapy-playwright

🎭 Playwright integration for Scrapy

PLAYWRIGHT_RESTART_DISCONNECTED_BROWSER not working on local browser #304

Closed: elacuesta closed this issue 1 month ago

elacuesta commented 2 months ago

The handler is not allowing enough time for the new browser to launch after a crash.
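To illustrate the timing problem in isolation: assuming the handler relaunches the browser as a background task when it sees the disconnected event (an assumption about the internals, not the actual handler code), a request that arrives before that task finishes tries to open a page on a browser that no longer exists. A toy asyncio sketch of that race:

import asyncio

async def relaunch_browser():
    # Stand-in for the real relaunch; actual browser startup takes a while.
    await asyncio.sleep(5)
    print("browser relaunched")

async def handle_request():
    # Runs immediately, i.e. before the relaunch above has completed,
    # so a real handler would hit TargetClosedError at this point.
    print("new_page() on a browser that is still restarting")

async def main():
    relaunch = asyncio.create_task(relaunch_browser())  # fired on "disconnected"
    await handle_request()
    await relaunch

asyncio.run(main())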

Sample spider adapted from #167.

# crash.py
import os
from signal import SIGKILL

import psutil
import scrapy

class CrashSpider(scrapy.Spider):
    name = "crash"
    custom_settings = {
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "DOWNLOAD_HANDLERS": {
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
    }

    def start_requests(self):
        yield scrapy.Request("https://httpbin.org/get", meta={"playwright": True})

    def parse(self, response):
        print(f"Response: {response}")
        # Simulate a browser crash: kill every running Chrome process, then
        # issue another Playwright request, which forces the handler to
        # relaunch the browser.
        for proc in psutil.process_iter(["pid", "name"]):
            if proc.info["name"] == "chrome":
                os.kill(proc.info["pid"], SIGKILL)
        yield scrapy.Request("https://httpbin.org/headers", meta={"playwright": True})
$ scrapy runspider crash.py

(...)
2024-07-16 14:55:09 [scrapy.core.engine] INFO: Spider opened
2024-07-16 14:55:09 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-07-16 14:55:09 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-07-16 14:55:09 [scrapy-playwright] INFO: Starting download handler
2024-07-16 14:55:14 [scrapy-playwright] INFO: Launching browser chromium
2024-07-16 14:55:14 [scrapy-playwright] INFO: Browser chromium launched
2024-07-16 14:55:14 [scrapy-playwright] DEBUG: Browser context started: 'default' (persistent=False, remote=False)
2024-07-16 14:55:14 [scrapy-playwright] DEBUG: [Context=default] New page created, page count is 1 (1 for all contexts)
2024-07-16 14:55:14 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://httpbin.org/get> (resource type: document)
2024-07-16 14:55:14 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://httpbin.org/get>
2024-07-16 14:55:14 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://httpbin.org/get> (referer: None) ['playwright']
Response: <200 https://httpbin.org/get>
2024-07-16 14:55:14 [scrapy-playwright] DEBUG: Browser context closed: 'default' (persistent=False, remote=False)
2024-07-16 14:55:14 [scrapy-playwright] DEBUG: Browser disconnected
2024-07-16 14:55:15 [scrapy.core.scraper] ERROR: Error downloading <GET https://httpbin.org/headers>
Traceback (most recent call last):
  File "/.../venv-scrapy-playwright/lib/python3.10/site-packages/twisted/internet/defer.py", line 1996, in _inlineCallbacks
    result = context.run(
  File "/.../venv-scrapy-playwright/lib/python3.10/site-packages/twisted/python/failure.py", line 519, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/.../venv-scrapy-playwright/lib/python3.10/site-packages/scrapy/core/downloader/middleware.py", line 54, in process_request
    return (yield download_func(request=request, spider=spider))
  File "/.../venv-scrapy-playwright/lib/python3.10/site-packages/twisted/internet/defer.py", line 1248, in adapt
    extracted: _SelfResultT | Failure = result.result()
  File "/.../scrapy_playwright/handler.py", line 358, in _download_request
    page = await self._create_page(request=request, spider=spider)
  File "/.../scrapy_playwright/handler.py", line 286, in _create_page
    page = await ctx_wrapper.context.new_page()
  File "/.../venv-scrapy-playwright/lib/python3.10/site-packages/playwright/async_api/_generated.py", line 12379, in new_page
    return mapping.from_impl(await self._impl_obj.new_page())
  File "/.../venv-scrapy-playwright/lib/python3.10/site-packages/playwright/_impl/_browser_context.py", line 294, in new_page
    return from_channel(await self._channel.send("newPage"))
  File "/.../venv-scrapy-playwright/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 59, in send
    return await self._connection.wrap_api_call(
  File "/.../venv-scrapy-playwright/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 514, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
playwright._impl._errors.TargetClosedError: BrowserContext.new_page: Target page, context or browser has been closed
Browser logs:

<launching> /home/eugenio/.cache/ms-playwright/chromium-1117/chrome-linux/chrome --disable-field-trial-config --disable-background-networking --enable-features=NetworkService,NetworkServiceInProcess --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-back-forward-cache --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-component-update --no-default-browser-check --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=ImprovedCookieControls,LazyFrameLoading,GlobalMediaControls,DestroyProfileOnBrowserClose,MediaRouter,DialMediaRouteProvider,AcceptCHFrame,AutoExpandDetailsElement,CertificateTransparencyComponentUpdater,AvoidUnnecessaryBeforeUnloadCheckSync,Translate,HttpsUpgrades,PaintHolding --allow-pre-commit-input --disable-hang-monitor --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --force-color-profile=srgb --metrics-recording-only --no-first-run --enable-automation --password-store=basic --use-mock-keychain --no-service-autorun --export-tagged-pdf --disable-search-engine-choice-screen --headless --hide-scrollbars --mute-audio --blink-settings=primaryHoverType=2,availableHoverTypes=2,primaryPointerType=4,availablePointerTypes=4 --no-sandbox --user-data-dir=/tmp/playwright_chromiumdev_profile-XXXXXXTy2tU6 --remote-debugging-pipe --no-startup-window
<launched> pid=59155
[pid=59155][err] [0716/145514.301003:INFO:config_dir_policy_loader.cc(118)] Skipping mandatory platform policies because no policy file was found at: /etc/chromium/policies/managed
[pid=59155][err] [0716/145514.301041:INFO:config_dir_policy_loader.cc(118)] Skipping recommended platform policies because no policy file was found at: /etc/chromium/policies/recommended
[pid=59155][err] [0716/145514.308584:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable.
[pid=59155][err] [0716/145514.343012:WARNING:sandbox_linux.cc(436)] InitializeSandbox() called with multiple threads in process gpu-process.
2024-07-16 14:55:15 [scrapy.core.engine] INFO: Closing spider (finished)
(...)
$ scrapy version -v
Scrapy       : 2.11.1
lxml         : 5.1.0.0
libxml2      : 2.12.3
cssselect    : 1.2.0
parsel       : 1.8.1
w3lib        : 2.1.2
Twisted      : 23.10.0
Python       : 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]
pyOpenSSL    : 24.0.0 (OpenSSL 3.2.1 30 Jan 2024)
cryptography : 42.0.5
Platform     : Linux-6.5.0-41-generic-x86_64-with-glibc2.35

$ python -c "import scrapy_playwright; print(scrapy_playwright.__version__)"
0.0.39

I don't think this can be handled with locking or other synchronization primitives, as the browser crash could happen at any time. Retrying seems like the most sensible way.
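A minimal sketch of that retry idea: wrap page creation so that a Playwright error from a dead browser triggers one more attempt after the relaunch has had time to complete. The create_page callable here is a hypothetical stand-in for the handler's internals, and catching the base playwright Error (of which TargetClosedError is a subclass) is an assumption, not the actual fix:

import asyncio
from playwright.async_api import Error as PlaywrightError

async def create_page_with_retry(create_page, attempts=2, delay=1.0):
    # If the browser died between the crash and the relaunch, the first
    # attempt raises TargetClosedError; wait briefly and try again on the
    # freshly launched browser.
    for attempt in range(attempts):
        try:
            return await create_page()
        except PlaywrightError:
            if attempt + 1 == attempts:
                raise
            await asyncio.sleep(delay)  # allow the relaunch to finish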

gelodefaultbrain commented 2 months ago

Hi @elacuesta in relation to my issue here: https://github.com/scrapy-plugins/scrapy-playwright/issues/294

I think the update you made to have the browser restarted worked for me! However I have this retry middleware enabled. I just find it weird that when the browser crashes, it does show up again but it seems that the retry middleware I've made doesn't necessarily retry that specific request anymore. I'm not sure why , it could be my middleware but just letting you know just in case. Let me know if you need additional info. Thank you.