scrapy-plugins / scrapy-playwright

🎭 Playwright integration for Scrapy
BSD 3-Clause "New" or "Revised" License

Are chrome and msedge supported? #313

Closed bboyadao closed 2 months ago

bboyadao commented 2 months ago

Hi, I'm having trouble using browsers other than the default ones. As the title says: are chrome and msedge supported, and if so, how can I use them? Thanks.

ERROR logs.

2024-08-22 13:34:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-08-22 13:34:32 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-08-22 13:34:32 [scrapy-playwright] DEBUG: Total Playwright process memory: 0 Bytes (0 MiB)
2024-08-22 13:34:32 [scrapy-playwright] DEBUG: Total Playwright process memory: 0 Bytes (0 MiB)
2024-08-22 13:34:32 [scrapy-playwright] INFO: Starting download handler
2024-08-22 13:34:32 [scrapy-playwright] INFO: Starting download handler
2024-08-22 13:34:33 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method ScrapyPlaywrightDownloadHandler._engine_started of <scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler object at 0x7e8d74427d40>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/twisted/internet/defer.py", line 1251, in adapt
    extracted: _SelfResultT | Failure = result.result()
                                        ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/scrapy_playwright/handler.py", line 176, in _launch
    self.browser_type: BrowserType = getattr(self.playwright, self.config.browser_type_name)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Playwright' object has no attribute 'msedge'
2024-08-22 13:34:33 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method ScrapyPlaywrightDownloadHandler._engine_started of <scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler object at 0x7e8d74427fb0>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/twisted/internet/defer.py", line 1251, in adapt
    extracted: _SelfResultT | Failure = result.result()
                                        ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/scrapy_playwright/handler.py", line 176, in _launch
    self.browser_type: BrowserType = getattr(self.playwright, self.config.browser_type_name)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Playwright' object has no attribute 'msedge'
^C2024-08-22 13:34:34 [scrapy.crawler] INFO: Received SIGINT, shutting down gracefully. Send again to force

My Settings.


import os
import random


class Spider(BaseSpider):
    allowed_domains = ["*"]
    user_type = random.choice([
        "firefox",
        "chromium",
        "chrome",
        "msedge"
    ])
    custom_settings = {
        "PLAYWRIGHT_LAUNCH_OPTIONS": {
            "headless": True,
            "timeout": 20 * 1000,  # 20 seconds
        },
        "PLAYWRIGHT_BROWSER_TYPE": user_type,
        "EXTENSIONS": {
            "scrapy.extensions.memusage.MemoryUsage": None,
            "scrapy_playwright.memusage.ScrapyPlaywrightMemoryUsageExtension": 0,
        },
        "CONCURRENT_REQUESTS": os.cpu_count() * 3,
    }

elacuesta commented 2 months ago

The only supported values for PLAYWRIGHT_BROWSER_TYPE are chromium, firefox and webkit (the only browser types available as properties at https://playwright.dev/python/docs/api/class-playwright). That is why the handler's getattr lookup fails with "'Playwright' object has no attribute 'msedge'". To use msedge or chrome, keep PLAYWRIGHT_BROWSER_TYPE set to chromium and specify a value for channel in PLAYWRIGHT_LAUNCH_OPTIONS so it gets passed to BrowserType.launch.
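
A minimal sketch of how the random browser choice from the question could be mapped onto these settings. The helper name launch_settings_for and the two constant sets are hypothetical and not part of scrapy-playwright; the sketch also assumes the branded browsers are already installed on the machine.

```python
import random

# Branded browsers that Playwright drives through the chromium browser type,
# selected with the "channel" launch option.
CHROMIUM_CHANNELS = {"chrome", "msedge"}
# The values accepted by PLAYWRIGHT_BROWSER_TYPE, per the reply above.
BROWSER_TYPES = {"chromium", "firefox", "webkit"}


def launch_settings_for(browser: str) -> dict:
    """Hypothetical helper: map a browser name to scrapy-playwright settings."""
    launch_options = {"headless": True, "timeout": 20 * 1000}  # 20 seconds
    if browser in CHROMIUM_CHANNELS:
        # chrome/msedge are channels of chromium, not browser types.
        browser_type = "chromium"
        launch_options["channel"] = browser
    elif browser in BROWSER_TYPES:
        browser_type = browser
    else:
        raise ValueError(f"unsupported browser: {browser!r}")
    return {
        "PLAYWRIGHT_BROWSER_TYPE": browser_type,
        "PLAYWRIGHT_LAUNCH_OPTIONS": launch_options,
    }


# Same random selection as in the question, now producing valid settings.
custom_settings = launch_settings_for(
    random.choice(["firefox", "chromium", "chrome", "msedge"])
)
```

The resulting dict can be merged into a spider's custom_settings alongside the EXTENSIONS and CONCURRENT_REQUESTS entries from the question. Keeping the channel inside PLAYWRIGHT_LAUNCH_OPTIONS means it is only set when a branded browser is chosen, so firefox and webkit launches are unaffected.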