scrapy-plugins / scrapy-playwright

🎭 Playwright integration for Scrapy
BSD 3-Clause "New" or "Revised" License

Improve concurrency on Windows #286

Closed: elacuesta closed this 3 months ago

elacuesta commented 3 months ago

Closes #282

Alternative approach to #285

With the sample spider from #285, this yields full concurrency (`'playwright/page_count/max_concurrent': 32` in the crawl stats):

```python
import scrapy


class DelayTest(scrapy.Spider):
    name = "delay"
    custom_settings = {
        # scrapy-playwright requires the asyncio-based Twisted reactor
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "DOWNLOAD_HANDLERS": {
            # Only https is routed through Playwright; the URLs below are all https
            # "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
        # Allow all 32 requests to be in flight at once
        "PLAYWRIGHT_MAX_PAGES_PER_CONTEXT": 32,
        "CONCURRENT_REQUESTS_PER_IP": 32,
        "CONCURRENT_REQUESTS_PER_DOMAIN": 32,
        "CONCURRENT_REQUESTS": 32,
        # Headful browser, so the pages can be watched opening in parallel
        "PLAYWRIGHT_LAUNCH_OPTIONS": {"headless": False},
    }

    def start_requests(self):
        # Each httpbin /delay/1 response takes about one second to arrive
        for i in range(32):
            yield scrapy.Request(
                url=f"https://httpbin.org/delay/1?i={i}",
                meta={"playwright": True},
            )

    def parse(self, response):
        print(response.url)
```
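The PR text does not describe the mechanism behind the fix. As a hedged, stdlib-only sketch, assume the handler on Windows drives Playwright from a dedicated event loop running in a separate thread (for example because the `AsyncioSelectorReactor` needs a selector loop, while Playwright needs subprocess support). The sketch then shows why that arrangement can still be fully concurrent: each coroutine is submitted with `asyncio.run_coroutine_threadsafe` and awaited through `asyncio.wrap_future`, instead of blocking on each result in turn, which would serialize the requests. `BackgroundLoop`, `fake_page_load` and `crawl` are hypothetical names, and `asyncio.sleep` stands in for a page load. This is not the code from #286.

```python
import asyncio
import threading
import time


class BackgroundLoop:
    """Hypothetical helper: an asyncio event loop running in a daemon thread."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def submit(self, coro) -> asyncio.Future:
        # Schedule the coroutine on the background loop without blocking.
        # Calling cf.result() here instead would block the caller's loop
        # and process requests one at a time.
        cf = asyncio.run_coroutine_threadsafe(coro, self.loop)
        # Wrap the concurrent.futures.Future so the caller's loop can await it.
        return asyncio.wrap_future(cf)

    def stop(self) -> None:
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()


async def fake_page_load(i: int, delay: float) -> int:
    # Stand-in for a Playwright navigation (hypothetical).
    await asyncio.sleep(delay)
    return i


async def crawl(bg: BackgroundLoop, n: int, delay: float):
    start = time.perf_counter()
    # All n coroutines are in flight on the background loop at the same time.
    results = await asyncio.gather(
        *(bg.submit(fake_page_load(i, delay)) for i in range(n))
    )
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    bg = BackgroundLoop()
    try:
        results, elapsed = asyncio.run(crawl(bg, 32, 0.2))
    finally:
        bg.stop()
    print(len(results), f"{elapsed:.2f}s")
```

With 32 simulated loads of 0.2 s each, the elapsed time stays close to a single delay rather than 32 times it, mirroring the `max_concurrent: 32` figure reported above.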
codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (55fd416) to head (9942e6b). Report is 1 commit behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##             main     #286   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files            6         6
  Lines          517       543   +26
=========================================
+ Hits           517       543   +26
```
