[Closed] dream2333 closed this issue 1 day ago
Platform: linux
OS: posix
Python: 3.12.3
========================
scrapy_playwright : 0.0.36
playwright : 1.44.0
========================
Scrapy : 2.11.2
lxml : 5.2.2.0
libxml2 : 2.12.6
cssselect : 1.2.0
parsel : 1.9.1
w3lib : 2.2.1
Twisted : 24.3.0
Python : 3.12.3 (main, May 14 2024, 07:44:45) [GCC 10.2.1 20210110]
pyOpenSSL : 24.1.0 (OpenSSL 3.2.2 4 Jun 2024)
cryptography : 42.0.8
Platform : Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
I see, thanks for the update. This is an issue with the recent Windows implementation, I'll look into it.
My code can only proceed to the next request after the previous one has finished, as if the requests were blocking.

When I crawl web pages without Playwright, the Request objects generated by `start_requests` are downloaded in parallel by the downloader. However, when I use Playwright for downloading, the requests are not handled in parallel but in a blocking manner: a new browser page is only opened after the previous page has finished loading, both on Windows and in WSL. This is most noticeable on slower websites. How can I make the Request objects from `start_requests` download in parallel?