scrapy-plugins / scrapy-playwright

🎭 Playwright integration for Scrapy
BSD 3-Clause "New" or "Revised" License

Question about reducing browser restart frequency with scrapy-playwright #311

Open SH-zwhy opened 1 month ago

SH-zwhy commented 1 month ago

Hi,

I'm using scrapy-playwright for data scraping, where URLs are provided through a txt file. I've noticed that every time a URL is scraped, the browser restarts, which significantly reduces scraping efficiency.

Is there a way to avoid restarting the browser for each URL or to reduce the frequency of browser restarts to improve scraping performance?

Thanks!

elacuesta commented 1 month ago

That's not how the package works by default; you might be starting a new job for each URL. By default, a new page (not a new browser) is created for each URL. You can also reuse pages, as explained at https://github.com/scrapy-plugins/scrapy-playwright?tab=readme-ov-file#playwright_page.

Please share your code and logs as requested in the Reporting issues section.