Closed: Harvi-C closed this issue 1 year ago
I didn't turn on playwright_include_page=True, so do I need to explicitly call page.close() or page.context.close()? If so, how do I get the page object?
This problem forces me to keep an eye on the machine and manually restart my crawler every once in a while, resuming from the page number the previous run reached. It would be a great help if I could get your reply, thanks!
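For reference, a minimal sketch of how I understand the scrapy-playwright README: the page object is only handed to the callback when playwright_include_page=True is set, in which case it appears in response.meta["playwright_page"] and the spider is responsible for closing it (the spider name, URL, and callbacks below are placeholders, not taken from this issue):

```python
import scrapy

class ImageSpider(scrapy.Spider):
    name = "images"  # placeholder spider

    def start_requests(self):
        yield scrapy.Request(
            "https://example.com",  # placeholder URL
            meta={"playwright": True, "playwright_include_page": True},
            callback=self.parse,
            errback=self.errback,
        )

    async def parse(self, response):
        # page is exposed only because playwright_include_page=True
        page = response.meta["playwright_page"]
        # ... extract data here ...
        await page.close()  # close explicitly so pages don't pile up

    async def errback(self, failure):
        # close the page on failed requests too, otherwise it leaks
        page = failure.request.meta["playwright_page"]
        await page.close()
```

Without playwright_include_page=True there is no page in the callback to close, so this pattern does not apply.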
I noticed the same problem reported upstream: https://github.com/microsoft/playwright/issues/6319
Should this be closed then?
@Harvi-C do you have a solution to this problem? I am also seeing unbounded memory growth with Scrapy and Playwright, using a single context for all pages and without turning on playwright_include_page=True.
from scrapy_playwright.page import PageMethod

yield scrapy.Request(
    url=ALL_IMAGE_URL + str(page),
    callback=self.parse,
    meta=dict(
        playwright=True,
        playwright_page_methods=[
            PageMethod("evaluate", "window.scrollBy(0, 500)"),  # scroll to trigger lazy loading
            PageMethod("wait_for_timeout", timeout),
        ],
    ),
)
I use the simplest startup method in Scrapy, and the machine's memory footprint keeps climbing: after crawling about 700 pages it grows from 4-5 GB at the start to 18 GB, and I don't know why.
I didn't turn on playwright_include_page=True, and I set PLAYWRIGHT_MAX_PAGES_PER_CONTEXT = 16.
So is this a memory leak, and what can I do about it?
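In case it helps to compare setups, this is a minimal sketch of the scrapy-playwright settings I would look at to bound browser memory; the values are illustrative assumptions, not a confirmed fix for this issue:

```python
# settings.py (illustrative values, not a confirmed fix)
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

PLAYWRIGHT_MAX_PAGES_PER_CONTEXT = 16  # same value as in the report above
PLAYWRIGHT_MAX_CONTEXTS = 4            # cap the number of live browser contexts

# Skip heavy resources so pages hold less memory; the predicate receives
# a playwright.async_api.Request object.
PLAYWRIGHT_ABORT_REQUEST = lambda req: req.resource_type == "image"
```

Lowering CONCURRENT_REQUESTS can also reduce how many pages are open at once, though none of this explains the growth by itself.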