partyspy closed 4 months ago
Sorry, I cannot reproduce.
$ scrapy version -v
Scrapy : 2.11.2
lxml : 4.9.3.0
libxml2 : 2.10.3
cssselect : 1.2.0
parsel : 1.8.1
w3lib : 2.1.2
Twisted : 24.3.0
Python : 3.10.10 (main, Feb 16 2023, 02:58:25) [Clang 14.0.0 (clang-1400.0.29.202)]
pyOpenSSL : 23.2.0 (OpenSSL 3.1.2 1 Aug 2023)
cryptography : 41.0.3
Platform : macOS-14.4.1-x86_64-i386-64bit
$ python -c "import scrapy_playwright; print(scrapy_playwright.__version__)"
0.0.35
import scrapy


def should_abort_request(request):
    return request.resource_type == "image" or ".jpg" in request.url


class ExampleSpider(scrapy.Spider):
    name = "example"
    custom_settings = {
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "DOWNLOAD_HANDLERS": {
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
        "PLAYWRIGHT_BROWSER_TYPE": "webkit",
        "PLAYWRIGHT_ABORT_REQUEST": should_abort_request,
    }

    def start_requests(self):
        yield scrapy.Request(
            url="https://books.toscrape.com",
            meta={
                "playwright": True,
                "playwright_page_goto_kwargs": {"wait_until": "networkidle"},
            },
        )

    def parse(self, response):
        yield {"url": response.url}
(...)
2024-06-03 22:17:29 [scrapy.core.engine] INFO: Spider opened
2024-06-03 22:17:29 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-06-03 22:17:29 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-06-03 22:17:29 [scrapy-playwright] INFO: Starting download handler
2024-06-03 22:17:34 [scrapy-playwright] INFO: Launching browser webkit
2024-06-03 22:17:34 [scrapy-playwright] INFO: Browser webkit launched
2024-06-03 22:17:35 [scrapy-playwright] DEBUG: Browser context started: 'default' (persistent=False, remote=False)
2024-06-03 22:17:35 [scrapy-playwright] DEBUG: [Context=default] New page created, page count is 1 (1 for all contexts)
2024-06-03 22:17:35 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/> (resource type: document)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/css/styles.css> (resource type: stylesheet, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.css> (resource type: stylesheet, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/css/datetimepicker.css> (resource type: stylesheet, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/26/0c/260c6ae16bce31c8f8c95daddd9f4a1c.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/3e/ef/3eef99c9d9adef34639f510662022830.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/26/0c/260c6ae16bce31c8f8c95daddd9f4a1c.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/32/51/3251cf3a3412f53f339e42cac2134093.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/3e/ef/3eef99c9d9adef34639f510662022830.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/be/a5/bea5697f2534a2f86a3ef27b5a8c12a6.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/32/51/3251cf3a3412f53f339e42cac2134093.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/68/33/68339b4c9bc034267e1da611ab3b34f8.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/be/a5/bea5697f2534a2f86a3ef27b5a8c12a6.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/92/27/92274a95b7c251fea59a2b8a78275ab4.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/68/33/68339b4c9bc034267e1da611ab3b34f8.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/3d/54/3d54940e57e662c4dd1f3ff00c78cc64.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/92/27/92274a95b7c251fea59a2b8a78275ab4.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/66/88/66883b91f6804b2323c8369331cb7dd1.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/3d/54/3d54940e57e662c4dd1f3ff00c78cc64.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/66/88/66883b91f6804b2323c8369331cb7dd1.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/58/46/5846057e28022268153beff6d352b06c.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/be/f4/bef44da28c98f905a3ebec0b87be8530.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/10/48/1048f63d3b5061cd2f424d20b3f9b666.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/5b/88/5b88c52633f53cacf162c15f4f823153.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/94/b1/94b1b8b244bce9677c2f29ccc890d4d2.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/81/c4/81c4a973364e17d01f217e1188253d5e.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/58/46/5846057e28022268153beff6d352b06c.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/be/f4/bef44da28c98f905a3ebec0b87be8530.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/10/48/1048f63d3b5061cd2f424d20b3f9b666.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/5b/88/5b88c52633f53cacf162c15f4f823153.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/54/60/54607fe8945897cdcced0044103b10b6.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/94/b1/94b1b8b244bce9677c2f29ccc890d4d2.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/55/33/553310a7162dfbc2c6d19a84da0df9e1.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/09/a3/09a3aef48557576e1a85ba7efea8ecb7.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/81/c4/81c4a973364e17d01f217e1188253d5e.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/54/60/54607fe8945897cdcced0044103b10b6.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/0b/bc/0bbcd0a6f4bcd81ccb1049a52736406e.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/55/33/553310a7162dfbc2c6d19a84da0df9e1.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/27/a5/27a53d0bb95bdd88288eaf66c9230d7e.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/js/bootstrap3/bootstrap.min.js> (resource type: script, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/js/oscar/ui.js> (resource type: script, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.js> (resource type: script, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/js/bootstrap-datetimepicker/locales/bootstrap-datetimepicker.all.js> (resource type: script, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/css/styles.css>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/css/datetimepicker.css>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.css>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/js/jquery/jquery-1.9.1.min.js> (resource type: script, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/09/a3/09a3aef48557576e1a85ba7efea8ecb7.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/0b/bc/0bbcd0a6f4bcd81ccb1049a52736406e.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/fonts/fontawesome-webfont.woff%3Fv=3.2.1> (resource type: font, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/27/a5/27a53d0bb95bdd88288eaf66c9230d7e.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/js/bootstrap3/bootstrap.min.js>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.js>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/js/oscar/ui.js>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/js/bootstrap-datetimepicker/locales/bootstrap-datetimepicker.all.js>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/fonts/fontawesome-webfont.woff%3Fv=3.2.1>
2024-06-03 22:17:37 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/js/jquery/jquery-1.9.1.min.js>
2024-06-03 22:17:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com> (referer: None) ['playwright']
2024-06-03 22:17:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://books.toscrape.com/>
{'url': 'https://books.toscrape.com/'}
2024-06-03 22:17:37 [scrapy.core.engine] INFO: Closing spider (finished)
2024-06-03 22:17:37 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 219,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 51287,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 8.153309,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2024, 6, 4, 1, 17, 37, 725425, tzinfo=datetime.timezone.utc),
'item_scraped_count': 1,
'log_count/DEBUG': 67,
'log_count/INFO': 13,
'log_count/WARNING': 1,
'memusage/max': 57114624,
'memusage/startup': 57110528,
'playwright/context_count': 1,
'playwright/context_count/max_concurrent': 1,
'playwright/context_count/persistent/False': 1,
'playwright/context_count/remote/False': 1,
'playwright/page_count': 1,
'playwright/page_count/closed': 1,
'playwright/page_count/max_concurrent': 1,
'playwright/request_count': 30,
'playwright/request_count/aborted': 20,
'playwright/request_count/method/GET': 30,
'playwright/request_count/navigation': 1,
'playwright/request_count/resource_type/document': 1,
'playwright/request_count/resource_type/font': 1,
'playwright/request_count/resource_type/image': 20,
'playwright/request_count/resource_type/script': 5,
'playwright/request_count/resource_type/stylesheet': 3,
'playwright/response_count': 10,
'playwright/response_count/method/GET': 10,
'playwright/response_count/resource_type/document': 1,
'playwright/response_count/resource_type/font': 1,
'playwright/response_count/resource_type/script': 5,
'playwright/response_count/resource_type/stylesheet': 3,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2024, 6, 4, 1, 17, 29, 572116, tzinfo=datetime.timezone.utc)}
2024-06-03 22:17:37 [scrapy.core.engine] INFO: Spider closed (finished)
2024-06-03 22:17:37 [scrapy-playwright] INFO: Closing download handler
2024-06-03 22:17:37 [scrapy-playwright] DEBUG: Browser context closed: 'default' (persistent=False, remote=False)
2024-06-03 22:17:37 [scrapy-playwright] INFO: Closing browser
Notice the "Aborted Playwright request" log lines and the 'playwright/request_count/aborted': 20 entry in the job stats.
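For anyone comparing their own run against this one, the same check can be done on the stats dump rather than by reading the log. The sketch below is only an illustration: the `abort_summary` helper is a hypothetical name, and the dict is a hand-copied subset of the stats shown above, not something scrapy-playwright provides in this form.

```python
# Subset of the job stats above, copied by hand (assumption: stats as a plain dict).
stats = {
    "playwright/request_count": 30,
    "playwright/request_count/aborted": 20,
    "playwright/request_count/resource_type/image": 20,
    "playwright/response_count": 10,
}


def abort_summary(stats):
    """Hypothetical helper: summarize how many Playwright requests were aborted."""
    total = stats.get("playwright/request_count", 0)
    aborted = stats.get("playwright/request_count/aborted", 0)
    images = stats.get("playwright/request_count/resource_type/image", 0)
    responses = stats.get("playwright/response_count", 0)
    return {
        "aborted": aborted,
        # Heuristic only: the predicate also matches ".jpg" URLs of other types.
        "all_images_aborted": aborted == images,
        # Requests that were neither aborted nor answered.
        "unaccounted": total - aborted - responses,
    }


summary = abort_summary(stats)
print(summary)
```

In this run every image request was aborted and no request is left unaccounted for, which is what a working PLAYWRIGHT_ABORT_REQUEST callback would be expected to produce.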
Environment
When PLAYWRIGHT_BROWSER_TYPE is set to 'chromium' (or left at the default) on macOS, there appears to be a memory leak as the number of crawled pages increases. No memory leak is observed on Linux.
When PLAYWRIGHT_BROWSER_TYPE is set to 'webkit' on macOS, the memory leak is gone, but the PLAYWRIGHT_ABORT_REQUEST callback fails to intercept most requests.
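One way to narrow this down is to rule out the predicate itself before blaming the browser. The sketch below exercises the same `should_abort_request` function from the reply above against stand-in objects. The `SimpleNamespace` instances are hypothetical substitutes that only mimic the `resource_type` and `url` attributes of a Playwright request, so this says nothing about whether webkit actually routes requests through the callback.

```python
from types import SimpleNamespace


def should_abort_request(request):
    # Same predicate as in the spider: abort images and any ".jpg" URL.
    return request.resource_type == "image" or ".jpg" in request.url


# Hypothetical stand-ins for Playwright requests (URLs taken from the log above).
samples = [
    SimpleNamespace(
        resource_type="image",
        url="https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg",
    ),
    SimpleNamespace(
        resource_type="stylesheet",
        url="https://books.toscrape.com/static/oscar/css/styles.css",
    ),
    SimpleNamespace(
        resource_type="document",
        url="https://books.toscrape.com/",
    ),
]

decisions = [should_abort_request(r) for r in samples]
print(decisions)
```

If the predicate returns the expected decisions here but the crawl log shows few or no "Aborted Playwright request" lines, the difference is more likely in the environment (browser build, package versions) than in the callback.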