Closed simonw closed 2 years ago
Probably this code - I think response.body() is breaking: https://github.com/simonw/shot-scraper/blob/bf34a76cb0a1d2e34ae4e47440d14682f3942513/shot_scraper/cli.py#L728-L734
Added this debugging code:
diff --git a/shot_scraper/cli.py b/shot_scraper/cli.py
index d104725..f5636a8 100644
--- a/shot_scraper/cli.py
+++ b/shot_scraper/cli.py
@@ -726,6 +726,12 @@ def take_shot(
     if log_requests:
         def on_response(response):
+            try:
+                body = response.body()
+            except Exception as ex:
+                print(ex)
+                print(response.url)
+                return
             log_requests.write(
                 json.dumps(
                     {
And got this:
Protocol error (Network.getResponseBody): No resource with given identifier found
https://cdn.jsdelivr.net/pyodide/v0.20.0/full/pyodide.js
Protocol error (Network.getResponseBody): No resource with given identifier found
https://cdn.jsdelivr.net/pyodide/v0.20.0/full/pyodide.asm.js
...
https://github.com/puppeteer/puppeteer/issues/2258#issuecomment-380647459 says "resources get dumped after page commits navigation". So presumably a page navigation has occurred here, clearing those resources from memory before my Python code gets a chance to call .body() on them.
My hunch is that it's a lot harder to reliably access the size of the resource than I had expected.
I'm going to try my best, but return "size": null if the resource body size could not be calculated.
I'll mention this in the documentation.
This seems to do the right thing:
diff --git a/shot_scraper/cli.py b/shot_scraper/cli.py
index d104725..a19e878 100644
--- a/shot_scraper/cli.py
+++ b/shot_scraper/cli.py
@@ -726,12 +726,20 @@ def take_shot(
     if log_requests:
         def on_response(response):
+            try:
+                body = response.body()
+                size = len(body)
+            except Error as ex:
+                if "Network.getResponseBody" in ex.message:
+                    size = None
+                else:
+                    raise
             log_requests.write(
                 json.dumps(
                     {
                         "method": response.request.method,
                         "url": response.url,
-                        "size": len(response.body()),
+                        "size": size,
                         "timing": response.request.timing,
                     }
                 )
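For anyone who wants to try the fallback without launching a browser, here is a minimal sketch of the same pattern. It is not the shot-scraper code: FakeResponse, FakeRequest and BodyUnavailableError are hypothetical stand-ins for the Playwright response, request and Error objects, and the example.com URLs are made up. It also assumes one JSON object per line in the log, which the diff above does not show. The last few lines show how a consumer of the log might cope with "size": null.

```python
import io
import json


class BodyUnavailableError(Exception):
    """Hypothetical stand-in for the Playwright Error used in the diff."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message


class FakeRequest:
    """Hypothetical stand-in for response.request."""

    method = "GET"
    timing = {"startTime": 0}


class FakeResponse:
    """Hypothetical stand-in for a Playwright response object."""

    def __init__(self, url, body=None, error=None):
        self.url = url
        self.request = FakeRequest()
        self._body = body
        self._error = error

    def body(self):
        # Simulate the browser having already discarded the body.
        if self._error is not None:
            raise BodyUnavailableError(self._error)
        return self._body


def response_size(response):
    """Return the body length, or None if the body can no longer be fetched."""
    try:
        return len(response.body())
    except BodyUnavailableError as ex:
        if "Network.getResponseBody" in ex.message:
            return None
        raise  # anything else is a real error and should surface


def log_response(response, log_file):
    # Assumption: newline-delimited JSON, one record per response.
    record = {
        "method": response.request.method,
        "url": response.url,
        "size": response_size(response),
        "timing": response.request.timing,
    }
    log_file.write(json.dumps(record) + "\n")


log = io.StringIO()
log_response(FakeResponse("https://example.com/ok.js", body=b"12345"), log)
log_response(
    FakeResponse(
        "https://example.com/evicted.js",
        error="Protocol error (Network.getResponseBody): "
        "No resource with given identifier found",
    ),
    log,
)

# Consumer side: skip null sizes when totalling, but keep track of them.
records = [json.loads(line) for line in log.getvalue().splitlines()]
known_total = sum(r["size"] for r in records if r["size"] is not None)
unknown = [r["url"] for r in records if r["size"] is None]
print(known_total, unknown)
```

Matching on the "Network.getResponseBody" substring keeps the fallback narrow: only the known "body already gone" case becomes null, and any other failure still raises.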
Got this error when running:
This was logged out a bunch of times, even though the command itself ran to completion.
I think this is likely caused by the new log requests feature from #88.