Closed simonw closed 1 week ago
Relevant snippet of tests (I think): https://github.com/simonw/shot-scraper/blob/a5e9707ed343e1acd35fccf8e4ebe35fe4e62425/tests/run_examples.sh#L23-L45
Here's the problem:
shot-scraper https://simonwillison.net/ \
--selector-all .day --padding 20 \
-o examples/selector-all.png
I redesigned my blog and removed the .day bit a while ago.
I'll do this instead:
shot-scraper https://simonwillison.net/ \
--selector-all '#secondary li:nth-child(-n+5)' \
--padding 20 \
-o examples/selector-all.png
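A note on why the selector in that command is quoted: in a shell script, a word that starts with # begins a comment, so an unquoted #secondary selector would never reach shot-scraper. The sketch below is illustrative only; count_args is a hypothetical helper, not part of shot-scraper, and it just reports how many arguments the shell passes along.

```shell
# Hypothetical helper: print how many arguments the shell actually passed.
count_args() { echo "$#"; }

# Unquoted: the word starting with '#' begins a comment,
# so only the flag itself is passed.
count_args --selector-all #secondary li
# prints 1

# Quoted: the flag and the whole selector arrive as two arguments.
count_args --selector-all '#secondary li:nth-child(-n+5)'
# prints 2
```

The single quotes also stop the shell from interpreting the parentheses in :nth-child(-n+5).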
Still getting an error - this time I can recreate it with bug.yaml:
- output: selectors-all-from-multi.png
  url: https://simonwillison.net/
  selectors_all:
  - #secondary li:nth-child(-n+5)
  - .entry:nth-of-type(1)
  padding: 20
And then:
shot-scraper multi bug.yaml
Error: Timed out while waiting for element to become available.
Timeout 29963ms exceeded.
=========================== logs ===========================
taking element screenshot
waiting for element to be visible and stable
element is not visible - waiting...
============================================================
I'll do this instead:
- output: selectors-all-from-multi.png
  url: https://simonwillison.net/
  selectors_all:
  - "#secondary li:nth-child(-n+5)"
  - "#secondary li:nth-child(8)"
  padding: 20
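One detail worth checking in configs like these: YAML treats a # that follows whitespace as the start of a comment, so a list item written as - #secondary ... is read as an empty item unless the selector is quoted. The thread does not establish whether that contributed to the timeout above, so the following is only a rough sketch of a check. The function name find_unquoted_hash_items and the file name bug-unquoted.yaml are made up for illustration.

```shell
# Hypothetical helper (not part of shot-scraper): print YAML list items
# whose value starts with an unquoted '#', which YAML reads as a comment.
find_unquoted_hash_items() {
  grep -nE '^[[:space:]]*-[[:space:]]+#' "$1"
}

# Sample config mixing an unquoted and a quoted selector.
cat > bug-unquoted.yaml <<'EOF'
- output: selectors-all-from-multi.png
  url: https://simonwillison.net/
  selectors_all:
  - #secondary li:nth-child(-n+5)
  - "#secondary li:nth-child(8)"
  padding: 20
EOF

# Only the unquoted item on line 4 is reported.
find_unquoted_hash_items bug-unquoted.yaml
```

This is a plain grep pattern, not a YAML parser, so it only catches the simple case of a list item that begins with #.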
New error: https://github.com/simonw/shot-scraper/actions/runs/11063333005/job/30739290051
playwright._impl._errors.Error: Page.goto: net::ERR_CONNECTION_REFUSED at https://www.whatismybrowser.com/detect/what-is-my-user-agent/
Call log:
navigating to "https://www.whatismybrowser.com/detect/what-is-my-user-agent/", waiting until "load"
Maybe they are blocking GitHub IPs? I'm going to change that test to not depend on that site.
I tried this:
# Different browsers
echo '<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>User Agent</title>
</head>
<body>
<h1>Your User Agent:</h1>
<p id="ua"></p>
<script>
document.getElementById("ua").textContent = navigator.userAgent;
</script>
</body>
</html>' > user-agent.html
shot-scraper user-agent.html \
-o examples/useragent-default-chromium.png -h 400 -w 800
shot-scraper user-agent.html \
-o examples/useragent-firefox.png -h 400 -w 800 -b firefox
shot-scraper user-agent.html \
-o examples/useragent-webkit.png -h 400 -w 800 -b webkit
rm user-agent.html
It passed for Chromium and Firefox but failed for WebKit:
playwright._impl._api_types.Error: A server with the specified hostname could not be found.
=========================== logs ===========================
navigating to "http://user-agent.html/", waiting until "load"
============================================================
Looks like there's a bug where WebKit doesn't work correctly with files loaded from disk: the log shows the local filename being treated as a hostname (http://user-agent.html/).
Instead I'll add a user-agent.html page to my https://github.com/simonw/tools repo.
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 514, in wrap_api_call
raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
playwright._impl._errors.Error: Page.goto: net::ERR_SSL_PROTOCOL_ERROR at https://localhost:9043/
Call log:
navigating to "https://localhost:9043/", waiting until "load"
Fixed!
Got this on https://github.com/simonw/shot-scraper/actions/runs/11063153856/job/30738817452