simonw / shot-scraper

A command-line utility for taking automated screenshots of websites
https://shot-scraper.datasette.io
Apache License 2.0

Test failure: Timed out while waiting for element to become available. #158

Closed: simonw closed this 1 week ago

simonw commented 1 week ago

Got this on https://github.com/simonw/shot-scraper/actions/runs/11063153856/job/30738817452

Screenshot of '#bighead, .overband' on 'https://simonwillison.net/' written to 'examples/bighead-multi-selector.png'
Error: Timed out while waiting for element to become available.

Locator.screenshot: Timeout 29952ms exceeded.
Call log:
taking element screenshot
  - waiting for fonts to load...
  - fonts loaded
  - attempting scroll into view action
  -   waiting for element to be stable
  -   element is not visible
  ...
simonw commented 1 week ago

Relevant snippet of tests (I think): https://github.com/simonw/shot-scraper/blob/a5e9707ed343e1acd35fccf8e4ebe35fe4e62425/tests/run_examples.sh#L23-L45

simonw commented 1 week ago

Here's the problem:

shot-scraper https://simonwillison.net/ \
   --selector-all .day --padding 20 \
   -o examples/selector-all.png

I redesigned my blog a while ago and removed the .day elements, so that selector no longer matches anything on the page.

simonw commented 1 week ago

I'll do this instead:

shot-scraper https://simonwillison.net/ \
   --selector-all '#secondary li:nth-child(-n+5)' \
   --padding 20 \
   -o examples/selector-all.png

[Screenshot: examples/selector-all.png]
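The "-n+5" in that selector can look odd. In CSS, ":nth-child(An+B)" matches an element whose 1-based position among its siblings equals An+B for some integer n >= 0. So "-n+5" gives positions 5, 4, 3, 2, 1: the li items that are among the first five children of each list inside #secondary. The function here is a hypothetical helper (not part of shot-scraper) that enumerates the matched positions for a given A and B, as a quick way to sanity-check a formula before putting it in a selector.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of shot-scraper: print the 1-based child
# positions from 1 to MAX that CSS ":nth-child(A*n+B)" would match,
# i.e. positions equal to A*n + B for some integer n >= 0.
nth_child_positions() {
  local a=$1 b=$2 max=$3 i diff
  for (( i = 1; i <= max; i++ )); do
    diff=$(( i - b ))
    if (( a == 0 )); then
      # With A = 0 the formula is just the single position B.
      if (( diff == 0 )); then echo "$i"; fi
    elif (( diff % a == 0 && diff / a >= 0 )); then
      # (i - B) is an exact multiple of A, and the multiplier n is >= 0.
      echo "$i"
    fi
  done
}

# ":nth-child(-n+5)" is A = -1, B = 5; prints 1 to 5, one per line.
nth_child_positions -1 5 10
```

For the plain ":nth-child(8)" used later in the thread, A is 0 and B is 8, so nth_child_positions 0 8 10 prints only 8.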

simonw commented 1 week ago

Still getting an error. This time I can recreate it with bug.yaml:

- output: selectors-all-from-multi.png
  url: https://simonwillison.net/
  selectors_all:
  - #secondary li:nth-child(-n+5)
  - .entry:nth-of-type(1)
  padding: 20

And then:

shot-scraper multi bug.yaml
Error: Timed out while waiting for element to become available.

Timeout 29963ms exceeded.
=========================== logs ===========================
taking element screenshot
  waiting for element to be visible and stable
    element is not visible - waiting...
============================================================
simonw commented 1 week ago

I'll do this instead:

- output: selectors-all-from-multi.png
  url: https://simonwillison.net/
  selectors_all:
  - "#secondary li:nth-child(-n+5)"
  - "#secondary li:nth-child(8)"
  padding: 20
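A detail worth noting about this revised config: it quotes each selector. In YAML, a "#" that follows whitespace starts a comment, so in bug.yaml the unquoted item "- #secondary li:nth-child(-n+5)" is read as an empty (null) list entry rather than a selector, which may have contributed to that failure. The function here is a hypothetical, line-based check (not a shot-scraper feature, and not a real YAML parser) for spotting unquoted list items that begin with "#".

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not a shot-scraper feature: print YAML lines
# where a list item's value starts with an unquoted "#".
# YAML treats "#" after whitespace as the start of a comment, so
# "  - #secondary li" parses as an empty (null) list entry.
# This is a line-based heuristic, not a YAML parser: it will also flag
# list items that really are meant to be comments.
find_unquoted_hash_items() {
  # -n: prefix each match with its line number
  # -E: extended regex; optional indent, "-", whitespace, then "#"
  grep -nE '^[[:space:]]*-[[:space:]]+#' "$1"
}

# Usage: find_unquoted_hash_items bug.yaml
```

Run against the original bug.yaml, it should print "4:  - #secondary li:nth-child(-n+5)". Against the quoted config it prints nothing, because each selector now starts with a double quote.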
simonw commented 1 week ago

New error: https://github.com/simonw/shot-scraper/actions/runs/11063333005/job/30739290051

playwright._impl._errors.Error: Page.goto: net::ERR_CONNECTION_REFUSED at https://www.whatismybrowser.com/detect/what-is-my-user-agent/
Call log:
navigating to "https://www.whatismybrowser.com/detect/what-is-my-user-agent/", waiting until "load"

Maybe whatismybrowser.com is blocking GitHub's IP addresses? I'm going to change that test so it doesn't depend on that site.

simonw commented 1 week ago

I tried this:

# Different browsers
echo '<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>User Agent</title>
</head>
<body>
    <h1>Your User Agent:</h1>
    <p id="ua"></p>
    <script>
        document.getElementById("ua").textContent = navigator.userAgent;
    </script>
</body>
</html>' > user-agent.html
shot-scraper user-agent.html \
  -o examples/useragent-default-chromium.png -h 400 -w 800
shot-scraper user-agent.html \
  -o examples/useragent-firefox.png -h 400 -w 800 -b firefox
shot-scraper user-agent.html \
  -o examples/useragent-webkit.png -h 400 -w 800 -b webkit
rm user-agent.html

It passed for Chromium and Firefox but failed for WebKit:

playwright._impl._api_types.Error: A server with the specified hostname could not be found.
=========================== logs ===========================
navigating to "http://user-agent.html/", waiting until "load"
============================================================

It looks like there's a bug where WebKit doesn't correctly handle files loaded from disk: the log shows it navigating to http://user-agent.html/, treating the filename as a hostname.

simonw commented 1 week ago

Instead, I'll add a user-agent.html page to my https://github.com/simonw/tools repo.

simonw commented 1 week ago
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 514, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
playwright._impl._errors.Error: Page.goto: net::ERR_SSL_PROTOCOL_ERROR at https://localhost:9043/
Call log:
navigating to "https://localhost:9043/", waiting until "load"
simonw commented 1 week ago

Fixed!