Open simonw opened 4 months ago
Might be better to provide a class, so you can instantiate once (loading up the headless browser) and then use it for multiple things.
Or... do that, but still have a shot_scraper.javascript(...) shortcut for quick one-off tasks.
Initial rough API design:
shot_scraper.javascript(url, javascript_code) -> a JSON decoded result
With keyword arguments for most of these:
Options:
  -i, --input FILENAME    Read input JavaScript from this file
  -a, --auth FILENAME     Path to JSON authentication context file
  -o, --output FILENAME   Save output JSON to this file
  -r, --raw               Output JSON strings as raw text
  -b, --browser [chromium|firefox|webkit|chrome|chrome-beta]
                          Which browser to use
  --browser-arg TEXT      Additional arguments to pass to the browser
  --user-agent TEXT       User-Agent header to use
  --reduced-motion        Emulate 'prefers-reduced-motion' media feature
  --log-console           Write console.log() to stderr
  --fail                  Fail with an error code if a page returns an HTTP error
  --skip                  Skip pages that return HTTP errors
  --bypass-csp            Bypass Content-Security-Policy
  --auth-password TEXT    Password for HTTP Basic authentication
  --auth-username TEXT    Username for HTTP Basic authentication
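As a rough illustration of how some of these options might translate into keyword arguments, here is a sketch that turns them into arguments for Playwright's `browser.new_context()`. The `context_kwargs()` helper and its parameter names are hypothetical, chosen to mirror the CLI flags; only the Playwright argument names (`storage_state`, `user_agent`, `reduced_motion`, `bypass_csp`, `http_credentials`) come from Playwright itself:

```python
def context_kwargs(
    auth=None,
    user_agent=None,
    reduced_motion=False,
    bypass_csp=False,
    auth_username=None,
    auth_password=None,
):
    """Hypothetical helper: map CLI-style options to new_context() arguments."""
    kwargs = {}
    if auth:
        # --auth: path to a JSON authentication context file
        kwargs["storage_state"] = str(auth)
    if user_agent:
        # --user-agent
        kwargs["user_agent"] = user_agent
    if reduced_motion:
        # --reduced-motion: emulate prefers-reduced-motion
        kwargs["reduced_motion"] = "reduce"
    if bypass_csp:
        # --bypass-csp
        kwargs["bypass_csp"] = True
    if auth_username and auth_password:
        # --auth-username / --auth-password: HTTP Basic authentication
        kwargs["http_credentials"] = {
            "username": auth_username,
            "password": auth_password,
        }
    return kwargs
```

The returned dictionary could then be passed along as `browser.new_context(**context_kwargs(...))`. Options such as `--output`, `--raw`, `--fail` and `--skip` affect how results are handled rather than how the browser context is created, so they are left out of this sketch.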
image_bytes = shot_scraper.shot(url)
With a TON of options, see https://shot-scraper.datasette.io/en/stable/screenshots.html#shot-scraper-shot-help
... etc
This is going to end up being a pretty big refactor, because I'll want the CLI tool to use the new Python API under the hood.
Prototyped this with Claude 3 Opus: https://gist.github.com/simonw/a43ee47f528c0d3dc894bb4ba38aa94a
Another use-case where I'd love to be able to call shot-scraper directly from Python.
In this NICAR workshop: https://github.com/dwillis/shot-scraper-nicar24
This code: https://github.com/dwillis/shot-scraper-nicar24/blob/main/demo.py
It shouldn't be necessary to use subprocess to do something this straightforward in shot-scraper. I'd like to support something like this instead: