simonw / shot-scraper

A command-line utility for taking automated screenshots of websites
https://shot-scraper.datasette.io
Apache License 2.0

Support for HTTP Basic Authentication #140

Closed: simonw closed this issue 8 months ago

simonw commented 8 months ago

Idea suggested here: https://mastodon.social/@jpmens/111879657240670040

I thought this would work with shot-scraper auth but...

shot-scraper auth https://datasette-auth-passwords-http-basic-demo.datasette.io/ auth.json

Produces:

Traceback (most recent call last):
  File "/Users/simon/.local/bin/shot-scraper", line 8, in <module>
    sys.exit(cli())
  File "/Users/simon/.local/pipx/venvs/shot-scraper/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/simon/.local/pipx/venvs/shot-scraper/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/simon/.local/pipx/venvs/shot-scraper/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/simon/.local/pipx/venvs/shot-scraper/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/simon/.local/pipx/venvs/shot-scraper/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/simon/.local/pipx/venvs/shot-scraper/lib/python3.10/site-packages/shot_scraper/cli.py", line 913, in auth
    page.goto(url)
  File "/Users/simon/.local/pipx/venvs/shot-scraper/lib/python3.10/site-packages/playwright/sync_api/_generated.py", line 9303, in goto
    self._sync(
  File "/Users/simon/.local/pipx/venvs/shot-scraper/lib/python3.10/site-packages/playwright/_impl/_sync_base.py", line 109, in _sync
    return task.result()
  File "/Users/simon/.local/pipx/venvs/shot-scraper/lib/python3.10/site-packages/playwright/_impl/_page.py", line 473, in goto
    return await self._main_frame.goto(**locals_to_params(locals()))
  File "/Users/simon/.local/pipx/venvs/shot-scraper/lib/python3.10/site-packages/playwright/_impl/_frame.py", line 138, in goto
    await self._channel.send("goto", locals_to_params(locals()))
  File "/Users/simon/.local/pipx/venvs/shot-scraper/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
  File "/Users/simon/.local/pipx/venvs/shot-scraper/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 490, in wrap_api_call
    return await cb()
  File "/Users/simon/.local/pipx/venvs/shot-scraper/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 99, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: net::ERR_INVALID_AUTH_CREDENTIALS at https://datasette-auth-passwords-http-basic-demo.datasette.io/
=========================== logs ===========================
navigating to "https://datasette-auth-passwords-http-basic-demo.datasette.io/", waiting until "load"
============================================================

simonw commented 8 months ago

Could solve it with: https://playwright.dev/python/docs/network#http-authentication

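from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    # HTTP credentials are configured per browser context
    context = browser.new_context(
        http_credentials={"username": "bill", "password": "pa55w0rd"}
    )
    page = context.new_page()
    page.goto("https://example.com")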
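
For context, HTTP Basic authentication comes down to a single request header. The sketch below is not how Playwright implements http_credentials internally. It only illustrates, using the standard library, the Authorization value a client ends up sending for a given username and password (as described in RFC 7617). The function name basic_auth_header is made up for this illustration.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the Authorization header value for HTTP Basic auth."""
    # Credentials are joined with a colon, UTF-8 encoded, then base64 encoded.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    # Same example credentials as the Playwright snippet above.
    print(basic_auth_header("bill", "pa55w0rd"))
```

Because base64 is an encoding and not encryption, the credentials are only as private as the connection, which is one reason the demo site is served over HTTPS.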

simonw commented 8 months ago

I can't see a way to tell page.goto() not to throw an ERR_INVALID_AUTH_CREDENTIALS error when it encounters a page that requires authentication, so I don't think I can get shot-scraper auth to work. I'll need to add new options instead.

simonw commented 8 months ago

Got this working:

diff --git a/shot_scraper/cli.py b/shot_scraper/cli.py
index faf45e5..22c91b7 100644
--- a/shot_scraper/cli.py
+++ b/shot_scraper/cli.py
@@ -65,6 +65,12 @@ def bypass_csp_option(fn):
     return fn

+def http_auth_options(fn):
+    click.option("--auth-username", help="Username for HTTP Basic authentication")(fn)
+    click.option("--auth-password", help="Password for HTTP Basic authentication")(fn)
+    return fn
+
+
 def skip_or_fail(response, skip, fail):
     if skip and fail:
         raise click.ClickException("--skip and --fail cannot be used together")
@@ -201,6 +207,7 @@ def cli():
 @skip_fail_options
 @bypass_csp_option
 @silent_option
+@http_auth_options
 def shot(
     url,
     auth,
@@ -230,6 +237,8 @@ def shot(
     fail,
     bypass_csp,
     silent,
+    auth_username,
+    auth_password,
 ):
     """
     Take a single screenshot of a page or portion of a page.
@@ -291,6 +300,8 @@ def shot(
             timeout=timeout,
             reduced_motion=reduced_motion,
             bypass_csp=bypass_csp,
+            auth_username=auth_username,
+            auth_password=auth_password,
         )
         if interactive or devtools:
             use_existing_page = True
@@ -341,6 +352,8 @@ def _browser_context(
     timeout=None,
     reduced_motion=False,
     bypass_csp=False,
+    auth_username=None,
+    auth_password=None,
 ):
     browser_kwargs = dict(headless=not interactive, devtools=devtools)
     if browser == "chromium":
@@ -363,6 +376,11 @@ def _browser_context(
         context_args["user_agent"] = user_agent
     if bypass_csp:
         context_args["bypass_csp"] = bypass_csp
+    if auth_username and auth_password:
+        context_args["http_credentials"] = {
+            "username": auth_username,
+            "password": auth_password,
+        }
     context = browser_obj.new_context(**context_args)
     if timeout:
         context.set_default_timeout(timeout)
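
The credential handling in the last hunk can be tried on its own, without a browser. This is a minimal sketch that copies only the conditional from the diff. The helper name build_context_args is hypothetical (in the diff this logic lives inside _browser_context), and the returned dict is what would be passed on to Playwright's browser.new_context().

```python
from typing import Any, Dict, Optional


def build_context_args(
    auth_username: Optional[str] = None,
    auth_password: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments intended for browser.new_context()."""
    context_args: Dict[str, Any] = {}
    # Mirrors the diff: credentials are attached only when BOTH values are truthy.
    if auth_username and auth_password:
        context_args["http_credentials"] = {
            "username": auth_username,
            "password": auth_password,
        }
    return context_args


if __name__ == "__main__":
    print(build_context_args("root", "password!"))
    print(build_context_args("root"))  # username without a password
```

Note that with this condition a username supplied without a password (or with an empty password) produces an empty dict, so no credentials are sent at all.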

Then:

shot-scraper https://datasette-auth-passwords-http-basic-demo.datasette.io/ \
  --auth-username root \
  --auth-password 'password!'

Which produced:

[Screenshot: datasette-auth-passwords-http-basic-demo-datasette-io]

simonw commented 8 months ago

Need to add it to the other commands as well.

simonw commented 8 months ago

Got this working:

shot-scraper javascript https://datasette-auth-passwords-http-basic-demo.datasette.io/ \
  --auth-username root \
  --auth-password 'password!' \
  'document.title'

simonw commented 8 months ago

And this:

printf '%s\n' '- url: https://datasette-auth-passwords-http-basic-demo.datasette.io/' '  output: /tmp/out.png' | \
  shot-scraper multi - --auth-username root --auth-password 'password!'

simonw commented 8 months ago

And:

shot-scraper pdf https://datasette-auth-passwords-http-basic-demo.datasette.io/ \
  --auth-username root \
  --auth-password 'password!'

simonw commented 8 months ago

And:

shot-scraper accessibility https://datasette-auth-passwords-http-basic-demo.datasette.io/ \
  --auth-username root \
  --auth-password 'password!'

Output starts:

{
    "role": "WebArea",
    "name": "datasette-auth-passwords HTTP Basic auth demo: _internal, public",
    "children": [
        {
            "role": "link",
            "name": "home"
        },
        {
            "role": "DisclosureTriangle",
            "name": "Menu"
        },
        {
            "role": "text",
            "name": "Root"
        },
        {
            "role": "heading",
            "name": "datasette-auth-passwords HTTP Basic auth demo",
            "level": 1
        },
        {
            "role": "link",
            "name": "_internal"
        }

simonw commented 8 months ago

And:

shot-scraper html https://datasette-auth-passwords-http-basic-demo.datasette.io/ \
  --auth-username root \
  --auth-password 'password!'

jpmens commented 8 months ago

Works beautifully, thank you very much! And since I typically delete toots after a few weeks, I'll add a screenshot of the toot you refer to above here for posterity. :-)
