simonw / shot-scraper

A command-line utility for taking automated screenshots of websites
https://shot-scraper.datasette.io
Apache License 2.0

Optionally disable content security policies #114

Closed: jamesking closed this issue 8 months ago

jamesking commented 1 year ago

Problem

I have been following this TIL to run Readability.js on a page with Shot Scraper.

https://til.simonwillison.net/shot-scraper/readability

This worked fine for pages with liberal content security policies, but when I tried to scrape a page with a stricter CSP I ran into this error:

Refused to load the script 'https://cdn.skypack.dev/@mozilla/readability' because it violates the following Content Security Policy directive: …

A strict CSP like this limits Shot Scraper's ability to run JavaScript on a page before processing it.

Suggestion

Playwright's Python library has an optional bypass_csp argument that can be passed to the browser's new_context method.
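For reference, here is a minimal standalone sketch of the same idea outside shot-scraper, assuming Playwright's synchronous Python API is installed (`pip install playwright`, then `playwright install chromium`). Only the `bypass_csp` keyword to `new_context` is the real Playwright option; the helper names `build_context_args` and `run_js_with_csp_bypass` are hypothetical and not part of shot-scraper.

```python
from typing import Any, Dict, Optional


def build_context_args(bypass_csp: bool = False,
                       user_agent: Optional[str] = None) -> Dict[str, Any]:
    """Assemble keyword arguments for Playwright's browser.new_context().

    Hypothetical helper: only keys that were explicitly requested are added,
    so Playwright's own defaults apply to everything else.
    """
    context_args: Dict[str, Any] = {}
    if user_agent is not None:
        context_args["user_agent"] = user_agent
    if bypass_csp:
        # Ask Playwright to ignore the page's Content-Security-Policy
        context_args["bypass_csp"] = True
    return context_args


def run_js_with_csp_bypass(url: str, javascript: str) -> Any:
    """Hypothetical helper: load url with CSP bypassed, return the JS result."""
    # Imported lazily so build_context_args() works without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            context = browser.new_context(**build_context_args(bypass_csp=True))
            page = context.new_page()
            page.goto(url)
            # Evaluate the script in the page and hand back its return value
            return page.evaluate(javascript)
        finally:
            browser.close()
```

Keeping the context arguments in a dictionary mirrors the `context_args` pattern in the patch below, so the CSP bypass stays opt-in rather than always on.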

As a test I monkey-patched shot_scraper/cli.py with the following:

# cli.py, line 353
...
context_args["bypass_csp"] = True # <-- Line added
context = browser_obj.new_context(**context_args)
...

And now the Readability.js script executes without a problem. :)

It would be really useful to give Shot Scraper a CLI argument like --bypass-csp that would optionally pass this argument to Playwright, allowing more flexibility to run JavaScript on pages like this.

Thank you for a great tool!

sesh commented 11 months ago

I just ran into this today while testing Simon's TIL about running axe-core with shot-scraper.

I've taken @jamesking's suggestion above and implemented it in a PR. The --bypass-csp option is added to all commands that allow you to execute JavaScript. See #116.
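The pattern of sharing one flag across every JavaScript-executing command can be illustrated with the standard library's `argparse` and a parent parser. This is only a sketch: the subcommand names `javascript` and `shot` mirror shot-scraper commands, but the parser layout and the `build_parser` and `context_args_from` helpers are hypothetical, and the actual implementation in #116 may be structured differently.

```python
import argparse
from typing import Any, Dict


def build_parser() -> argparse.ArgumentParser:
    # Options shared by every subcommand that runs JavaScript in the page
    js_options = argparse.ArgumentParser(add_help=False)
    js_options.add_argument(
        "--bypass-csp",
        action="store_true",
        help="Bypass Content-Security-Policy headers on the page",
    )

    parser = argparse.ArgumentParser(prog="shot-scraper-sketch")
    subcommands = parser.add_subparsers(dest="command", required=True)

    # Each JavaScript-capable command inherits --bypass-csp via parents=
    javascript = subcommands.add_parser("javascript", parents=[js_options])
    javascript.add_argument("url")
    javascript.add_argument("script")

    shot = subcommands.add_parser("shot", parents=[js_options])
    shot.add_argument("url")
    shot.add_argument("--javascript", default=None)
    return parser


def context_args_from(args: argparse.Namespace) -> Dict[str, Any]:
    """Translate parsed CLI options into kwargs for browser.new_context()."""
    context_args: Dict[str, Any] = {}
    if args.bypass_csp:
        context_args["bypass_csp"] = True
    return context_args
```

Defining the flag once in a parent parser means a new JavaScript-capable command picks it up just by listing `parents=[js_options]`, instead of redefining the option each time.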

simonw commented 8 months ago

This is a really smart feature request, and #116 looks like a good implementation.

simonw commented 8 months ago

Documentation: https://shot-scraper.datasette.io/en/stable/javascript.html#bypassing-content-security-policy-headers