Closed (jamesking closed this 8 months ago)
I just ran into this today while testing Simon's TIL about running axe-core with shot-scraper.
I've taken @jamesking's suggestion above and implemented it in a PR. The `--bypass-csp` option is added to all commands that allow you to execute JavaScript. See #116.
This is a really smart feature request, and #116 looks like a good implementation.
Problem
I have been following this TIL to run `Readability.js` on a page with Shot Scraper: https://til.simonwillison.net/shot-scraper/readability
This worked fine for pages with liberal content security policies. However, when I tried to scrape a page with a stronger CSP, I ran into this error:
When a page has a strong CSP like this, it limits the ability of Shot Scraper to run JavaScript on the page before processing it.
Suggestion
The Playwright Python tools have an optional `bypass_csp` argument that can be passed to the `new_context` method.
As a test I monkey-patched `shot_scraper/cli.py` with the following:
And now the `Readability.js` script executes without a problem. :)
It would be really useful to give Shot Scraper a CLI argument like `--bypass-csp` that would then optionally add this argument in Playwright and allow more flexibility to run JavaScript on pages like this.
Thank you for a great tool!
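For anyone who wants to try the idea outside of Shot Scraper, here is a minimal sketch using Playwright's sync API directly. It is not the actual change to `shot_scraper/cli.py`; the helper names `context_options` and `evaluate_on_page` are hypothetical and only illustrate how an optional flag could be forwarded to `new_context`.

```python
def context_options(bypass_csp: bool = False) -> dict:
    """Build keyword arguments for Playwright's browser.new_context().

    Hypothetical helper: bypass_csp is only included when requested,
    so the default behaviour of the context is left unchanged.
    """
    options = {}
    if bypass_csp:
        options["bypass_csp"] = True
    return options


def evaluate_on_page(url: str, javascript: str, bypass_csp: bool = False):
    """Load url and evaluate a JavaScript expression on the page.

    Hypothetical helper, assuming Playwright and a Chromium build are
    installed (pip install playwright; playwright install chromium).
    """
    # Imported lazily so context_options() is usable without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            # bypass_csp=True asks Playwright to ignore the page's CSP,
            # which lets injected scripts run on strict pages.
            context = browser.new_context(**context_options(bypass_csp))
            page = context.new_page()
            page.goto(url)
            return page.evaluate(javascript)
        finally:
            browser.close()
```

Calling something like `evaluate_on_page("https://example.com", "document.title", bypass_csp=True)` would then run the expression with the page's CSP bypassed. Keeping the flag off by default means existing behaviour does not change unless the user opts in.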