simonw / shot-scraper

A command-line utility for taking automated screenshots of websites
https://shot-scraper.datasette.io
Apache License 2.0
1.71k stars, 78 forks

Idea: plugins, packaging JavaScript to be injected into pages #67

Open simonw opened 2 years ago

simonw commented 2 years ago

Tweet: https://twitter.com/simonw/status/1514657436287705119

I want to be able to use tricks like this one - where Readability.js is injected into a page - without relying on CDNs: https://til.simonwillison.net/shot-scraper/readability

One option would be to package things like this up as plugins using Pluggy (as seen in Datasette) - then serve the JavaScript assets using a /-/shot-scraper-xyz/plugins/... route configured using https://playwright.dev/python/docs/api/class-page#page-route
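A rough sketch of what the route-serving half of that idea might look like, using only the standard library plus a Playwright `page` object passed in by the caller. Nothing here is shot-scraper's actual implementation: `PLUGIN_PREFIX`, `resolve_asset`, `install_plugin_routes` and the `plugin_dirs` mapping are hypothetical names invented for illustration, and the Pluggy registration step is left out entirely. The only Playwright calls assumed are `page.route()`, `route.request.url` and `route.fulfill()`.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical prefix, modelled on the "/-/shot-scraper-xyz/plugins/..." route above.
PLUGIN_PREFIX = "/-/shot-scraper-xyz/plugins/"


def resolve_asset(url, plugin_dirs):
    """Map a plugin asset URL to a local file, or return None if nothing matches.

    plugin_dirs maps a plugin name to the directory holding its static files.
    """
    path = urlparse(url).path
    if not path.startswith(PLUGIN_PREFIX):
        return None
    plugin_name, _, relative = path[len(PLUGIN_PREFIX):].partition("/")
    base = plugin_dirs.get(plugin_name)
    if base is None or not relative:
        return None
    base = Path(base).resolve()
    candidate = (base / relative).resolve()
    # Refuse anything that escapes the plugin directory or is not a real file.
    if base not in candidate.parents or not candidate.is_file():
        return None
    return candidate


def install_plugin_routes(page, plugin_dirs):
    """Register a route that answers plugin asset requests from local disk."""

    def handle(route):
        asset = resolve_asset(route.request.url, plugin_dirs)
        if asset is None:
            route.fulfill(status=404, body="Not found")
        else:
            route.fulfill(
                status=200,
                content_type="application/javascript",
                body=asset.read_bytes(),
            )

    # "**" on both sides so the pattern matches on any origin and any sub-path.
    page.route("**" + PLUGIN_PREFIX + "**", handle)
```

With something like this in place, a script could presumably be loaded with `page.add_script_tag(url="/-/shot-scraper-xyz/plugins/readability/Readability.js")` and served from disk with no CDN involved. How plugins would declare their asset directories (for example through a Pluggy hook) is still an open design question.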

rdmurphy commented 2 years ago

Just to add another example: Playwright does a version of this behind the scenes itself. It "injects" helper scripts into every page.

jefftriplett commented 2 years ago

Plugins would be great, especially if they make it possible to detect whether an image already exists, or to skip saving a page when a 404 or other error status code is detected.
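For illustration only, the two checks described above could look something like this when written directly against Playwright's sync API rather than as a plugin. `capture` and its `skip_existing` parameter are hypothetical names, not shot-scraper options. The sketch assumes `page.goto()` returns a response object with a `status` attribute (or `None` when there is no response to inspect) and that `page.screenshot(path=...)` writes the image.

```python
from pathlib import Path


def capture(page, url, output_path, skip_existing=True):
    """Take a screenshot unless the file already exists or the page errored.

    Returns True if a screenshot was written, False if it was skipped.
    """
    output_path = Path(output_path)
    if skip_existing and output_path.exists():
        # Skip before navigating, so no request is made for an existing image.
        return False
    response = page.goto(url)
    # Treat any 4xx or 5xx status as a reason not to save the page.
    if response is not None and response.status >= 400:
        return False
    page.screenshot(path=str(output_path))
    return True
```

A plugin hook could wrap the same logic, so that each plugin gets to veto a screenshot before it is written.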