harvard-lil / scoop

🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.
MIT License

Make principal web archive capture optional? #25

Open matteocargnelutti opened 2 years ago

matteocargnelutti commented 2 years ago

Should it be possible to skip the web capture step?

Potential use case: capturing only the provenance summary, screenshot, PDF snapshot, and video extraction for a given web page.

edsu commented 10 months ago

Is the idea that it would cut down on the amount of storage?

mdellabitta commented 10 months ago

I can't address your question, but wanted to say: Nice to see you here, @edsu!

matteocargnelutti commented 10 months ago

Hi @edsu!

> Is the idea that it would cut down on the amount of storage?

It is more to account for use cases that do not revolve around capturing HTTP exchanges in a WARC. For example, some users might just want to make a PDF capture or screenshot of a web page using Scoop, and only care about that artifact.

edsu commented 10 months ago

But don't you need to do the HTTP exchanges to generate the screenshot?

matteocargnelutti commented 10 months ago

@edsu Yes and no.