palewire / savepagenow

A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service
https://palewi.re/docs/savepagenow/
MIT License
167 stars, 23 forks

Feature request: Only save page if there is no snapshot yet #14

Closed baerbock closed 3 years ago

baerbock commented 5 years ago

My use case for your great tool is archiving articles before they vanish behind a paywall.

Because I don't want to strain the Internet Archive's resources without real necessity, I would like to save a page only if there is no snapshot of it yet.

What do you think about implementing this?

palewire commented 5 years ago

How would such a test be accomplished?

baerbock commented 5 years ago

There is an availability API at http://archive.org/wayback/available?url=$url

Here are example results:

{"url": "https://rbnenergy.com/shes-electric-are-e-fracs-a-fix-for-permian-gas-constraints-and-giveaway-prices", "archived_snapshots": {"closest": {"status": "200", "available": true, "url": "http://web.archive.org/web/20190713192002/https://rbnenergy.com/shes-electric-are-e-fracs-a-fix-for-permian-gas-constraints-and-giveaway-prices", "timestamp": "20190713192002"}}}

{"url": "....", "archived_snapshots": {}}
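Based on the two responses above, the check could be sketched roughly as follows. This is only a minimal sketch, not how savepagenow does it: `has_snapshot`, `fetch_availability` and `save_if_missing` are hypothetical names invented for illustration, and it assumes the endpoint always returns JSON shaped like the examples.

```python
import json
import urllib.parse
import urllib.request

# Endpoint quoted in this thread; everything else below is illustrative.
AVAILABILITY_ENDPOINT = "http://archive.org/wayback/available"


def has_snapshot(payload: dict) -> bool:
    """Return True if an availability response reports a usable snapshot."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    # An empty "archived_snapshots" object means nothing has been captured.
    return bool(closest and closest.get("available"))


def fetch_availability(url: str, timeout: float = 10.0) -> dict:
    """Query the availability endpoint for `url` and decode the JSON reply."""
    query = urllib.parse.urlencode({"url": url})
    request_url = f"{AVAILABILITY_ENDPOINT}?{query}"
    with urllib.request.urlopen(request_url, timeout=timeout) as resp:
        return json.load(resp)


def save_if_missing(url, save, fetch=fetch_availability):
    """Call `save(url)` only when no snapshot exists; otherwise return None."""
    if has_snapshot(fetch(url)):
        return None
    return save(url)
```

With savepagenow installed, passing its `capture` function as `save` (for example `save_if_missing(url, savepagenow.capture)`) should only trigger a new capture when nothing is archived yet. Keeping `fetch` and `save` as parameters also means the decision logic can be tested without touching the network.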

chazanov commented 5 years ago

@palewire I'd like to implement this feature, but I'm a Python beginner. Do you think it can be easily done?

victoriatomzik commented 4 years ago

@chazanov we need it, go for it!

palewire commented 4 years ago

Sounds like it would be a great add.

palewire commented 3 years ago

I'll take any pull requests you might have for this, but I'm closing the ticket since it's gotten a bit stale.