jsvine / waybackpack

Download the entire Wayback Machine archive for a given URL.
MIT License

Download URL only #50

Closed Mennaruuk closed 2 years ago

Mennaruuk commented 2 years ago

It’d be pretty cool to have an option that downloads only the specified URL rather than the archive for every page under that domain. For example, if I want to download https://www.cnn.com/2022/02/02/us/confederate-monuments-removed-2021-whose-heritage/index.html, I want that page and that page only, not all CNN URLs.

Thank you!

jsvine commented 2 years ago

Running waybackpack "https://www.cnn.com/2022/02/02/us/confederate-monuments-removed-2021-whose-heritage/index.html" should achieve your goal. If it does not, let me know and describe what happened instead, with as much specific output as possible.

Mennaruuk commented 2 years ago

Thank you! It worked flawlessly. I suggest showing the URL enclosed in double quotation marks in the usage section of README.md, to help other users.

Quick question: Is there a way to have --uniques-only choose either the earliest or latest snapshot if a site has been archived multiple times over several days? Thanks.

jsvine commented 2 years ago

> Quick question: Is there a way to have --uniques-only choose either the earliest or latest snapshot if a site has been archived multiple times over several days? Thanks.

I don't believe there is, but I may be wrong. The documentation for the CDX API that this library uses does not seem to describe such an option, though I may have overlooked it: https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md#collapsing
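As a possible workaround, the choice between earliest and latest could be made after the fact, on a list of snapshots that is already in hand. The sketch below is not a waybackpack feature: `pick_snapshots` is a hypothetical helper, the digest values are invented, and it assumes the input is a list of (timestamp, digest) pairs sorted by timestamp, with timestamps in the 14-digit YYYYMMDDhhmmss form the Wayback Machine uses. It groups consecutive snapshots that share a digest and keeps either the first or the last of each run.

```python
from itertools import groupby


def pick_snapshots(rows, keep="earliest"):
    """Keep one snapshot per run of consecutive identical digests.

    rows: (timestamp, digest) tuples, assumed sorted by timestamp.
    keep: "earliest" or "latest" snapshot of each run.
    """
    if keep not in ("earliest", "latest"):
        raise ValueError("keep must be 'earliest' or 'latest'")
    picked = []
    # groupby only merges adjacent rows, so a page that changes and
    # later reverts to old content starts a new run.
    for _digest, run in groupby(rows, key=lambda row: row[1]):
        run = list(run)
        picked.append(run[0] if keep == "earliest" else run[-1])
    return picked


# Invented sample data for illustration only.
rows = [
    ("20220202120000", "AAA"),
    ("20220203120000", "AAA"),
    ("20220204120000", "BBB"),
    ("20220205120000", "BBB"),
    ("20220206120000", "AAA"),
]

earliest = pick_snapshots(rows, keep="earliest")
latest = pick_snapshots(rows, keep="latest")
print([ts for ts, _ in earliest])
print([ts for ts, _ in latest])
```

With the sample rows, "earliest" keeps the snapshots from 2 Feb, 4 Feb and 6 Feb, while "latest" keeps those from 3 Feb, 5 Feb and 6 Feb.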

> I suggest showing the URL enclosed in double quotation marks in the usage section of README.md, to help other users.

I'm not sure the double quotation marks are actually necessary. I included them in my response just in case, but the command works exactly the same on my computer without them. If you can reproduce a problem with the unquoted version on your device, let us know, and tell us which OS you're using.
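Whether quoting matters generally depends on the characters in the URL and on the shell. As a rough illustration for POSIX-style shells only, Python's standard `shlex.quote` returns a string unchanged when it contains no characters the shell treats specially, and wraps it in single quotes otherwise. The helper name `needs_quoting` and the example.com URL below are made up for this sketch; Windows shells follow different rules.

```python
import shlex


def needs_quoting(url):
    """Return True if shlex.quote would add quotes for a POSIX shell."""
    return shlex.quote(url) != url


# The URL from this thread: only letters, digits, ':', '/', '.' and '-'.
plain = (
    "https://www.cnn.com/2022/02/02/us/"
    "confederate-monuments-removed-2021-whose-heritage/index.html"
)
# A made-up URL with a query string; '?' and '&' are shell-special.
with_query = "https://example.com/page?a=1&b=2"

print(needs_quoting(plain))       # False
print(needs_quoting(with_query))  # True
print(shlex.quote(with_query))    # 'https://example.com/page?a=1&b=2'
```

This is consistent with the unquoted command working for the URL above, while a URL containing `&` or `?` would likely need the quotes.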

Closing for now, but feel free to continue the discussion here.