Closed Mennaruuk closed 2 years ago
Running waybackpack "https://www.cnn.com/2022/02/02/us/confederate-monuments-removed-2021-whose-heritage/index.html"
should achieve your goal. If it does not, let me know, providing a description (and as much specific output as possible) of what happened instead.
Thank you! It worked flawlessly. I suggest mentioning in the usage part of README.md that the URL should be enclosed in double quotation marks, so that it helps out other users.
Quick question: Is there a way to have --uniques-only choose either the earliest or the latest snapshot if a site has been archived multiple times over several days? Thanks.
I don't believe there is, but I may be wrong. The documentation for the API that this library uses does not seem to describe such an option, though I may have overlooked it: https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md#collapsing
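If the choice has to be made after the fact, one possible workaround is to post-process a list of snapshots yourself. The sketch below is not a waybackpack feature. It assumes you already have (timestamp, digest) pairs, such as the `timestamp` and `digest` fields of a CDX result, and the sample values are made up. The helper name `pick_per_digest` is hypothetical. Note that it groups by digest across the whole list, which is not the same as the API's collapsing of adjacent rows.

```python
from typing import Dict, Iterable, List, Tuple

# Each row is (timestamp, digest), mirroring two fields of a CDX result.
# The sample values further down are invented for illustration only.
Row = Tuple[str, str]


def pick_per_digest(rows: Iterable[Row], keep: str = "earliest") -> List[Row]:
    """Keep one snapshot per content digest: the earliest or the latest."""
    if keep not in ("earliest", "latest"):
        raise ValueError("keep must be 'earliest' or 'latest'")
    chosen: Dict[str, str] = {}
    for timestamp, digest in rows:
        current = chosen.get(digest)
        if current is None:
            chosen[digest] = timestamp
        elif keep == "earliest" and timestamp < current:
            chosen[digest] = timestamp
        elif keep == "latest" and timestamp > current:
            chosen[digest] = timestamp
    # 14-digit Wayback timestamps sort chronologically as plain strings.
    return sorted((ts, d) for d, ts in chosen.items())


snapshots = [
    ("20220202120000", "AAA"),
    ("20220203090000", "AAA"),
    ("20220204180000", "BBB"),
    ("20220205070000", "AAA"),
]
print(pick_per_digest(snapshots, "earliest"))
print(pick_per_digest(snapshots, "latest"))
```

The resulting timestamps could then be used to fetch only the chosen snapshots.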
Regarding the suggestion to mention double quotation marks in the usage part of README.md: I'm not sure the quotation marks are actually necessary. I included them in my response just in case, but the command works exactly the same on my computer without them. If you can reproduce a problem with the unquoted version on your device, let us know, along with the OS you're using.
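As a rough way to check whether a particular URL needs quoting in a POSIX shell, Python's standard `shlex.quote` can serve as a conservative indicator: it returns the string unchanged when it contains only characters it considers safe. This is only a sketch. The helper name `needs_quoting` and the second URL are hypothetical, and other shells (for example on Windows) follow different rules.

```python
import shlex

# The URL from this thread contains only letters, digits, ':', '/', '.', '-'.
plain = "https://www.cnn.com/2022/02/02/us/confederate-monuments-removed-2021-whose-heritage/index.html"
# Hypothetical URL with a query string; '?' and '&' are special in POSIX shells.
with_query = "https://example.com/page?a=1&b=2"


def needs_quoting(arg: str) -> bool:
    """True if shlex.quote would alter the argument (conservative check)."""
    return shlex.quote(arg) != arg


print(needs_quoting(plain))       # False
print(needs_quoting(with_query))  # True
print(shlex.quote(with_query))
```

This is consistent with the unquoted command working for the URL above, while URLs with query strings are the more likely case where quoting matters.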
Closing for now, but feel free to continue the discussion here.
It’d be pretty cool to have an option that allows downloading only the URL specified and not the entire archive for all pages under that domain. So for example if I want to download https://www.cnn.com/2022/02/02/us/confederate-monuments-removed-2021-whose-heritage/index.html, I just want that page and that page only, not all CNN URLs.
thank you!