Closed pedromerry closed 3 months ago
Hi @pedromerry, I'm not sure I understand the specific issue being raised. How does what you see differ from what you'd expect to see?
When I run these commands, I get what I would expect to see:
❯ waybackpack https://projects.ttlexceeded.com/ --follow-redirects --from-date 20230702 --to-date 20230704 --dir wb-test
INFO:waybackpack.pack: Fetching https://projects.ttlexceeded.com/ @ 20230703013039
INFO:waybackpack.pack: Writing to wb-test/20230703013039/projects.ttlexceeded.com/index.html
❯ tree wb-test/
wb-test/
└── 20230703013039
└── projects.ttlexceeded.com
└── index.html
2 directories, 1 file
Opening index.html: [screenshot of the rendered page]
Or are you expecting waybackpack to recursively spider every page on that subdomain? If so, unfortunately, that's not part of waybackpack's features; you can try, however, the code in this pull request/fork.
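For readers who do want every archived page under a subdomain, a rough workaround is to query the Wayback Machine's public CDX API directly and fetch each capture yourself. This is a minimal sketch, not a waybackpack feature; the helper names and parameter choices (`matchType=prefix`, the `id_` raw-capture suffix) are assumptions based on the documented CDX endpoint, and the domain is just the one from this issue.

```python
# Sketch: list every 200-status capture under a URL prefix via the Wayback
# CDX API, then build raw-snapshot URLs for downloading. Hypothetical helper
# names; only the CDX endpoint and its query parameters are real.
import urllib.parse

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(prefix, from_ts=None, to_ts=None):
    """Build a CDX query URL matching every capture whose URL starts with `prefix`."""
    params = {
        "url": prefix,
        "matchType": "prefix",     # all URLs under the prefix, not just the exact URL
        "output": "json",
        "filter": "statuscode:200",
        "collapse": "urlkey",      # one row per distinct URL
    }
    if from_ts:
        params["from"] = from_ts   # e.g. "20230702"
    if to_ts:
        params["to"] = to_ts       # e.g. "20230704"
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)

def snapshot_url(timestamp, original):
    """URL of the raw archived capture; the 'id_' suffix skips the Wayback toolbar."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"

if __name__ == "__main__":
    # Network access required from here on; shown only as an outline.
    import json, urllib.request
    query = cdx_query_url("projects.ttlexceeded.com/", "20230702", "20230704")
    with urllib.request.urlopen(query) as resp:
        rows = json.load(resp)
    header, captures = rows[0], rows[1:]   # first row is the column header
    for fields in captures:
        record = dict(zip(header, fields))
        print(snapshot_url(record["timestamp"], record["original"]))
```

Each printed URL can then be fetched and written to disk under a timestamped directory, mirroring the layout waybackpack produces for a single page.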
Hello, thank you very much for the response. I think the phrase "download the entire Wayback Machine archive for a given URL" confused me; as you say, I understood it to mean it would recursively download all files linked from index.html within the subdomain. I will proceed to close the issue then. Many thanks, Pedro
I'm trying to run the waybackpack command to download the 3rd July 2023 snapshot of the "https://projects.ttlexceeded.com/" web page, with no success. The command returns no errors but only downloads a single index.html. When I visit the snapshot in the browser through the Web Archive, I can see the full web page perfectly. Can you help me out? I'm using the '--follow-redirects' switch and don't understand what's happening. Thanks!!