jsvine / waybackpack

Download the entire Wayback Machine archive for a given URL.
MIT License
2.8k stars, 189 forks

Request returning no error code and only Index.html file #79

Closed pedromerry closed 3 months ago

pedromerry commented 3 months ago

I'm trying to use the waybackpack command to download the 3 July 2023 snapshot of the "https://projects.ttlexceeded.com/" web page, with no success. The command returns no errors and downloads only a single index.html. When I visit the snapshot in the browser through the Web Archive, I can see the full web page perfectly. Can you help me out? I'm using the '--follow-redirects' switch and don't understand what's happening. Thanks!!

jsvine commented 3 months ago

Hi @pedromerry, I'm not sure I understand the specific issue being raised. How does what you see differ from what you'd expect to see?

When I run these commands, I get what I would expect to see:

❯ waybackpack https://projects.ttlexceeded.com/ --follow-redirects --from-date 20230702 --to-date 20230704 --dir wb-test
INFO:waybackpack.pack: Fetching https://projects.ttlexceeded.com/ @ 20230703013039
INFO:waybackpack.pack: Writing to wb-test/20230703013039/projects.ttlexceeded.com/index.html
❯ tree wb-test/
wb-test/
└── 20230703013039
    └── projects.ttlexceeded.com
        └── index.html

2 directories, 1 file

Opening index.html:

[Screenshot: 2024-03-23 at 12:10:25 PM]

Or are you expecting waybackpack to recursively spider every page on that subdomain? If so, unfortunately, that's not one of waybackpack's features. You can, however, try the code in this pull request/fork.
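
For anyone who lands here expecting recursive behaviour, the following is a minimal sketch of the first step of such a spider: listing the same-host links in an already-downloaded index.html and building a Wayback Machine URL for each. It is not part of waybackpack and is not the code from the pull request/fork mentioned above. The names `LinkCollector`, `same_host_links`, and `wayback_url` are hypothetical, and the sketch assumes that links in the saved page point at the original host and that snapshots are addressed as `https://web.archive.org/web/<timestamp>/<url>`.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# Assumed snapshot address pattern: <prefix>/<timestamp>/<original URL>
WAYBACK_PREFIX = "https://web.archive.org/web"


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_host_links(html, page_url):
    """Return the sorted absolute URLs on page_url's host that html links to."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = set()
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        # Keep only links on the same host, excluding the page itself.
        if urlparse(absolute).netloc == host and absolute != page_url:
            found.add(absolute)
    return sorted(found)


def wayback_url(timestamp, original_url):
    """Build the assumed Wayback Machine address for one snapshot of a URL."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


if __name__ == "__main__":
    # Inline sample HTML; in practice this would be read from the saved index.html.
    sample = '<a href="/posts/one/">One</a> <a href="https://example.org/">Other</a>'
    page = "https://projects.ttlexceeded.com/"
    for link in same_host_links(sample, page):
        print(wayback_url("20230703013039", link))
```

Each printed URL could then be fetched in turn, or its original URL passed to a separate waybackpack run with the same date range. A full spider would also need to track visited pages and handle assets such as images and stylesheets, which this sketch leaves out.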

pedromerry commented 3 months ago

Hello, thank you very much for the response. I think the phrase "download the entire Wayback Machine archive for a given URL" confused me: as you say, I understood it to mean that waybackpack would recursively download all files linked from index.html within the subdomain. I will close the issue, then. Many thanks, Pedro