jsvine / waybackpack

Download the entire Wayback Machine archive for a given URL.
MIT License
2.8k stars, 189 forks

Blank files #67

Open Jack-Lewis1 opened 10 months ago

Jack-Lewis1 commented 10 months ago

Hey,

I'm running a somewhat simple command:

wayback_machine_downloader absglobal.com --all-timestamps --from 20110101000000 --to 20221231235959 --concurrency 5 --only "/(\/$|\.(html|htm|aspx)$)/i" --all

The downloader somewhat works. I get quite a few errors like so:

Failed to open TCP connection to web.archive.org:443 (Connection refused - connect(2) for "web.archive.org" port 443)

I accept this. But the real problem seems to be:

I get a folder structure like 20110101152320/contact-us/europe/index.html, but the HTML files themselves are blank.

tempo660 commented 9 months ago

I'm also receiving this issue. Nothing but blank HTML files.
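To see how widespread the blank files are, a small script can list them. This is a minimal sketch, not part of waybackpack: the function name `find_blank_files` and the `*.html` default pattern are assumptions made for illustration, and "blank" is taken to mean zero bytes or whitespace only.

```python
from pathlib import Path


def find_blank_files(root, pattern="*.html"):
    """Return files under `root` matching `pattern` that are empty or whitespace-only.

    Hypothetical helper for illustration; it is not provided by waybackpack.
    """
    blank = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        # Read raw bytes so an unknown text encoding cannot raise an error;
        # bytes.strip() removes leading and trailing ASCII whitespace.
        if not path.read_bytes().strip():
            blank.append(path)
    return blank
```

Pointing it at the download directory (the one passed with -d) gives the list of snapshot files that came back empty, which can then be compared against the same timestamps on the Wayback Machine website.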

jsvine commented 9 months ago

@tempo660 Thanks for flagging. Can you share the command / URL you're using? And can you confirm that the Wayback Machine's version (online) for that particular timestamp is not blank itself?

@Jack-Lewis1 Based on your message, I suspect that you're using a different tool than is represented by this repository. This repository supplies waybackpack, not wayback_machine_downloader.

tempo660 commented 9 months ago

> @tempo660 Thanks for flagging. Can you share the command / URL you're using? And can you confirm that the Wayback Machine's version (online) for that particular timestamp is not blank itself?

Here's the command and website I used:

waybackpack https://www.fat-pie.com -d /storage/emulated/0/fatpie --progress --from-date 2002 --to-date 2020

Run using Termux for Android in case you are curious about the save directory.

jsvine commented 9 months ago

Thanks, @tempo660! Confirming that I get the same result on my end. But the good news is that there seems to be an easy fix: just append a trailing / to the URL, i.e., https://www.fat-pie.com/. With that, I get proper results.
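If URLs are assembled in a script before being handed to waybackpack, the workaround can be applied automatically. The following is a rough sketch only: `ensure_root_slash` is a hypothetical helper, not something waybackpack provides, and it assumes the URL includes a scheme such as https://. The thread does not say why the slash-less form yields blank files, so this simply mirrors the fix described above.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_root_slash(url):
    """Return `url` with "/" as its path when the path is empty.

    Hypothetical helper for illustration; URLs that already have a path
    (including a bare "/") are returned unchanged.
    """
    parts = urlsplit(url)
    # Only touch URLs that have a host but no path at all.
    if parts.netloc and not parts.path:
        parts = parts._replace(path="/")
    return urlunsplit(parts)
```

For the command earlier in the thread, that means passing https://www.fat-pie.com/ in place of https://www.fat-pie.com and leaving the other options as they were.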