hartator / wayback-machine-downloader

Download an entire website from the Wayback Machine.

Images and css missing, only downloads html #179

Open lucky1804 opened 3 years ago

lucky1804 commented 3 years ago

Running the command `wayback_machine_downloader https://fairies.disney.com/tinker-bell -t 20140719225715 -a` results in an HTML file without any images, CSS, or other files. All links point to various CDN subdomains where the files no longer exist, instead of pointing to locally downloaded copies. When I open the website on archive.org, all images are visible, but the downloader only pulls HTML files.

Here is the output after running the command:

`C:\Users\private>wayback_machine_downloader https://fairies.disney.com/tinker-bell -t 20140719225715 -a
Downloading https://fairies.disney.com/tinker-bell to websites/fairies.disney.com/ from Wayback Machine archives.

Getting snapshot pages. found 8 snaphots to consider.

1 files to download:
http://fairies.disney.com/tinker-bell -> websites/fairies.disney.com/tinker-bell/index.html (1/1)

Download completed in 5.36s, saved in websites/fairies.disney.com/ (1 files)

C:\Users\private>`

xixido90 commented 3 years ago

I got the same issue. Any update on this please?

stevemarksd commented 3 years ago

same here. I think we need to use some older version. Current version is broken...

Pikamander2 commented 3 years ago

> same here. I think we need to use some older version. Current version is broken...

I think it's the other way around. This project is several years old and must have broken due to some kind of recent archive.org update.

From what I can tell at a glance, it seems like archive.org may have restructured things to reduce the number of duplicate files on their servers. As a result, a single snapshot only gets you the most recently changed files, and none of the ones that are identical to the last saved copy.

As a workaround, you can download the files from every snapshot and merge them afterwards with a tool of your choice (such as Windows's file explorer):

wayback_machine_downloader http://example.com --all-timestamps --concurrency 5

However, that process is slower and more error prone, so if anyone knows of a better method, that would be great.
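For the merge step, here is a minimal Python sketch, not part of the downloader itself. It assumes that `--all-timestamps` saves each snapshot into its own folder named after the timestamp (for example `websites/example.com/20140719225715/`); check your own output folder before relying on that. The function name `merge_snapshots` and the `merged/` destination are made up for illustration. Snapshots are copied oldest first, so when the same file exists in several snapshots the newest copy wins.

```python
import shutil
from pathlib import Path


def merge_snapshots(site_dir: Path, merged_dir: Path) -> int:
    """Copy files from every timestamp folder into merged_dir; newest copy wins."""
    # Assumption: every direct child of site_dir with an all-digit name is one
    # snapshot folder. Wayback timestamps are fixed-width, so sorting by name
    # also sorts them chronologically.
    snapshots = sorted(
        (p for p in site_dir.iterdir() if p.is_dir() and p.name.isdigit()),
        key=lambda p: p.name,
    )
    copied = 0
    for snapshot in snapshots:  # oldest first, so later copies overwrite earlier ones
        for src in snapshot.rglob("*"):
            if not src.is_file():
                continue
            dest = merged_dir / src.relative_to(snapshot)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            copied += 1
    return copied


if __name__ == "__main__":
    site = Path("websites/example.com")  # hypothetical download folder
    if site.is_dir():
        total = merge_snapshots(site, Path("merged/example.com"))
        print(f"copied {total} files")
```

The returned count includes files that were overwritten by a newer snapshot, so it can be larger than the number of files in the merged folder.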

Pinging @hartator - It seems like the core functionality of the script might be broken right now.

kumednuy commented 3 years ago

Hi. CSS and images are not missing; they are being downloaded too, but many of them have mistakes in their names. That's one problem. The second problem is that index.html keeps looking for all the files not on my computer but on the site I am trying to save. But that site has been down for a few years, so... That's just my experience, and sorry for my bad English)
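If the files are on disk but the page still points at the dead site, one possible workaround is to rewrite the links after downloading. The following is only a rough Python sketch under several assumptions: the function name `localize_links` and its `prefix` parameter are hypothetical, it only touches `src` and `href` attributes, and it assumes the local folder mirrors the URL paths of the given host. It does not fix misnamed files or links to other CDN hosts.

```python
import re


def localize_links(html: str, host: str, prefix: str = "") -> str:
    """Rewrite absolute src/href URLs on `host` into relative local paths."""
    # Matches src="http://host/path", href='https://host/path' and the
    # protocol-relative form "//host/path". URLs on other hosts are left alone.
    pattern = re.compile(
        r"(?P<attr>\b(?:src|href)=)(?P<q>[\"'])(?:https?:)?//"
        + re.escape(host)
        + r"/(?P<path>[^\"']*)(?P=q)",
        re.IGNORECASE,
    )
    # `prefix` is prepended to every rewritten path, e.g. "../" for a page
    # saved one folder below the site root.
    return pattern.sub(
        lambda m: f"{m['attr']}{m['q']}{prefix}{m['path']}{m['q']}", html
    )
```

For a page saved as `tinker-bell/index.html`, calling `localize_links(html, "fairies.disney.com", prefix="../")` would turn links to that host into paths relative to the download folder, again assuming the files were saved under matching paths.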

ZizzyDizzyMC commented 3 years ago

I'm getting the same error here, anyone know of an updated tool that works?

lazybearsoft commented 3 years ago

6 months and still no fix for this?

fsacer commented 2 years ago

still the same issue