johanneszab / TumblThree

A Tumblr Blog Backup Application
https://www.jzab.de/content/tumblthree
MIT License

Downloading images and posts from archived dead tumblrs #134

Open ZenythFactor opened 7 years ago

ZenythFactor commented 7 years ago

One day I came across an artist's old Tumblr that had been deleted by Tumblr's accursed moderation. I searched everywhere for a backup until I remembered that the Wayback Machine archives various Tumblr posts from the past.

Warning NSFW: http://web.archive.org/web/20140210075852/http://noillart.tumblr.com/

I was wondering whether TumblThree would be capable of grabbing images from older versions of a Tumblr page via web.archive.org?

johanneszab commented 7 years ago

Currently, no. Possible, sure.

If you want to implement this, you can probably reuse the TumblrBlogCrawler.cs code from the parse branch. If you're lucky, you only have to change the pagination to point at the right URL. So, instead of pointing to http://blogname.tumblr.com/, it should direct the requests to http://web.archive.org/web/digits/http://blogname.tumblr.com/, where digits is the timestamp of the archived snapshot.
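To illustrate the URL change described above, here is a minimal sketch in Python rather than TumblThree's C#. It is not code from TumblrBlogCrawler.cs. The function name `wayback_page_url` and its parameters are hypothetical, and the `page/N` path for later pages is an assumption based on the archived links posted in this thread.

```python
import re

# Prefix used by the Wayback Machine links quoted in this thread.
WAYBACK_PREFIX = "http://web.archive.org/web"


def wayback_page_url(blog_name: str, timestamp: str, page: int = 1) -> str:
    """Build the archived URL for one page of a Tumblr blog.

    Hypothetical helper: `timestamp` is the run of digits in the archive
    URL (for example 20140210075852), and pages after the first are
    assumed to live under /page/N as on a live Tumblr blog.
    """
    if not re.fullmatch(r"[0-9]{1,14}", timestamp):
        raise ValueError("timestamp must be 1 to 14 ASCII digits")
    if page < 1:
        raise ValueError("page must be >= 1")

    # The URL a crawler would normally request from the live blog.
    live_url = f"http://{blog_name}.tumblr.com/"
    if page > 1:
        live_url += f"page/{page}"

    # Redirect the request to the archived snapshot instead.
    return f"{WAYBACK_PREFIX}/{timestamp}/{live_url}"
```

Calling `wayback_page_url("blogname", "20140210075852", page=2)` yields a URL of the same shape as the archived links in this thread. Whether a given page was actually captured is a separate question that this sketch does not answer.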

Taranchuk commented 7 years ago

I'm not sure the Wayback Machine stores the entire deleted blog. At http://web.archive.org/web/20130226131355/http://noillart.tumblr.com/page/5 it already shows a blank page, and the following pages are blank too.

Here's what I can advise. I think it is impossible to download deleted blogs, and no developer will be able to give you this feature, but you can back up all the blogs that you are following. It is not necessary to download all the images, videos, and everything else; you can download just the list of links, or the metadata, which also contains these links, and update them from time to time. It's easy: just make a new copy of the program in portable mode for this. When a blog is deleted, all the images it posted remain on the server, and you can still download them from the links. So download lists of links as a backup. They don't take much space: just a few megabytes for heavy blogs and a few hundred kilobytes for small ones.