mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Tumblr list index out of range #129

Closed: eoop closed this issue 5 years ago

eoop commented 5 years ago

Hello. It appears that Tumblr blogs with more than 1166 items throw this particular IndexError. I have the SQLite archive enabled. I've attached the verbose output below.

tumblr: Traceback
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/gallery_dl/job.py", line 51, in run
    for msg in self.extractor:
  File "/usr/local/lib/python3.7/site-packages/gallery_dl/extractor/tumblr.py", line 113, in items
    yield self._prepare_image(url, post)
  File "/usr/local/lib/python3.7/site-packages/gallery_dl/extractor/tumblr.py", line 161, in _prepare_image
    post["hash"] = parts[1] if parts[1] != "inline" else parts[2]
IndexError: list index out of range

Hrxn commented 5 years ago

If by 1166 items you mean the number of posts on a blog (the post count), then that is not the issue. I have already tried blogs with far more posts in the past. And unless Tumblr has fundamentally changed its site in the last few weeks (which, as far as I can tell, is not the case), I can guarantee that this should work just as before.

So the problem has to be the specific blog you are trying to process, or more likely a single post on that blog. Can you provide a link so that someone (like me) can try to reproduce this behavior? Or is there some reason not to share this example?

eoop commented 5 years ago

Gotcha. Here’s the blog: http://tingtongten.tumblr.com (NSFW)

mikf commented 5 years ago

The offending post is http://tingtongten.tumblr.com/post/118653991558. It contains an image whose URL doesn't follow the usual Tumblr pattern, but links directly to its source (http://i.gyazo.com/d033577d05901a8da1b0847628b7c10e.png).
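The failing line in the traceback reads parts[1] and parts[2] without checking how many parts there are. Below is a minimal sketch of how that breaks, assuming (the thread does not confirm this) that "parts" comes from splitting the image file name on underscores, as in Tumblr-style names such as tumblr_<hash>_1280.png or tumblr_inline_<hash>_500.png. The helper names and the Tumblr-style example URLs are hypothetical; only the Gyazo URL comes from the post above. The guarded variant falls back to an empty hash, matching the behavior described later in this thread, but it is an illustration, not the actual commit.

```python
import os
from urllib.parse import urlsplit


def _name_parts(url):
    """Split the file name (without extension) of a URL on underscores.

    Assumption: this approximates how the extractor derives 'parts'.
    """
    path = urlsplit(url).path
    name = os.path.splitext(os.path.basename(path))[0]
    return name.split("_")


def hash_from_url(url):
    """Hypothetical helper reproducing the unguarded line from the traceback."""
    parts = _name_parts(url)
    # Raises IndexError when the name has no underscore (e.g. a Gyazo link).
    return parts[1] if parts[1] != "inline" else parts[2]


def hash_from_url_safe(url):
    """Hypothetical guarded variant: fall back to "" for non-Tumblr names."""
    parts = _name_parts(url)
    try:
        return parts[1] if parts[1] != "inline" else parts[2]
    except IndexError:
        return ""


if __name__ == "__main__":
    gyazo = "http://i.gyazo.com/d033577d05901a8da1b0847628b7c10e.png"
    try:
        hash_from_url(gyazo)
    except IndexError as exc:
        print("unguarded:", exc)
    print("guarded:", repr(hash_from_url_safe(gyazo)))
```

A Tumblr-style name like tumblr_p1abcXYZ_1280 splits into three parts and yields "p1abcXYZ", while the Gyazo name d033577d05901a8da1b0847628b7c10e has no underscore, splits into a single part, and so fails on parts[1].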

Hrxn commented 5 years ago

mikf being very fast again... 😄

At least I can confirm that https://github.com/mikf/gallery-dl/commit/95636418ad0667a37c3f7a70052a82203a56c0b2 works:

PS E:\Test\Temp> gallery-dl -d . "http://tingtongten.tumblr.com/post/118653991558"
[tumblr][error] An unexpected error occurred: IndexError - list index out of range. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
PS E:\Test\Temp> pip --quiet install --upgrade "https://github.com/mikf/gallery-dl/archive/master.zip"
PS E:\Test\Temp> gallery-dl -d . "http://tingtongten.tumblr.com/post/118653991558"
* .\Tumblr\_Posts\tingtongten_118653991558_1_this-was-a-perfect-setup-with-your-icon-waggle.png
PS E:\Test\Temp>

Note: The \Tumblr\_Posts\ part of the output comes from my custom config; others will probably see something different here.

But I have one more question about the changes in that commit: https://github.com/mikf/gallery-dl/blob/95636418ad0667a37c3f7a70052a82203a56c0b2/gallery_dl/extractor/tumblr.py#L161-L164

What happens if a blog has more than one of these "problematic" posts, so that gallery-dl hits the IndexError and sets post["hash"] = "" for each of them? This alone is obviously not an issue, but what if someone doesn't use the default archive and/or filename settings (like archive_fmt = "{id}_{num}") and instead relies on {hash} for the archive file?
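To make the concern concrete, here is a simplified model of a download archive: an SQLite table with one primary-key column, where each file's key is produced by formatting an archive format string with the post's metadata. This is an assumption for illustration, not gallery-dl's actual archive code (which may, for instance, add an extractor-specific prefix to each key; that would not change the outcome). The second post ID and the "p1abcXYZ" hash are hypothetical; the first ID is the post from this thread.

```python
import sqlite3

# Hypothetical post metadata: two posts whose hash fell back to "".
POSTS = [
    {"id": 118653991558, "num": 1, "hash": ""},
    {"id": 222222222222, "num": 1, "hash": ""},          # hypothetical
    {"id": 333333333333, "num": 1, "hash": "p1abcXYZ"},  # hypothetical
]


def run(archive_fmt, posts):
    """Return the IDs of posts that would be downloaded, skipping any whose
    archive key is already present (simplified archive model)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE archive (entry TEXT PRIMARY KEY)")
    downloaded = []
    for post in posts:
        key = archive_fmt.format_map(post)
        seen = db.execute(
            "SELECT 1 FROM archive WHERE entry = ?", (key,)
        ).fetchone()
        if seen:
            continue  # key collision: treated as already downloaded
        db.execute("INSERT INTO archive (entry) VALUES (?)", (key,))
        downloaded.append(post["id"])
    db.close()
    return downloaded


if __name__ == "__main__":
    print("{id}_{num}:", run("{id}_{num}", POSTS))
    print("{hash}:    ", run("{hash}", POSTS))
```

With "{id}_{num}" every post gets a distinct key. With "{hash}" both fallback posts map to the same empty key, so the second one would be silently skipped as "already downloaded".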