mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[kemono.party] posts with different images but same filename #1751

Closed Doofy420 closed 2 years ago

Doofy420 commented 2 years ago

Using: 1.18.3-dev
Example (NSFW): link

It seems that the latest changes work well in most cases, where the first image is skipped when attachments are present, but they fail in cases like this. I was hoping to be able to grab these images without relying on {num} (which doesn't work anyway).

I suppose it's impossible without comparing the file sizes?

mikf commented 2 years ago

Yep, it is impossible to tell whether two files are the same before downloading them with the information provided by the API (or in general). There is no file hash or anything other than an unreliable filename.

We could send a HEAD request for each (Patreon) file to get its Content-Length and Last-Modified headers before downloading it, and use those to spot likely duplicates. That means more network traffic, but it might be better than what we currently have.
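
A rough sketch of that HEAD-request idea, for illustration only. This is not gallery-dl code: the helper names head_signature and unique_by_signature are hypothetical, and it assumes the server actually returns Content-Length and Last-Modified for a HEAD request. Two files with the same size and timestamp are only probably the same, so this is a heuristic rather than a real content check.

```python
from urllib.request import Request, urlopen


def head_signature(url, timeout=10):
    """Return (Content-Length, Last-Modified) from a HEAD request.

    Either value is None if the server does not send that header.
    """
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=timeout) as resp:
        return (
            resp.headers.get("Content-Length"),
            resp.headers.get("Last-Modified"),
        )


def unique_by_signature(urls, get_signature=head_signature):
    """Keep the first URL for each (size, timestamp) pair.

    URLs whose headers are missing cannot be compared,
    so they are always kept.
    """
    seen = set()
    unique = []
    for url in urls:
        sig = get_signature(url)
        if None in sig:
            # Not enough information to call it a duplicate.
            unique.append(url)
            continue
        if sig not in seen:
            seen.add(sig)
            unique.append(url)
    return unique
```

Passing get_signature as a parameter keeps the comparison logic testable without network access. The cost is one extra request per file, which is the additional traffic mentioned above.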

Doofy420 commented 2 years ago

Is there a setting I can use so I can grab the first image again? Will probs just use {num} with it

mikf commented 2 years ago

Apologies for not mentioning this earlier. The option has the wonderful name patreon-skip-file.
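
For anyone landing here later, a small sketch of how that option might be set. The option name patreon-skip-file comes from the comment above. Its placement under extractor.kemonoparty and the assumption that false re-enables the first image are not confirmed in this thread, so check the configuration docs for your version. The helper build_config is purely illustrative.

```python
import json


def build_config(skip_patreon_file=False):
    """Build a gallery-dl style config dict.

    Assumption: the option lives under extractor -> kemonoparty,
    and False means the first (main) file is downloaded again.
    """
    return {
        "extractor": {
            "kemonoparty": {
                "patreon-skip-file": skip_patreon_file,
            }
        }
    }


# Print JSON that could be merged into an existing config file.
print(json.dumps(build_config(), indent=4))
```

With the first file no longer skipped, using {num} in the filename format is what keeps same-named files from overwriting each other.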

Doofy420 commented 2 years ago

Ah, thanks!