mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Avoiding downloading duplicate files from kemono.party #2032

a84r7a3rga76fg closed this issue 2 years ago

a84r7a3rga76fg commented 2 years ago
"directory": [
    "{service} kemono.party {user[:100]}",
    "{id} {title[:70]} [{date}]"
],
"filename": "{num:>03} {filename[:50]}.{extension}",
"skip": "true",

I thought that would work but I was wrong. Can someone please help me avoid downloading duplicate files?

Hrxn commented 2 years ago

You should try the archive file option.

a84r7a3rga76fg commented 2 years ago

> You should try the archive file option.

I am already using it.

Hrxn commented 2 years ago

Then you are using the default archive format settings, unless you've fiddled with these, I suppose? https://github.com/mikf/gallery-dl/blob/2076d40681c7e8959b4a280350a20ffd9a509bb2/gallery_dl/extractor/kemonoparty.py#L27

I'm afraid I fail to see how you are getting duplicates here. It depends on the site providing correct metadata, of course. Maybe an example would help?
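For readers unfamiliar with the archive mechanism discussed here, the following is a minimal sketch of how a download archive can work in principle. It is not gallery-dl's actual implementation: the function names `open_archive`, `archive_key`, and `seen_before` are hypothetical, and the key layout `{service}_{user}_{id}_{num}` is an assumption made for illustration only.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) a tiny archive database holding one key per file."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")
    return con


def archive_key(file_info):
    """Build an archive key from file metadata (assumed, hypothetical layout)."""
    return "{service}_{user}_{id}_{num}".format(**file_info)


def seen_before(con, key):
    """Return True if the key is already archived; otherwise record it."""
    row = con.execute(
        "SELECT 1 FROM archive WHERE entry = ?", (key,)
    ).fetchone()
    if row is not None:
        return True
    con.execute("INSERT INTO archive (entry) VALUES (?)", (key,))
    con.commit()
    return False
```

Under this assumed key layout, two files in the same post that have identical content but different `num` values produce different keys, so an archive of this kind would not treat them as duplicates.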

a84r7a3rga76fg commented 2 years ago

> Then you are using the default archive format settings

I am.

> Maybe an example would help?

In a post (or whatever it's called), there are sometimes two or more files with the same hash.

kattjevfel commented 2 years ago

Artists often simply re-upload the same image in a different post. It might have a new filename, for example, so it cannot be detected as a duplicate without first downloading it.

a84r7a3rga76fg commented 2 years ago

I didn't say anything about different posts.

> In a post (or whatever it's called), there are sometimes two or more files with the same hash.

Hrxn commented 2 years ago

Different files with the same hash? Doesn't make any sense.

mikf commented 2 years ago

Files with duplicate hash in the same post now get skipped if there is a file hash value available, which should be the case for 99% of all files hosted on kemono (https://github.com/mikf/gallery-dl/commit/d433735750b31a5268a9290549ef891e0e23ab6b)
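The behavior described above can be sketched roughly as follows. This is an illustrative approximation, not the code from the linked commit: the function name `unique_by_hash` and the dictionary layout with a `hash` key are assumptions.

```python
def unique_by_hash(files):
    """Yield the files of one post, skipping repeats of an already seen hash.

    Files without a hash value are always kept, since there is nothing
    to compare them against.
    """
    seen = set()
    for file_info in files:
        file_hash = file_info.get("hash")
        if file_hash:
            if file_hash in seen:
                continue  # same hash already yielded for this post
            seen.add(file_hash)
        yield file_info


# Hypothetical post containing the same file twice under different names.
post_files = [
    {"name": "a.png", "hash": "abc123"},
    {"name": "a_copy.png", "hash": "abc123"},
    {"name": "b.png", "hash": "def456"},
    {"name": "no_hash.png", "hash": None},
]
kept = [f["name"] for f in unique_by_hash(post_files)]
```

The `seen` set is created per call, so the check only applies within a single post, matching the scope described in the comment above.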

Doofy420 commented 2 years ago

So what's the best way to handle situations like in #1751 (different hash, same filename)? Looks like {num} works OK, but the file order is somewhat messed up.

a84r7a3rga76fg commented 2 years ago

> Files with duplicate hash in the same post now get skipped if there is a file hash value available, which should be the case for 99% of all files hosted on kemono (d433735)

Thank you so much, I'll close this.

mikf commented 2 years ago

@outlaw420 I can't say what the best way is, but you can add other metadata values to the filename to make it unique, such as {num}, {type}, or {hash}, or just parts of these values, such as {type[0]} or {hash[:8]}.
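As a rough illustration of this suggestion, the sketch below shows how adding a short hash prefix keeps two files with the same original filename apart. It uses plain Python string formatting rather than gallery-dl's own formatter, and the function name `build_filename` and the sample metadata are hypothetical.

```python
def build_filename(file_info):
    """Combine the index, a short hash prefix, and the truncated filename."""
    return "{:03d} {} {}.{}".format(
        file_info["num"],
        file_info["hash"][:8],
        file_info["filename"][:50],
        file_info["extension"],
    )


# Two hypothetical files sharing a filename but differing in content hash.
files = [
    {"num": 1, "hash": "aaaaaaaa1111", "filename": "image", "extension": "png"},
    {"num": 2, "hash": "bbbbbbbb2222", "filename": "image", "extension": "png"},
]
names = [build_filename(f) for f in files]
```

In a gallery-dl configuration, the equivalent would presumably be a `filename` format string along the lines of `"{num:>03} {hash[:8]} {filename[:50]}.{extension}"`, which combines the format from the original post with the `{hash[:8]}` field suggested above.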