mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[Question] How to stop kemono.party from downloading duplicates ? #2886

Open maxman2103 opened 1 year ago

maxman2103 commented 1 year ago

Some artist accounts on kemono.party download the same images under different names, for example:

https://kemono.party/fanbox/user/11701235 https://kemono.party/fanbox/user/8252709

These two, and some others, downloaded the same images under different names. This happens when I try to download only the new images from the site: instead of skipping the existing ones, gallery-dl just downloads the images again. For example, for this post https://kemono.party/fanbox/user/8252709/post/3919619 , two copies of every image were downloaded, and the names for one of the images looked like this:

3919619_2022 6月号_10_6bafe37f-34a7-4f18-bde5-fe5d324540c6
3919619_2022 6月号_10_6c251ea2-033c-4c58-8a60-5ed92e545d82

The same image with two different names.

How do I stop this from happening again?

More info: I use gallery-dl.conf, and most of it is default except for Twitter, where I just added my account name and password. (English is not my first language, so I apologize if something is hard to understand.)

mikf commented 1 year ago

I think you can use a download archive with archive-format set to "{hash}". This should prevent it from downloading the same file multiple times.

maxman2103 commented 1 year ago

I am not very knowledgeable about this. How do I set up a download archive and set the hash?

mikf commented 1 year ago

Get the default config file, put it somewhere gallery-dl will load it automatically (or use -c), and add the following next to the other sites' options (adjust path/etc if necessary):

        "kemonoparty": {
            "archive": "%APPDATA%/gallery-dl/kemono.sqlite3",
            "archive-format": "{hash}"
        },

You might also want to use a different filename than the default.
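To illustrate why an archive keyed on "{hash}" skips a file that reappears under a different name, here is a minimal Python sketch of the idea. It is not gallery-dl's actual implementation: the table layout and the `make_archive` and `should_download` names are hypothetical, and it records the key at check time for brevity.

```python
import sqlite3


def make_archive(path=":memory:"):
    """Open (or create) a tiny SQLite archive holding one key per file."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")
    return conn


def should_download(conn, archive_format, metadata):
    """Return True the first time a key is seen, False on every repeat.

    archive_format is a format string such as "{hash}"; metadata is a
    dict of per-file fields (hypothetical values in the example below).
    """
    key = archive_format.format(**metadata)
    seen = conn.execute(
        "SELECT 1 FROM archive WHERE entry = ?", (key,)
    ).fetchone()
    if seen:
        return False
    conn.execute("INSERT INTO archive (entry) VALUES (?)", (key,))
    conn.commit()
    return True


conn = make_archive()
first = {"hash": "abc123", "filename": "10_first-name"}
second = {"hash": "abc123", "filename": "10_second-name"}
print(should_download(conn, "{hash}", first))   # new key, download
print(should_download(conn, "{hash}", second))  # same hash, skip
```

Because only the hash goes into the key, the second entry is treated as already downloaded even though its filename differs.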

biggestsonicfan commented 1 year ago

I can confirm (in my quest to deduplicate my own archives) that kemonoparty does change their hashes if the type changes from an attachment type to a file type or vice-versa. So checking the hash against a database of hashes will only get you so far.

maxman2103 commented 1 year ago

@cglmrfreeman Should I use something like ""archive-format": ": "{id}_{title}_{num:>02}_{hash}.{extension}", or should I change it? I found it in #2740 and modified it a little. Sorry if I made an error; I am not very knowledgeable about coding.

biggestsonicfan commented 1 year ago

You can't use ""archive-format": ": "{id}_{title}_{num:>02}_{hash}.{extension}", because you'd have a duplicate ": in there and the config would not be valid.
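A quick way to check a line like this before putting it in the config is to run it through a JSON parser. The sketch below uses Python's standard `json` module; the `is_valid_json` helper is a hypothetical name, and each line is wrapped in braces only so that it parses as a standalone object.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# The line as posted, with the stray ": " before the format string.
broken = '{"archive-format": ": "{id}_{title}_{num:>02}_{hash}.{extension}"}'
# The same line with the duplicate removed.
fixed = '{"archive-format": "{id}_{title}_{num:>02}_{hash}.{extension}"}'

print(is_valid_json(broken))
print(is_valid_json(fixed))
```

The parser rejects the first string because the value ends after `": "` and the rest of the line is left dangling.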

However, as I said before, I am finding that kemonoparty changes the hashes on some of its files. Hell, Patreon itself, outside of kemonoparty, also changes hashes on its files. So if you want no duplicates and every post, I would remove the hash entirely and use this: "archive-format": "{service}_{user}_{id}_{num}". You get the service, the user, the ID of the post, and the position of the image within the post. If the filename or hash changes, the id and num will not.
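To make the difference concrete, here is a small Python sketch comparing the two archive keys for a file whose hash changed between two runs. All metadata values are made up for illustration, and `archive_key` is a hypothetical helper, not part of gallery-dl.

```python
def archive_key(archive_format, metadata):
    """Render an archive key from a format string and file metadata."""
    return archive_format.format(**metadata)


# Hypothetical metadata for one image, before and after the site
# re-hosted it under a new hash.
before = {"service": "fanbox", "user": "8252709", "id": "3919619",
          "num": 10, "hash": "aaaa1111"}
after = dict(before, hash="bbbb2222")

by_hash = "{hash}"
by_position = "{service}_{user}_{id}_{num}"

# The hash-based key differs, so the file would be downloaded again.
print(archive_key(by_hash, before) == archive_key(by_hash, after))
# The position-based key is unchanged, so the file would be skipped.
print(archive_key(by_position, before) == archive_key(by_position, after))
```

The trade-off is that a position-based key will not catch the same image posted twice in different posts, which a hash-based key would.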

Yasand123 commented 10 months ago

Is there a way for these changes in file name/format to apply retroactively, will it rename already downloaded files?

biggestsonicfan commented 10 months ago

I ran into this issue, and the only approach I found that works is to run a preprocessor that is passed the old filename, the new filename, and the hash. The preprocessor verifies that the old file exists and matches the hash, then renames the file.
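For anyone wanting to try something similar, here is a rough Python sketch of that verify-then-rename step. It is not the actual script used above: the `verified_rename` and `sha256_of` names are hypothetical, and it assumes the hash is a SHA-256 hex digest of the file contents, so swap in another algorithm if your hashes differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_rename(old_path, new_path, expected_hash):
    """Rename old_path to new_path only if its content matches expected_hash.

    Returns True if the file was renamed, False if it was left alone
    (missing source, existing target, or hash mismatch).
    """
    old_path, new_path = Path(old_path), Path(new_path)
    if not old_path.is_file():
        return False
    if new_path.exists():
        # Never overwrite a file that is already at the new name.
        return False
    if sha256_of(old_path) != expected_hash.lower():
        return False
    old_path.rename(new_path)
    return True
```

It would be called once per file, for example `verified_rename(old_name, new_name, file_hash)`, and a False result means the file was not touched.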