mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.4k stars 931 forks

How to reduce reddit duplicate posts from being downloaded #2031

Closed by voreman567 2 years ago

voreman567 commented 2 years ago

Hi, I've been working on the reddit section of my config. I've noticed that a lot of reddit users post the same post to multiple subreddits. My first attempt to combat this was to use the title as the filename and skip duplicate filenames, but then I realized that this also skips the additional images in posts with multiple images. To fix that, the filename is now the title plus the filename of the post. That works, but the same post made in different subreddits still gets downloaded (I want as few duplicate images downloaded as possible). For example, https://www.reddit.com/user/AGirlWhoLyks2Eat (NSFW!!) posts the same post to multiple subreddits, so I have a lot of duplicate posts in the folder. Here's my config:

    "reddit": {
        "directory": ["reddit", "{author}"],
        "filename": "{title} {filename}.{extension}",
        "skip": true,
        "comments": 0,
        "morecomments": false,
        "refresh-token": "cache",
        "date-format": "%Y-%m-%dT%H:%M:%S",
        "recursion": 0,
        "videos": "ytdl",
        "client-id": "",
        "user-agent": "Python:Gallery-dl:0.8.4 (by /u/USERNAME)"
    },

Hrxn commented 2 years ago

You should use the "archive" option in your config, which records downloaded files in an archive file so they are skipped on later runs. The default archive key used for reddit is "{filename}", which should already do what you want. This only applies to content hosted by reddit itself, so set the archive options for imgur/gfycat/redgifs accordingly.
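
As a rough illustration of the suggestion above, the config could look something like the following. This is only a sketch: the archive paths are hypothetical placeholders, and the "imgur" entry is an assumption showing one way to cover an external host (gfycat and redgifs would presumably get similar entries). The "directory" and "filename" values are copied from the config in the question.

```json
{
    "extractor": {
        "reddit": {
            "directory": ["reddit", "{author}"],
            "filename": "{title} {filename}.{extension}",
            "archive": "~/gallery-dl/archive-reddit.sqlite3"
        },
        "imgur": {
            "archive": "~/gallery-dl/archive-imgur.sqlite3"
        }
    }
}
```

With an archive in place, a file whose archive key has already been recorded should be skipped even when the crosspost would produce a different output filename, since the check is made against the archive key rather than the file on disk.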