mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

gallery-dl skips reddit links to imgur/redgifs with [deleted] users #4045

Closed error1852 closed 1 year ago

error1852 commented 1 year ago

This only happens with child extractors when the user who posted the link is [deleted]. Links hosted by reddit download just fine, but links hosted elsewhere don't seem to register at all; gallery-dl just moves on to the next line.

{
    "extractor":
    {
        "base-directory": "./gallery-dl/",
        "parent-directory": false,
        "postprocessors": null,
        "archive": null,
        "cookies": null,
        "cookies-update": true,
        "proxy": null,
        "skip": true,

        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0",
        "retries": 4,
        "timeout": 30.0,
        "verify": true,
        "fallback": true,

        "sleep": 0,
        "sleep-request": 0,
        "sleep-extractor": 0,

        "path-restrict": "auto",
        "path-replace": "_",
        "path-remove": "\\u0000-\\u001f\\u007f",
        "path-strip": "auto",
        "path-extended": true,

        "extension-map": {
            "jpeg": "jpg",
            "jpe" : "jpg",
            "jfif": "jpg",
            "jif" : "jpg",
            "jfi" : "jpg"
        },

        "gfycat":
        {
            "filename": "{date} {title}-{id}.{extension}",
            "directory": ["{subreddit}", "{author}"],
            "format": ["mp4", "webm", "mobile", "gif"]
        },
        "imgur":
        {
            "mp4": true,
            "filename": "{date} {title}-{id}.{extension}",
            "directory": ["{subreddit}", "{author}"]
        },
        "reddit":
        {
            "comments": 0,
            "morecomments": false,
            "date-min": 0,
            "date-max": 253402210800,
            "date-format": "%Y-%m-%dT%H:%M:%S",
            "id-min": null,
            "id-max": null,
            "recursion": 0,
            "videos": true,
            "filename": "{date} {title}-{id}.{extension}",
            "directory": ["{subreddit}", "{author}"],
            "parent-directory": false,
            "parent-metadata": true,
            "blacklist": ["tumblr"]
        },
        "redgifs":
        {
            "filename": "{date} {title}-{id}.{extension}",
            "directory": ["{subreddit}", "{author}"],
            "format": ["hd", "sd", "gif"]
        }
    },
cheese529 commented 1 year ago

I can confirm this behavior; I am having the same issue. Very unlucky that I only found out about it after spending days archiving imgur links lol

xion2 commented 1 year ago

Can you post an example case? Most of the posts I can find with deleted users also have zero information in the post itself.

cheese529 commented 1 year ago

Use this link, it is NSFW tho so fair heads up. https://www.reddit.com/r/cumsluts/comments/otvq6n/love_that_lip_curl/

xion2 commented 1 year ago

> Use this link, it is NSFW tho so fair heads up. https://www.reddit.com/r/cumsluts/comments/otvq6n/love_that_lip_curl/

It's working fine with my config. I think it may be skipping because you're trying to extract authors and there is no author in those cases. Try removing the "author" related code and give it another go.
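A quick way to check whether a config references `{author}` at all is to scan the `filename` and `directory` templates of each extractor section. The following is a minimal sketch, not part of gallery-dl: the helper name `sections_using` and the sample config are hypothetical, and the regular expression only approximates the real format-string syntax.

```python
import re

# Matches the leading field name of a replacement field such as "{author}",
# "{author[name]}" or "{date:%Y-%m-%d}". This is a simplification of the
# real format-string syntax (assumption for illustration).
FIELD = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)")


def sections_using(config, field):
    """Return the extractor sections whose "filename" or "directory"
    templates reference the given field name."""
    hits = []
    for name, section in config.get("extractor", {}).items():
        if not isinstance(section, dict):
            continue  # plain options such as "base-directory"
        templates = []
        for key in ("filename", "directory"):
            value = section.get(key)
            if isinstance(value, str):
                templates.append(value)
            elif isinstance(value, list):
                templates.extend(v for v in value if isinstance(v, str))
            elif isinstance(value, dict):
                # conditional form: {"condition": "template", ...}
                templates.extend(v for v in value.values() if isinstance(v, str))
        if any(field in FIELD.findall(t) for t in templates):
            hits.append(name)
    return hits


# Hypothetical sample config, shaped like the ones posted in this thread.
config = {
    "extractor": {
        "base-directory": "./gallery-dl/",
        "imgur": {"directory": ["{subreddit}", "{author}"]},
        "reddit": {"filename": "{title} {date:%Y-%m-%d} {id}.{extension}"},
    }
}
print(sections_using(config, "author"))
```

To use it on a real file, load the config with `json.load` and pass the resulting dictionary in place of the sample.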

cheese529 commented 1 year ago

> I think it may be skipping because you're trying to extract authors and there is no author in those cases. Try removing the "author" related code and give it another go.

I'm not trying to extract authors, though.

Here is my config.

```
{
    "extractor": {
        "base-directory": "D:/APush/",
        "parent-directory": false,
        "postprocessors": null,
        "archive": null,
        "cookies": null,
        "cookies-update": true,
        "proxy": null,
        "skip": true,

        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0",
        "retries": 4,
        "timeout": 30.0,
        "verify": true,
        "fallback": true,

        "sleep": 0,
        "sleep-request": 0,
        "sleep-extractor": 0,

        "path-restrict": {
            "\\": "⧹",
            "/" : "⧸",
            "|" : "│",
            ":" : "꞉",
            "*" : "∗",
            "?" : "?",
            "\"": "″",
            "<" : "﹤",
            ">" : "﹥"
        },
        "path-replace": "_",
        "path-remove": "\u0000-\u001f\u007f",
        "path-strip": "auto",
        "path-extended": true,

        "extension-map": {
            "jpeg": "jpg",
            "jpe" : "jpg",
            "jfif": "jpg",
            "jif" : "jpg",
            "jfi" : "jpg"
        },

        "artstation": { "external": false, "pro-first": true },
        "aryion": { "username": null, "password": null, "recursive": true },
        "bbc": { "width": 1920 },
        "blogger": { "videos": true },
        "cyberdrop": { "domain": null },
        "danbooru": { "username": null, "password": null, "external": false, "metadata": false, "ugoira": false },
        "derpibooru": { "api-key": null, "filter": 56027 },
        "deviantart": {
            "filename": "{filename}.{extension}",
            "client-id": "25067",
            "client-secret": "bb0826c50fe85589d799cf5f93c552bb",
            "refresh-token": "982d400895e87753b32650f95eff52a857d8c39b",
            "auto-watch": true,
            "auto-unwatch": true,
            "comments": false,
            "extra": true,
            "flat": true,
            "folders": false,
            "group": true,
            "include": "all",
            "journals": "html",
            "mature": true,
            "metadata": true,
            "original": true,
            "wait-min": 0
        },
        "e621": { "username": null, "password": null },
        "exhentai": {
            "username": null,
            "password": null,
            "domain": "auto",
            "limits": true,
            "metadata": false,
            "original": true,
            "sleep-request": 5.0
        },
        "flickr": { "videos": true, "size-max": null },
        "furaffinity": { "descriptions": "text", "external": false, "include": "gallery", "layout": "auto" },
        "gelbooru": { "api-key": null, "user-id": null },
        "gfycat": {
            "filename": {
                "'_reddit' in locals()": "{_reddit[title]} {_reddit[date]:%Y-%m-%d} {_reddit[id]}.{extension}"
            },
            "format": ["mp4", "webm", "mobile", "gif"]
        },
        "gofile": { "api-token": null, "website-token": "12345" },
        "hentaifoundry": { "include": "pictures" },
        "hitomi": { "format": "webp", "metadata": false },
        "idolcomplex": { "username": null, "password": null, "sleep-request": 5.0 },
        "imgbb": { "username": null, "password": null },
        "imgur": {
            "filename": {
                "'_reddit' in locals()": "{_reddit[title]} {_reddit[date]:%Y-%m-%d} {_reddit[id]}{num:?_//}.{extension}"
            },
            "mp4": true,
            "postprocessors": [
                { "name": "metadata", "filter": "description", "format": "{description}" }
            ]
        },
        "inkbunny": { "username": null, "password": null, "orderby": "create_datetime" },
        "instagram": {
            "filename": "{date:%Y-%m-%d} {post_id}_{owner_id}_{num}.{extension}",
            "api": "rest",
            "cookies": null,
            "include": "posts",
            "sleep-request": [6.0, 12.0],
            "videos": true,
            "saved": {
                "directory": ["{category}", "Saved"],
                "highlights": {
                    "directory": ["Highlights", "{highlight_title}"],
                    "filename": "{date:%Y-%m-%d} {post_id}_{num}.{extension}"
                }
            }
        },
        "khinsider": { "format": "mp3" },
        "luscious": { "gif": false },
        "mangadex": {
            "api-server": "https://api.mangadex.org",
            "api-parameters": null,
            "lang": null,
            "ratings": ["safe", "suggestive", "erotica", "pornographic"]
        },
        "mangoxo": { "username": null, "password": null },
        "newgrounds": { "username": null, "password": null, "flash": true, "format": "original", "include": "art" },
        "nana": { "favkey": null },
        "nijie": { "username": null, "password": null, "include": "illustration,doujin" },
        "nitter": { "quoted": false, "retweets": false, "videos": true },
        "oauth": { "browser": true, "cache": true, "host": "localhost", "port": 6414 },
        "paheal": { "metadata": false },
        "pillowfort": { "external": false, "inline": true, "reblogs": false },
        "pinterest": {
            "filename": {
                "'_reddit' in locals()": "{_reddit[title]} {_reddit[date]:%Y-%m-%d} {_reddit[id]}{num:?_//}.{extension}",
                "not locals().get('title')": "{id}.{extension}"
            },
            "domain": "auto",
            "sections": true,
            "videos": true
        },
        "pixiv": {
            "refresh-token": null,
            "include": "artworks",
            "metadata": false,
            "metadata-bookmark": false,
            "tags": "japanese",
            "ugoira": true
        },
        "reactor": { "gif": false, "sleep-request": 5.0 },
        "imagefap": { "filename": "{num}-{filename}.{extension}" },
        "reddit": {
            "filename": "{title[:232]} {date:%Y-%m-%d} {id}{num:?_//}.{extension}",
            "comments": 500,
            "morecomments": false,
            "date-min": 0,
            "date-max": 253402210800,
            "date-format": "%Y-%m-%dT%H:%M:%S",
            "id-min": null,
            "id-max": null,
            "recursion": 0,
            "videos": "ytdl",
            "parent-directory": true,
            "parent-metadata": "_reddit"
        },
        "redgifs": {
            "filename": {
                "'_reddit' in locals()": "{_reddit[title]} {_reddit[date]:%Y-%m-%d} {_reddit[id]}.{extension}",
                "not locals().get('title')": "{filename}.{extension}"
            },
            "format": ["hd", "sd", "gif"]
        },
        "sankaku": { "username": null, "password": null, "refresh": false },
        "sankakucomplex": { "embeds": false, "videos": true },
        "skeb": { "article": false, "filters": null, "sent-requests": false, "thumbnails": false },
        "smugmug": { "videos": true },
        "seiga": { "username": null, "password": null },
        "subscribestar": { "username": null, "password": null },
        "tsumino": { "username": null, "password": null },
        "tumblr": {
            "api-key": "2vuoC1B7VTA9fi7KHo1YDIv0WQCGuQGyL4Re7AFi4dlZVSFMrJ",
            "api-secret": "vHohezwC4TE9VSBX824gk2trSLDy6OqMkMuFyvTwQp8x2AtlSF",
            "avatar": true,
            "date-min": 0,
            "date-max": null,
            "external": true,
            "inline": true,
            "posts": "all",
            "offset": 0,
            "original": true,
            "reblogs": true
        },
        "twitter": {
            "username": null,
            "password": null,
            "cards": false,
            "conversations": false,
            "pinned": false,
            "quoted": false,
            "replies": false,
            "retweets": true,
            "logout": false,
            "filename": "{author[name]} {date:%Y-%m-%d} {tweet_id}_{num}.{extension}",
            "strategy": null,
            "text-tweets": true,
            "twitpic": true,
            "unique": true,
            "users": "https://twitter.com/{legacy[screen_name]}",
            "videos": true,
            "likes": {
                "directory": ["{category}", "Likes"]
            },
            "postprocessors": [
                { "name": "metadata", "event": "post", "filename": "twitter_{author[name]}_{tweet_id}_main.json" }
            ]
        },
        "unsplash": { "format": "raw" },
        "vsco": { "videos": true },
        "wallhaven": { "api-key": null, "metadata": false, "include": "uploads" },
        "weasyl": { "api-key": null, "metadata": false },
        "weibo": { "livephoto": true, "retweets": true, "videos": true },
        "ytdl": {
            "enabled": false,
            "format": null,
            "generic": true,
            "logging": true,
            "module": null,
            "raw-options": null
        },
        "zerochan": { "username": null, "password": null, "metadata": false },
        "booru": { "tags": false, "notes": false }
    },

    "downloader": {
        "filesize-min": null,
        "filesize-max": null,
        "mtime": true,
        "part": true,
        "part-directory": null,
        "progress": 3.0,
        "rate": null,
        "retries": 4,
        "timeout": 30.0,
        "verify": true,

        "http": {
            "adjust-extensions": true,
            "chunk-size": 32768,
            "headers": null
        },

        "ytdl": {
            "config-file": "C:/Users/Mohammad Noor/AppData/Roaming/yt-dlp/config.txt",
            "format": null,
            "forward-cookies": true,
            "logging": true,
            "module": null,
            "outtmpl": null,
            "raw-options": null
        }
    },

    "logfile": {
        "path": "C:/Logs/gallery-dl/logfile.txt",
        "mode": "a",
        "format": {
            "debug"  : "[{asctime}][{levelname}] {message}",
            "info"   : "[{asctime}][{levelname}] {message}",
            "warning": "[{asctime}][{levelname}] {message} [Source URL: {extractor.url}]",
            "error"  : "[{asctime}][{levelname}] {message} [Source URL: {extractor.url}]"
        },
        "format-date": "%Y-%m-%dT%H:%M:%S",
        "level": "info"
    },

    "output": {
        "mode": "auto",
        "progress": true,
        "shorten": true,
        "ansi": false,
        "colors": {
            "success": "1;32",
            "skip"   : "2"
        },
        "skip": true,
        "log": "[{name}][{levelname}] {message}",
        "logfile": null,
        "unsupportedfile": null
    },

    "netrc": false
}
```
xion2 commented 1 year ago

This is what my Reddit/Redgifs section looks like.

"reddit": {
    "filename": "{title} {date:%Y-%m-%d} {id}{num:?_//}.{extension}",
    "comments": 0,
    "morecomments": false,
    "date-min": 0,
    "date-max": 253402210800,
    "date-format": "%Y-%m-%dT%H:%M:%S",
    "id-min": null,
    "id-max": null,
    "recursion": 0,
    "videos": "ytdl",
    "parent-directory": true,
    "parent-metadata": "_reddit"
},
"redgifs": {
    "filename": {
        "'_reddit' in locals()": "{_reddit[title]} {_reddit[date]:%Y-%m-%d} {_reddit[id]}.{extension}"
    },
    "format": ["hd", "sd", "gif"]
}

error1852 commented 1 year ago

Still doesn't work for me, unfortunately, even with your settings.

cheese529 commented 1 year ago

@error1852 are you using yt-dlp? It works if you change "videos": "ytdl" to just "videos": true. I'm assuming this is a bug where yt-dlp looks for author metadata, finds nothing, and therefore skips the download; it seems like something @mikf would have to look into. Very unlucky, because I use my yt-dlp config in tandem with gallery-dl to download reddit video thumbnails as well.
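The change suggested above only touches one key in the `reddit` section. As a minimal sketch, assuming the config has already been loaded into a Python dictionary, it could be applied like this; the helper name `with_reddit_videos` is made up for illustration and is not part of gallery-dl.

```python
import copy
import json


def with_reddit_videos(config, value):
    """Return a copy of config with extractor.reddit.videos set to value.

    The original dictionary is left untouched so both variants can be
    written out and tested side by side.
    """
    patched = copy.deepcopy(config)
    patched.setdefault("extractor", {}).setdefault("reddit", {})["videos"] = value
    return patched


# Hypothetical excerpt of a config that hands reddit videos to ytdl.
config = {"extractor": {"reddit": {"videos": "ytdl", "recursion": 0}}}
patched = with_reddit_videos(config, True)
print(json.dumps(patched["extractor"]["reddit"], indent=4))
```

Writing `patched` to a separate file with `json.dump` and pointing gallery-dl at that file keeps the original config intact while testing.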

error1852 commented 1 year ago

Not to my knowledge. My full config looks like this:

{
    "extractor":
    {
        "base-directory": "./gallery-dl/",
        "parent-directory": false,
        "postprocessors": null,
        "archive": null,
        "cookies": null,
        "cookies-update": true,
        "proxy": null,
        "skip": true,

        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0",
        "retries": 4,
        "timeout": 30.0,
        "verify": true,
        "fallback": true,

        "sleep": 0,
        "sleep-request": 0,
        "sleep-extractor": 0,

        "path-restrict": "auto",
        "path-replace": "_",
        "path-remove": "\\u0000-\\u001f\\u007f",
        "path-strip": "auto",
        "path-extended": true,

        "extension-map": {
            "jpeg": "jpg",
            "jpe" : "jpg",
            "jfif": "jpg",
            "jif" : "jpg",
            "jfi" : "jpg"
        },

        "gfycat":
        {
            "filename": "{date} {title}-{id}.{extension}",
            "directory": ["{subreddit}", "{author}"],
            "format": ["mp4", "webm", "mobile", "gif"]
        },
        "imgur":
        {
            "mp4": true,
            "filename": "{date} {title}-{id}.{extension}",
            "directory": ["{subreddit}", "{author}"]
        },
        "reddit":
        {
            "comments": 0,
            "morecomments": false,
            "date-min": 0,
            "date-max": 253402210800,
            "date-format": "%Y-%m-%dT%H:%M:%S",
            "id-min": null,
            "id-max": null,
            "recursion": 0,
            "videos": true,
            "filename": "{date} {title}-{id}.{extension}",
            "directory": ["{subreddit}", "{author}"],
            "parent-directory": false,
            "parent-metadata": true,
            "blacklist": ["tumblr"]
        },
        "redgifs":
        {
            "filename": "{date} {title}-{id}.{extension}",
            "directory": ["{subreddit}", "{author}"],
            "format": ["hd", "sd", "gif"]
        }
    },

    "downloader":
    {
        "filesize-min": null,
        "filesize-max": null,
        "mtime": true,
        "part": true,
        "part-directory": null,
        "progress": 3.0,
        "rate": null,
        "retries": 4,
        "timeout": 30.0,
        "verify": true,

        "http":
        {
            "adjust-extensions": true,
            "chunk-size": 32768,
            "headers": null,
            "validate": true
        },

        "ytdl":
        {
            "format": null,
            "forward-cookies": false,
            "logging": true,
            "module": null,
            "outtmpl": null,
            "raw-options": null
        }
    },

    "output":
    {
        "mode": "auto",
        "progress": true,
        "shorten": true,
        "ansi": false,
        "colors": {
            "success": "1;32",
            "skip"   : "2"
        },
        "skip": true,
        "log": "[{name}][{levelname}] {message}",
        "logfile": null,
        "unsupportedfile": null
    },

    "netrc": false
}
error1852 commented 1 year ago

Using the default config lets me download redgifs from [deleted] users, but imgur still doesn't work.

Here is an imgur example.

cheese529 commented 1 year ago

> Not to my knowledge. My full config looks like this:

Alright, that is very strange indeed.

> Using the default config lets me download redgifs from [deleted] users, but imgur still doesn't work.

What's the difference between the two configs that you think is causing this behavior? You can use Notepad++ to compare them side by side.
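As an alternative to a visual comparison, two configs can be compared key by key once they are parsed. This is a minimal sketch under the assumption that both files are valid JSON and have been loaded into dictionaries; `diff_configs`, `MISSING`, and the two sample dictionaries are hypothetical names and data for illustration.

```python
MISSING = "<missing>"  # placeholder for a key that exists in only one config


def diff_configs(a, b, path=""):
    """Yield (path, value_in_a, value_in_b) for settings that differ.

    Nested sections are walked recursively. Values are compared with
    Python's ==, so 1 and true (or 30 and 30.0) count as equal.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            yield from diff_configs(
                a.get(key, MISSING), b.get(key, MISSING), f"{path}/{key}"
            )
    elif a != b:
        yield path, a, b


# Hypothetical excerpts of two configs.
mine = {"extractor": {"reddit": {"videos": True, "comments": 0}}}
other = {
    "extractor": {
        "reddit": {"videos": "ytdl", "comments": 0, "parent-metadata": "_reddit"}
    }
}

for path, left, right in diff_configs(mine, other):
    print(f"{path}: {left!r} != {right!r}")
```

For real files, each dictionary would come from `json.load` on the corresponding config file.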

error1852 commented 1 year ago

It could be because I didn't have an extractor for ytdl, but that still doesn't explain why imgur won't play ball.

xion2 commented 1 year ago

That imgur example you gave doesn't work with any of my configs either. No clue why.

cheese529 commented 1 year ago

> Here is an imgur example.

This photo is hosted on reddit, not imgur. If you open the link, it takes you to the reddit CDN.

error1852 commented 1 year ago

What? The image is hosted on imgur.

Hrxn commented 1 year ago

Yes, it's hosted on Imgur. Guys, please make sure that we're actually on the same page here; otherwise it's pointless to even try to reproduce anything.

error1852 commented 1 year ago

After manually testing a number of imgur links with [deleted] authors, it seems like only some of them refuse to work.

Weirdly enough, downloading them straight from imgur works just fine.

mikf commented 1 year ago

The Reddit API response for this thread does not include any external links, meaning gallery-dl does not and cannot see any imgur URL and therefore does not try to download it.

I have no idea why the API behaves like this and how to "fix" it. Logging in / Using a private OAuth access token doesn't help.
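To illustrate the point, here is a minimal sketch of how a client might decide whether a submission carries an external link. It is not gallery-dl's actual code. The field names `url` and `is_self` are the ones used in Reddit's listing JSON; the helper `external_url`, the host list, and the sample dictionaries are assumptions made up for this example.

```python
from urllib.parse import urlparse

# Hosts treated as "hosted by Reddit" in this sketch (assumption).
REDDIT_HOSTS = ("reddit.com", "redd.it")


def external_url(post):
    """Return the submission's outbound URL, or None if the data contains
    no link to a third-party host."""
    url = post.get("url") or ""
    if post.get("is_self") or not url:
        return None
    host = urlparse(url).hostname or ""
    if any(host == h or host.endswith("." + h) for h in REDDIT_HOSTS):
        return None
    return url


# Invented sample data: one post whose API data still carries the imgur
# URL, and one where only the thread's own permalink is present.
with_link = {
    "author": "[deleted]",
    "is_self": False,
    "url": "https://i.imgur.com/abc123.jpg",
}
without_link = {
    "author": "[deleted]",
    "is_self": False,
    "url": "https://www.reddit.com/r/example/comments/xyz/title/",
}

print(external_url(with_link))
print(external_url(without_link))
```

In the second case there is nothing to hand to a child extractor, which matches the behavior described above: if the API response carries no imgur URL, the downloader has nothing to follow.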

error1852 commented 1 year ago

@mikf I'm guessing there's nothing to be done on my end, so I'm closing the issue.