mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

tumblr [downloader.http][warning] '404 Not Found' #5565

Open openDef opened 4 months ago

openDef commented 4 months ago

Good afternoon, dear developer! Please help me solve this problem. I apologize in advance for the grammar (automatic translation). When downloading from tumblr, the download very often stalls for quite a long time, then continues until the next error. The URL that fails opens normally in the browser. If I restart the download, the error reappears with a different link. Very rarely, the download completes without any errors. This is what it shows:

    [downloader.http][warning] '404 Not Found' for 'https://64.media.tumblr.com/561d741270e97b2f944093eda765f06b/720934f95e54bd55-5f/s99999x99999/7dc8a9a2aa7debaafa8faf365d06a050d34348ad.png'
    [download][info] Trying fallback URL #1

openDef commented 4 months ago

I also noticed this error, which is extremely rare:

    [tumblr][warning] Unable to fetch higher-resolution version of https://64.media.tumblr.com/9dfa1ca50184eaa057b8ebf44837f62e/ec050682828daa53-14/s99999x99999/e8456d7273a15107ee4943b37cdc603a29ab23e1.jpg (734239171816898560)
    [download][error] Failed to download e8456d7273a15107ee4943b37cdc603a29ab23e1.jpg

Hrxn commented 4 months ago

@openDef

Well, https://github.com/mikf/gallery-dl/issues/2957 was closed for a reason: the problem has been solved.

Errors with tumblr can still happen occasionally, but they are rare and depend on how tumblr's servers respond to the modified URLs that gallery-dl requests. Fundamentally, there is nothing more that gallery-dl can do about this. To put it another way, gallery-dl already has everything it needs to mitigate this problem as well as possible.

Here's what you should be doing:

  1. Use a continuous "logfile" ("mode": "a", which appends to the file instead of overwriting it) so that you always have a record of every error that occurs.

    {
    "output":
    {
        "log": {
            "level": "info",
            "format-date": "%Y-%m-%dT%H:%M:%S",
            "format": {
                "debug"  : "\u001b[0;37mDebug  :  {name} -> {message}\u001b[0m",
                "info"   : "\u001b[1;37mInfo   :  {name} -> {message}\u001b[0m",
                "warning": "\u001b[1;33mWarning:  {name} -> {message} {extractor.url:?[/]/}\u001b[0m",
                "error"  : "\u001b[1;31mError  :  {name} -> {message} {extractor.url:?[/]/}\u001b[0m"
            }
        },
    
        "logfile": {
            "path": "D:\\gallery-dl\\gallery-dl.log.txt",
            "mode": "a",
            "format": {
                "debug"  : "[{asctime}][{levelname}] {message}",
                "info"   : "[{asctime}][{levelname}] {message}",
                "warning": "[{asctime}][{levelname}] {message} [Source URL: {extractor.url}]",
                "error"  : "[{asctime}][{levelname}] {message} [Source URL: {extractor.url}]"
            },
            "format-date": "%Y-%m-%dT%H:%M:%S",
            "level": "info"
        }
    }
    }
  2. Use the correct extractor options for tumblr. Here's what I am using (except for the tokens, obviously):

    {
    "extractor":
    {
        "tumblr":
        {
            "avatar": false,
            "external": true,
            "inline": true,
            "original": true,
            "ratelimit": "wait",
            "reblogs": true,
            "posts": "all",
            "fallback-delay": 90.0,
            "fallback-retries": 6,
    
            "retries": 24,
            "skip": "abort:12",
            "sleep-request": [0.2, 0.6],
            "sleep-extractor": [0.4, 2.0],
            "blacklist": ["twitter", "instagram", "flickr"]
        }
    }
    }

Setting "original": true is somewhat optional, but it prevents lower-resolution versions from being downloaded as a fallback, so you won't have to clean them up later if you don't want to keep them.

The "fallback-delay" and "fallback-retries" options are what matter here for the URL substitution trick. These errors come either from tumblr's CDN not returning the response we want or from other intermittent network issues, so the remedy is simply to raise the number of retries and the delay between them.
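As a rough illustration of what these two values cost in time: if each fallback retry waits "fallback-delay" seconds before trying again (an assumption about how the retries are spaced, not something stated above), the worst-case extra wait for one stubborn image is simply the product of the two settings. This may also account for the long pauses described in the original report.

```python
def worst_case_fallback_wait(delay_seconds: float, retries: int) -> float:
    """Upper bound on time spent waiting for one image's fallback retries.

    Assumption (hypothetical model): one full delay before every retry,
    ignoring the time the HTTP requests themselves take.
    """
    if retries < 0:
        raise ValueError("retries must be non-negative in this model")
    return delay_seconds * retries


# Values from the tumblr config above: "fallback-delay": 90.0, "fallback-retries": 6
wait = worst_case_fallback_wait(90.0, 6)
print(f"{wait:.0f} s = {wait / 60:.0f} min per image")  # 540 s = 9 min per image
```

Raising either value makes a successful retry more likely, at the cost of longer stalls on images that never recover.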

If their servers still misbehave after all that waiting and retrying, we now have the log to work with. A failed image shows up in it like this:

    [yyyy-mm-ddTHH:MM:SS][warning] Unable to fetch higher-resolution version of https://64.media.tumblr.com/<...> (<post_ID>) [Source URL: <blog_URL>]

You can reconstruct the post URL by taking the <post_ID> and the <blog_URL> and combining them like this: <blog_URL>/post/<post_ID>
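The reconstruction step can be automated. The sketch below is a hypothetical helper (not part of gallery-dl) that scans lines written with the "logfile" format above, pulls out the post ID in parentheses and the "[Source URL: ...]" value, and joins them as <blog_URL>/post/<post_ID>. It assumes the Source URL is a plain blog URL; https://example.tumblr.com is a placeholder, and other URL shapes may need different handling.

```python
import re

# Matches the warning produced with the "logfile" format above:
# "... Unable to fetch higher-resolution version of <image_URL> (<post_ID>) [Source URL: <blog_URL>]"
WARNING_RE = re.compile(
    r"Unable to fetch higher-resolution version of \S+\s+"
    r"\((?P<post_id>\d+)\)\s+\[Source URL: (?P<blog_url>[^\]]+)\]"
)


def post_urls_from_log(lines):
    """Return unique <blog_URL>/post/<post_ID> URLs in first-seen order."""
    seen = set()
    urls = []
    for line in lines:
        match = WARNING_RE.search(line)
        if not match:
            continue  # not a "higher-resolution" warning line
        url = f"{match['blog_url'].rstrip('/')}/post/{match['post_id']}"
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


sample = [
    "[2024-05-01T12:00:00][warning] Unable to fetch higher-resolution version of "
    "https://64.media.tumblr.com/abc/def/s99999x99999/img.jpg (734239171816898560) "
    "[Source URL: https://example.tumblr.com]",
    "[2024-05-01T12:00:05][info] unrelated line",
]
print(post_urls_from_log(sample))
# ['https://example.tumblr.com/post/734239171816898560']
```

To use it on a real log, pass the open file: `post_urls_from_log(open(path, encoding="utf-8"))`, where `path` is whatever you set as the "logfile" path.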

Once you have a batch of these, simply feed them to gallery-dl again.
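A minimal sketch of this last step, assuming the reconstructed post URLs are already in a Python list: write them one per line to a text file and hand that file to gallery-dl's `--input-file` (`-i`) option. The file name `tumblr-retry.txt` and both helper functions are hypothetical.

```python
import tempfile
from pathlib import Path


def write_input_file(urls, path):
    """Write one URL per line, the plain-text layout --input-file reads."""
    path = Path(path)
    path.write_text("".join(url + "\n" for url in urls), encoding="utf-8")
    return path


def retry_command(input_file):
    """Build the argument list for re-running gallery-dl on the saved URLs."""
    return ["gallery-dl", "--input-file", str(input_file)]


if __name__ == "__main__":
    urls = ["https://example.tumblr.com/post/734239171816898560"]  # placeholder URL
    target = write_input_file(urls, Path(tempfile.gettempdir()) / "tumblr-retry.txt")
    print(retry_command(target))
    # To actually run it (requires gallery-dl on PATH):
    # subprocess.run(retry_command(target), check=True)
```

Typing `gallery-dl -i tumblr-retry.txt` in a terminal does the same thing without Python.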

openDef commented 4 months ago

Thanks a lot for the reply! Given my level of experience, I barely understand your answer, but it looks great. Here's a config I've built. Is this how the whole structure (config.json) should look?

    {
    "output":
    {
        "log": {
            "level": "info",
            "format-date": "%Y-%m-%dT%H:%M:%S",
            "format": {
                "debug"  : "\u001b[0;37mDebug  :  {name} -> {message}\u001b[0m",
                "info"   : "\u001b[1;37mInfo   :  {name} -> {message}\u001b[0m",
                "warning": "\u001b[1;33mWarning:  {name} -> {message} {extractor.url:?[/]/}\u001b[0m",
                "error"  : "\u001b[1;31mError  :  {name} -> {message} {extractor.url:?[/]/}\u001b[0m"
            }
        },

        "logfile": {
            "path": "D:\\gallery-dl\\gallery-dl.log.txt",
            "mode": "a",
            "format": {
                "debug"  : "[{asctime}][{levelname}] {message}",
                "info"   : "[{asctime}][{levelname}] {message}",
                "warning": "[{asctime}][{levelname}] {message} [Source URL: {extractor.url}]",
                "error"  : "[{asctime}][{levelname}] {message} [Source URL: {extractor.url}]"
            },
            "format-date": "%Y-%m-%dT%H:%M:%S",
            "level": "info"
        }
    }
    }
    {
    "extractor": {
        "tumblr":
        {
            "avatar": false,
            "external": true,
            "inline": true,
            "original": true,
            "ratelimit": "wait",
            "reblogs": true,
            "posts": "all",
            "fallback-delay": 90.0,
            "fallback-retries": 6,

            "retries": 24,
            "skip": "abort:12",
            "sleep-request": [0.2, 0.6],
            "sleep-extractor": [0.4, 2.0],
            "blacklist": ["twitter", "instagram", "flickr"]
            "api-key": "6--------------------------------------c",
            "api-secret": "5----------------------------------------X",
            "filename": "{filename}.{extension}",
            "image-filter": "extension not in ('m4v', 'gif', 'mp3', 'webm', 'avi', 'mp4', 'mkv', '')"
        }
    }
    }
openDef commented 4 months ago

I definitely made a mistake somewhere, but I don't understand where.

openDef commented 4 months ago

Quotes and commas defeated me; I couldn't get everything right to make it work.
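Errors like these can be located without guesswork. The config in the earlier comment contains two separate top-level objects (one for "output", one for "extractor") where JSON allows only one, and it is also missing a comma after the "blacklist" line. Python's standard `json` module reports the line and column of the first syntax error. Fixing that error and re-checking reveals the next one. The helper name `check_config` is hypothetical. It checks only JSON syntax, not whether the option names are valid for gallery-dl.

```python
import json


def check_config(text):
    """Return None if text is valid JSON, else (line, column, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return (exc.lineno, exc.colno, exc.msg)
    return None


# Two top-level objects, as in the config above -> "Extra data"
print(check_config('{"output": {}}\n{"extractor": {}}'))  # (2, 1, 'Extra data')
# Missing comma between two entries -> "Expecting ',' delimiter"
print(check_config('{"blacklist": ["x"]\n "api-key": "k"}'))  # (2, 2, "Expecting ',' delimiter")
```

To check a real file: `check_config(open("config.json", encoding="utf-8").read())`.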

Hrxn commented 4 months ago

Here is a full config for tumblr; just change the values as needed:

{
    "extractor":
    {
        "base-directory": "D:\\Home\\Downloads\\",
        "archive": "D:\\Home\\Apps\\gallery-dl\\gallery-dl.archive.global.db",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        "skip": true,

        "keywords": {"category": ""},
        "keywords-default": "",
        "parent-directory": true,
        "extension-map":
        {
            "jpeg": "jpg",
            "jpe" : "jpg"
        },

        "tumblr":
        {
            "user":
            {
                "directory": {
                    "locals().get('category')": ["Tumblr", "Blogs", "{category}", "{blog_name!c}"],
                    ""                        : ["Tumblr", "Blogs", "Unsorted", "{blog_name!c}"]
                },
                "filename": {
                    "locals().get('slug')": "{date:%Y-%m-%d-%H%M%S}.{id}.{num:>02}.{slug:R.//}.{extension}",
                    ""                    : "{date:%Y-%m-%d-%H%M%S}.{id}.{num:>02}.{extension}"
                }
            },
            "post":
            {
                "directory": {
                    "locals().get('category')": ["Tumblr", "Posts", "{category}"],
                    ""                        : ["Tumblr", "Posts", "Unsorted"]
                },
                "filename": {
                    "locals().get('slug')": "{date:%Y-%m-%d-%H%M%S}.{blog_name}.{id}.{num:>02}.{slug:R.//}.{extension}",
                    ""                    : "{date:%Y-%m-%d-%H%M%S}.{blog_name}.{id}.{num:>02}.{extension}"
                }
            },

            "archive-prefix": "",
            "archive": "D:\\Home\\Apps\\gallery-dl\\gallery-dl.archive.tumblr.db",
            "avatar": false,
            "external": true,
            "inline": true,
            "original": true,
            "ratelimit": "wait",
            "reblogs": true,
            "posts": "all",
            "fallback-delay": 90.0,
            "fallback-retries": 6,

            "retries": 24,
            "skip": "abort:12",
            "sleep-request": [0.2, 0.6],
            "sleep-extractor": [0.4, 2.0],
            "blacklist": ["twitter", "instagram", "flickr"],

            "api-key": "  ----  ",
            "api-secret": "  ----  ",
            "access-token": "  ----  ",
            "access-token-secret": "  ----  "
        }
    },
    "output":
    {
        "mode": "color",
        "ansi": true,
        "shorten": "eaw",
        "skip": true,
        "progress": true,
        "log": {
            "level": "info",
            "format-date": "%Y-%m-%dT%H:%M:%S",
            "format": {
                "debug"  : "\u001b[0;37mDebug  :  {name} -> {message}\u001b[0m",
                "info"   : "\u001b[1;37mInfo   :  {name} -> {message}\u001b[0m",
                "warning": "\u001b[1;33mWarning:  {name} -> {message} {extractor.url:?[/]/}\u001b[0m",
                "error"  : "\u001b[1;31mError  :  {name} -> {message} {extractor.url:?[/]/}\u001b[0m"
            }
        },
        "logfile": {
            "path": "D:\\Home\\Apps\\gallery-dl\\gallery-dl.log.txt",
            "mode": "a",
            "format": {
                "debug"  : "[{asctime}][{levelname}] {message}",
                "info"   : "[{asctime}][{levelname}] {message}",
                "warning": "[{asctime}][{levelname}] {message} [Source URL: {extractor.url}]",
                "error"  : "[{asctime}][{levelname}] {message} [Source URL: {extractor.url}]"
            },
            "format-date": "%Y-%m-%dT%H:%M:%S",
            "level": "info"
        }
    }
}
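In this config, the "directory" and "filename" entries are objects rather than plain strings. Each key is a Python expression, and the template belonging to the first expression that evaluates as true is used, with "" acting as the fallback. Below is a simplified model of that selection rule. It is not gallery-dl's actual code, and the helper `select_template` is hypothetical. It illustrates that an empty "category" value (the default set via "keywords": {"category": ""}) is falsy, so the "Unsorted" entry is picked.

```python
def select_template(templates, kwdict):
    """Return the template of the first truthy expression, else the "" entry.

    Simplified model: each expression is evaluated with the metadata dict as
    its local namespace, so locals().get('name') yields None for a missing
    field instead of raising NameError. Only use eval() on trusted config.
    Templates are returned unformatted; conversions such as !c are not applied.
    """
    for expression, template in templates.items():
        if expression and eval(expression, {}, kwdict):
            return template
    return templates.get("")


# The "user" -> "directory" object from the config above
directory = {
    "locals().get('category')": ["Tumblr", "Blogs", "{category}", "{blog_name!c}"],
    "": ["Tumblr", "Blogs", "Unsorted", "{blog_name!c}"],
}

print(select_template(directory, {"category": "Art", "blog_name": "example"}))
# ['Tumblr', 'Blogs', '{category}', '{blog_name!c}']
print(select_template(directory, {"category": "", "blog_name": "example"}))
# ['Tumblr', 'Blogs', 'Unsorted', '{blog_name!c}']
```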
openDef commented 4 months ago

Thank you so much for your help! Health and prosperity to you!