mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

--range seems to be ignored on reddit #2847

Closed: alexbuisse closed this issue 2 years ago

alexbuisse commented 2 years ago

Hi, I am running the following command for a variety of subreddits:

gallery-dl --filter "extension in ('gif','mp4')" --range 1-150 --download-archive "sqlite3-gif/unexpected-gif.sqlite3" -A 3 -o directory="x" "http://www.reddit.com/r/unexpected"

The download proceeds as expected, but does not stop after the first 150 items. Depending on the subreddit, it will download anything from 500 to 800 items.

In a possibly related issue, it also seems to ignore the imgur/redgifs config options asking for gif instead of the default mp4. Here is my config file:

{
  "extractor": {
    "gfycat": {
      "format": "mp4"
    },
    "imgur": {
      "mp4": true
    },
    "redgifs": {
      "format": "mp4"
    }
  },
  "downloader": {
    "retries": 3,
    "timeout": 2.5,
    "filesize-max": "25M",
    "http-adjust-extensions": true
  }
}

Have I missed something obvious?

alexbuisse commented 2 years ago

I just realized that I pasted an older version of the config file. The one requesting gif is actually:

{
  "extractor": {
    "gfycat": {
      "format": "gif"
    },
    "imgur": {
      "mp4": false
    },
    "redgifs": {
      "format": "gif"
    }
  },
  "downloader": {
    "retries": 3,
    "timeout": 2.5,
    "filesize-max": "25M",
    "http-adjust-extensions": true
  }
}

mikf commented 2 years ago

You missed something, but it is not all that obvious.

--range only applies to file URLs coming from the site itself, in your case reddit, whereas external URLs (gfycat, redgifs, etc.) are handled by the oddly named --chapter-range.

It is not possible to combine those two counters, so with --range 1-150 --chapter-range 1-150 you could download up to 150 reddit videos as well as files from 150 external URLs before the next item past either limit stops the run.
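To make the two independent counters concrete, here is a rough Python model (not gallery-dl's actual code). It assumes each reddit post yields either one reddit-hosted file ("file", counted by --range) or one external URL handed to another extractor ("external", counted by --chapter-range), and that the first item beyond either upper bound ends the run. The names `in_range` and `simulate` are hypothetical.

```python
def in_range(index, lower, upper):
    """1-based inclusive check, like the range spec '1-150'."""
    return lower <= index <= upper


def simulate(posts, file_range=(1, 150), chapter_range=None):
    """Return the items processed before a counter stops the run.

    posts: sequence of "file" (reddit-hosted) or "external" (other site).
    file_range models --range; chapter_range models --chapter-range.
    None means no limit for that counter.
    """
    file_count = 0      # counter behind --range
    external_count = 0  # counter behind --chapter-range, kept separately
    processed = []
    for kind in posts:
        if kind == "file":
            file_count += 1
            if file_range and not in_range(file_count, *file_range):
                break  # first reddit file past the limit ends the run
        else:
            external_count += 1
            if chapter_range and not in_range(external_count, *chapter_range):
                break  # first external URL past the limit ends the run
        processed.append(kind)
    return processed


# Mostly external links: one reddit-hosted file per four external URLs.
posts = (["file"] + ["external"] * 4) * 200

print(len(simulate(posts)))                          # 750 (--range alone)
print(len(simulate(posts, chapter_range=(1, 150))))  # 188 (both limits)
```

In this model, --range alone lets the 150 reddit files through together with 600 external ones, similar in scale to the 500 to 800 items reported above; adding the second limit caps the external URLs at 150 while the reddit counter is only at 38.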

> In a possibly related issue, it also seems to ignore the imgur/redgifs config options asking for gif instead of the default mp4.

If an animation is not available as gif, you get the next best format, which is usually mp4.

For gfycat and redgifs, you can set the format to ["gif"] (instead of just "gif"), and it will only download files that are available as gif.
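A hypothetical sketch of the difference between `"format": "gif"` (a preference with fallback) and `"format": ["gif"]` (only gif is acceptable). The fallback order in `DEFAULT_ORDER` is an assumption for illustration, not taken from gallery-dl's source, and `choose_format` is a made-up name; a return value of None stands for "skip this file".

```python
# Assumed fallback order, for illustration only.
DEFAULT_ORDER = ("mp4", "webm", "gif")


def choose_format(setting, available):
    """Return the first acceptable format that is available, or None."""
    if isinstance(setting, str):
        # A single string is a preference: try it first, then fall back.
        candidates = (setting,) + tuple(f for f in DEFAULT_ORDER if f != setting)
    else:
        # A list is a whitelist: nothing outside it is acceptable.
        candidates = tuple(setting)
    for fmt in candidates:
        if fmt in available:
            return fmt
    return None  # no acceptable format: the file is skipped


print(choose_format("gif", {"mp4", "webm"}))    # mp4 (fallback)
print(choose_format(["gif"], {"mp4", "webm"}))  # None (skipped)
print(choose_format(["gif"], {"gif", "mp4"}))   # gif
```

This is why "gif" on its own can still produce mp4 files: the string only sets which format is tried first.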

> -A 3

For reddit, you should use -T 3 and enable parent-skip to better handle files from external URLs.
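Below is a simplified, hypothetical model of why -A 3 alone does little here and why -T 3 is paired with parent-skip. It assumes every reddit post links to exactly one external file, each handled by a fresh child extractor whose consecutive-skip counter starts at zero unless parent-skip shares it with the parent. The function `run` and its parameters are illustrative names, not gallery-dl internals.

```python
def run(posts, archive, limit, terminate, parent_skip):
    """Return how many posts are visited before the whole run stops.

    posts: one external file ID per reddit post (simplifying assumption).
    archive: IDs already recorded in the download archive.
    limit: N in -A N / -T N.
    terminate: True models -T (stop the parent too), False models -A.
    parent_skip: True models a skip counter shared with the parent.
    """
    shared_skips = 0
    visited = 0
    for file_id in posts:
        visited += 1
        # Every post starts a fresh child extractor for the external URL.
        skips = shared_skips if parent_skip else 0
        if file_id in archive:
            skips += 1  # skipped download
        else:
            skips = 0  # a real download resets the consecutive count
        if parent_skip:
            shared_skips = skips
        if terminate and skips >= limit:
            break  # -T: stop the parent (reddit) run as well
        # With -A, only the child run would stop; the parent continues.
    return visited


posts = list(range(100))  # 100 reddit posts
archive = set(posts)      # all of them were downloaded before

print(run(posts, archive, 3, terminate=False, parent_skip=False))  # 100
print(run(posts, archive, 3, terminate=True, parent_skip=False))   # 100
print(run(posts, archive, 3, terminate=True, parent_skip=True))    # 3
```

In this model a child that only ever sees one file never reaches 3 consecutive skips on its own, so only the shared counter combined with a parent-level stop ends the run early.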

alexbuisse commented 2 years ago

Thanks for the detailed answer, that did indeed solve things.