mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Mangadex - possible to save metadata outside zip? #4331

Open diamondsw opened 1 year ago

diamondsw commented 1 year ago

When saving metadata, the documentation states that it's saved relative to the image download directory. However, the zip postprocessor uses the download directory as its filename and then (by default) deletes the source files. This leaves two alternatives that I see:

  1. Leave the folder there with the lone metadata file in it (mychapter/metadata.json and mychapter.cbz)
  2. Add the metadata file into the archive with the "files:" option (clean, but metadata is relatively inaccessible)

Neither of these works well. With (2), other tools can't consume the CBZ files as-is and still access the metadata, so importing into viewing systems ranges from problematic to impossible. (1) leaves useless folders everywhere. Ideally the metadata could be saved alongside the cbz (mychapter.json and mychapter.cbz), but I see no way to save metadata outside the image download directory (i.e., into the directory the zip/cbz will end up in), or to use the same volume/chapter metadata in the filename. There may be a way to do this via exec, but given the fragility of using variables (sometimes they work, and sometimes they don't), I'm not optimistic.
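
To make the cost of option (2) concrete, here is a minimal sketch of what a consuming tool would have to do to get at metadata stored inside the archive. The function name `read_embedded_metadata` and the member name `metadata.json` are assumptions for illustration; the actual member name depends on how the metadata postprocessor is configured.

```python
import json
import zipfile


def read_embedded_metadata(cbz_path, member="metadata.json"):
    """Return the parsed JSON metadata stored inside a CBZ archive.

    `member` is a hypothetical default; adjust it to whatever filename
    the metadata was actually written under.
    """
    # A CBZ is an ordinary zip file, so the standard zipfile module can open it.
    with zipfile.ZipFile(cbz_path) as archive:
        with archive.open(member) as handle:
            return json.load(handle)
```

Any importer that only looks for a sidecar file next to the cbz would never reach this data, which is the accessibility problem described above.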

KonoromiHimaries commented 1 year ago

gallery-dl --write-info-json --exec-after "cd {} && zip 0.cbz *.jpg && rm *.jpg" <url>

a84r7a3rga76fg commented 1 year ago

@KonoromiHimaries Sorry for the off-topic, how do you use --write-info-json as a configuration option?

diamondsw commented 1 year ago

> @KonoromiHimaries Sorry for the off-topic, how do you use --write-info-json as a configuration option?

I believe the equivalent would be this:

        {
            "name": "metadata",
            "mode": "json",
            "directory": ["metadata"],
            "filename": "{id}.json",
            "skip": true,
            "event": "post"
        }

The key bits are the mode and filename, and then you have to add this to either the extractor's postprocessor block, or as its own named block and call it from there. You can set the directory and filename to what you want; the directory here is relative to the download directory. The above example is from a booru downloader, but should be similar.

Also, the metadata postprocessor does not inherit the "skip" option from downloads. So you'll be merrily skipping all the files already downloaded (as you'd expect) - and redownloading the metadata for all of them every time. It kind of makes sense - you might want to redownload metadata sometimes in case it changes - but it was killing my backup jobs until I found this and added "skip" directly to the metadata postprocessor.

diamondsw commented 1 year ago

> gallery-dl --write-info-json --exec-after "cd {} && zip 0.cbz *.jpg && rm *.jpg"

I eventually got it to work. For anyone interested:

{
    "extractor":
    {
        "skip": true,
        "path-restrict": "windows",
        "path-strip": "windows",
        "sleep": 0,
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0",
        "base-directory": "~/Downloads/Dante/",

        "mangadex": {
            "metadata": true,
            "directory": ["{category}", "[{author[0]}] {manga}", "v{volume:0>2} c{chapter:0>3} - {title}"],
            "archive": "~/Downloads/Dante/{category}/[{author[0]}] {manga}/archive.db",
            "lang": "en",
            "postprocessors": [
                {
                    "name" : "metadata",
                    "event": "post",
                    "skip": true,
                    "filename": "v{volume:0>2} c{chapter:0>3} - {title}.json"
                },
                {
                    "name": "exec",
                    "event": "post",
                    "command": ["mv", "{_directory}v{volume:0>2} c{chapter:0>3} - {title}.json", "{_directory}../"]
                },
                {
                    "name": "zip",
                    "extension": "cbz",
                    "keep-files": false,
                    "skip": true
                }
            ]
        }
    },

    "output":
    {
        "skip": false
    }
}
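
For anyone unsure what the directory and filename patterns in this config expand to, the sketch below renders them with plain Python `str.format`. This is only an approximation: gallery-dl's format strings are based on Python's format syntax but add extras of their own, and the sample values here are made up for illustration (real values come from the extractor).

```python
# Hypothetical sample metadata; real values are supplied by the extractor.
sample = {
    "category": "mangadex",
    "author": ["Some Author"],
    "manga": "Some Manga",
    "volume": 1,
    "chapter": 5,
    "title": "Intro",
}

# The three path segments from the "directory" option above.
directory_parts = [
    "{category}",
    "[{author[0]}] {manga}",
    "v{volume:0>2} c{chapter:0>3} - {title}",
]

# "0>2" means: pad with "0", right-align, minimum width 2.
rendered = [part.format(**sample) for part in directory_parts]
print("/".join(rendered))
# prints: mangadex/[Some Author] Some Manga/v01 c005 - Intro
```

Note that nothing here sanitizes the title, so a title containing characters that are not valid in a path would come through unchanged in this sketch.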

Variable substitution and documentation problems encountered just while trying to get this to work:

That doesn't get into the usual problems with base-directory being ill-defined, not usable in substitution, and not affecting the working directory, or any of the substitution and path issues affecting archives. Variable substitution is inconsistently performed (sometimes inconsistently within the same command), inconsistently supported, poorly documented, and will sometimes be allowed to spew gibberish into the filesystem that it doesn't support. And this is all fine apparently and needs no fixing, because none of these are bugs. Obviously.

gallery-dl does a truly heroic job of dealing with an ever-changing and hostile landscape of sites - I can't imagine the amount of time and effort required. But it's let down by an inconsistent, buggy, and poorly-documented backend.

a84r7a3rga76fg commented 1 year ago

@diamondsw That worked, thank you

diamondsw commented 1 year ago

Still somewhat broken; try downloading "https://mangadex.org/chapter/413b6086-e951-4031-837a-cb2f5b3d83c7" with the above config. It will be broken - it won't move the JSON files because it can't path-restrict the variable substitution (honestly understandable in this case since it has no idea the variable is being used as a path). However, you also can't work around it with {_filename} because inexplicably the variable has a "/" at the end of a filename.
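
One possible way around the mismatch is to stop rebuilding the filename at all and move whatever JSON files are sitting in the chapter directory. Below is a minimal sketch of such a helper; `hoist_json` is a hypothetical name, and it assumes the chapter directory holds only that chapter's metadata JSON at the time it runs (i.e., before the zip postprocessor removes the folder).

```python
import shutil
from pathlib import Path


def hoist_json(chapter_dir):
    """Move every *.json file in chapter_dir into its parent directory.

    Files are matched by glob, so the caller never has to reconstruct
    the (possibly sanitized) filename. Returns the new paths.
    """
    chapter = Path(chapter_dir)
    moved = []
    for src in sorted(chapter.glob("*.json")):
        dest = chapter.parent / src.name
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Since the only input is the directory itself, this could in principle be wrapped in a small script and called from the exec postprocessor with {_directory} as its single argument, though I have not verified how that behaves with the chapter linked above.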