mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[v2.0] Configuration File Changes #2203

Open mikf opened 2 years ago

mikf commented 2 years ago

For version 2.0, I would like to redo the way configuration files are structured. In most cases this change will make configuration files simpler while at the same time allowing for greater control, especially for options that can currently only be set globally.

Instead of building a hierarchy by stacking different sections into one another (extractor > pixiv > user > filename, downloader > http > retries), everything will be just one level deep. Any hierarchy will be expressed by separating names with colons. The extractor block vanishes and global extractor options go under general (or default).

For example, the old

{
    "extractor": {
        "filename": "{filename}.{extension}",
        "pixiv": {
            "filename": "foobar",
            "user": {
                "filename": "barfoo"
            }
        }
    },
    "downloader": {
        "retries": 4,
        "http": {
            "retries": -1
        }
    }
}

would become

{
    "general": {
        "filename": "{filename}.{extension}"
    },
    "pixiv": {
        "filename": "foobar"
    },
    "pixiv:user": {
        "filename": "barfoo"
    },
    "downloader": {
        "retries": 4
    },
    "downloader:http": {
        "retries": -1
    }
}

Specifying downloader, output, etc. options for a specific site will be done by putting that name after the site category name, separated by a colon (:):

{
    "pixiv:downloader": {
        "rate": "1m"
    },
    "pixiv:user:downloader:http": {
        "rate"   : "200k",
        "retries": -1
    },
    "pixiv:output": {
        "logfile": "{$HOME}/gdl/pixiv.log"
    }
}

At the moment, some extractors have a basecategory (e.g. mastodon, gelbooru_v02) that can be used to specify options shared by all extractors with the same basecategory. This idea can be extended so that extractors have a category path of any length.

In this example, the options for a rule34 extractor would be taken from booru, gelbooru_v02, and rule34, combined with the blocks for the subcategory:

{
    "booru"           : { "#": "" },
    "booru:tag"       : { "#": "" },
    "gelbooru_v02"    : { "#": "" },
    "gelbooru_v02:tag": { "#": "" },
    "rule34"          : { "#": "" },
    "rule34:tag"      : { "#": "" }
}

For each of these it would be possible to set downloader, output, etc options with, for example, gelbooru_v02:tag:downloader:http.
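As a rough illustration of the lookup described above, the following sketch generates the section names that would be consulted for one extractor. The function `option_keys` and its parameters are hypothetical and not part of gallery-dl. It assumes the category path is ordered from most general to most specific and follows the order of the listing above; the actual precedence is not settled here.

```python
def option_keys(category_path, subcategory, suffix=None):
    """Return config section names for one extractor.

    category_path: e.g. ["booru", "gelbooru_v02", "rule34"]
    subcategory:   e.g. "tag"
    suffix:        optional trailing path such as "downloader:http"
    """
    keys = []
    for category in category_path:
        keys.append(category)
        keys.append(f"{category}:{subcategory}")
    if suffix:
        # e.g. "gelbooru_v02:tag" -> "gelbooru_v02:tag:downloader:http"
        keys = [f"{key}:{suffix}" for key in keys]
    return keys


for key in option_keys(["booru", "gelbooru_v02", "rule34"], "tag"):
    print(key)
```

Passing `suffix="downloader:http"` yields the same list with that path appended to every entry.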

Custom domains for basecategories can be put under instances:

{
    "foolslide:instances": {
        "otscans"  : "https://otscans.com/foolslide",
        "helvetica": "https://helveticascans.com/r"
    },
    "mastodon:instances": {
        "tabletop.social": {
            "root": "https://tabletop.social",
            "access-token": "513a36c6..."
        }
    }
}

Options for child extractors could be set by putting the child's category after its parent's category, perhaps separated by a > instead of a colon:

{
    "imgur": {
        "#": "general imgur options"
    },
    "reddit:imgur": {
        "#": "imgur options when spawned from a reddit extractor"
    },
    "reddit>imgur": {
        "#": "imgur options when spawned from a reddit extractor alt"
    }
}

The actual implementation should be relatively simple, as it only has to build a single dict of options once instead of traversing the config tree for each lookup, as is currently done:

def build_options(keys):
    options = {}
    for key in keys:
        opts = config.get(key)
        if opts:
            options.update(opts)
    return options
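To make the effect of that loop concrete, here is a self-contained variant that takes the config dict as a parameter instead of reading a global. The sample `config` and the key list are made up for illustration, and the sketch assumes that later keys are meant to override earlier ones.

```python
def build_options(config, keys):
    """Merge the option blocks named by `keys`; later keys win."""
    options = {}
    for key in keys:
        opts = config.get(key)
        if opts:
            options.update(opts)
    return options


# Sample flat config in the proposed format (illustrative values only)
config = {
    "general": {"filename": "{filename}.{extension}", "retries": 4},
    "pixiv": {"filename": "foobar"},
    "pixiv:user": {"filename": "barfoo"},
}

options = build_options(config, ["general", "pixiv", "pixiv:user"])
print(options["filename"])  # barfoo
print(options["retries"])   # 4
```

Keys missing from the config (for example `"pixiv:bookmark"`) are simply skipped, so the key list can be generated without checking which sections exist.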

The new format is obviously not backwards compatible, but it is rather easy to check whether a config file is in the old format and issue a warning that links to a guide explaining how to transform old to new.
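One possible form of that check, as a minimal sketch: since the extractor block disappears in the new layout, a top-level "extractor" dict is treated as the sign of an old-style file. The function names and the warning text are hypothetical, and a real check might need to look at more than this one key.

```python
import logging

log = logging.getLogger("config")


def looks_like_old_format(config):
    """Heuristic: old-style files nest site options under "extractor"."""
    return isinstance(config.get("extractor"), dict)


def check_format(config):
    """Return True for new-style configs; warn and return False otherwise."""
    if looks_like_old_format(config):
        log.warning(
            "config file uses the pre-2.0 layout; "
            "see the migration guide for how to convert it"
        )
        return False
    return True


print(check_format({"general": {}, "pixiv:user": {}}))  # True
print(check_format({"extractor": {"pixiv": {}}}))       # False
```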

To a certain extent it will also be possible to automatically transform an old config file into the new format by moving every entry from extractor one level higher into the global dict, scanning those entries for subcategory options and moving those to <category>:<subcategory>, and moving custom basecategory instances to <basecategory>:instances.
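A naive version of such a transformation could look like the sketch below. The names `flatten`, `convert`, and `dict_options` are hypothetical. The sketch treats every dict-valued entry as a nested block unless its name is listed in `dict_options`, which is an oversimplification because some real options have dict values, and it does not handle the instances step at all.

```python
def flatten(name, block, out, dict_options):
    """Store plain options under `name` and recurse into nested blocks."""
    plain = {}
    nested = []
    for key, value in block.items():
        if isinstance(value, dict) and key not in dict_options:
            nested.append((f"{name}:{key}", value))
        else:
            plain[key] = value
    if plain:
        out[name] = plain
    for child_name, child in nested:
        flatten(child_name, child, out, dict_options)


def convert(old, dict_options=frozenset()):
    """Convert an old nested config dict into the flat layout."""
    new = {}
    for section, block in old.items():
        if section == "extractor":
            # Global extractor options go to "general";
            # site blocks move one level up.
            general = new.setdefault("general", {})
            for key, value in block.items():
                if isinstance(value, dict) and key not in dict_options:
                    flatten(key, value, new, dict_options)
                else:
                    general[key] = value
        else:
            flatten(section, block, new, dict_options)
    return new


old = {
    "extractor": {
        "filename": "{filename}.{extension}",
        "pixiv": {"filename": "foobar", "user": {"filename": "barfoo"}},
    },
    "downloader": {"retries": 4, "http": {"retries": -1}},
}

for name, options in convert(old).items():
    print(name, options)
```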

Let me know what you think about this idea.

rautamiekka commented 2 years ago

What about copy-pasting code/text all over? This has massive potential to escalate to the point where the copy-pasting means having to use search+replace A LOT.

Hrxn commented 2 years ago

So, the option names like "filename" etc. stay the same, as well as the option values (e.g. a conditional block for "directory")?

Then it's easy.. perfectly fine with me.

I think it's definitely an improvement, especially the alternative syntax for options for child extractors..

kattjevfel commented 2 years ago

I have no issues with the current config, but the new one doesn't seem worse at least, so as long as documentation follows I'm alright with it.

voreman567 commented 2 years ago

As long as there's a small guide for what everything means and how to use it properly, that'd honestly be fine with me. Seems good!

folliehiyuki commented 2 years ago

If the new format makes it easier to parse then go for it. I also don't think the new one looks bad compared to the current format.

And also maybe we should pin this issue when the format change happens. A stdout warning + documentation cannot stop people from creating issues immediately when their configs don't work.

AlttiRi commented 2 years ago

Will decomposition of the same property work?

For example, first I need a custom filename and directory name, so I just add:

    "deviantart": {
        "directory": ["[gallery-dl]", "[{category}] {author[username]}"],
        "filename": "[{category}] {author[username]}—{index}—{date:%Y.%m.%d}—{title}—{filename}.{extension}"
    },

Then I need to add credentials. I don't edit the previous block; I just add another one:

    "deviantart": {
        "client-id": "12345",
        "client-secret": "0123456789abcdef0123456789abcdef"
    },

Some time later I add a postprocessor:

    "deviantart": {
        "metadata": true,
        "postprocessors": [{
            "name": "metadata",
            "mode": "custom",
            "filename": "[{category}] {author[username]}—{index}—{date:%Y.%m.%d}—{title}—{filename}.html",
            "directory": "metadata",
            "extension": "html",
            "format": "<h1 style='display: inline'><a href='{url}'>{title}</a></h1> by <a href='https://www.deviantart.com/{username}'>{author[username]}</a><div><br></div><div class='content'>{description}</div><br><div><hr><div class='tags'>[\"{tags:J\", \"}\"]</div><hr></div><div>{date:%Y.%m.%d}</div><br>\n\n"
        }]
    },

It can be done with an approach like this:

// The result `"deviantart"` config
const deviantart = defaults.deviantart; 
for (const [key, value] of configs) {
  if (key === "deviantart") {
    Object.assign(deviantart, value);
  }
}

With this approach I can group postprocessors in one place in the file and credentials in another.


If I have:

    "deviantart": {
        "client-id": "12345",
        "client-secret": "0123456789abcdef0123456789abcdef"
    },
    "deviantart": {
        "client-id": "23456",
        "client-secret": "abcdef0123456789abcdef0123456789"
    },

The second is applied. (It overwrites the first one.)


Update: in the case of postprocessors, it should not overwrite but merge the arrays.
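For reference, the same idea can be sketched in Python, since the standard `json` module passes repeated keys to `object_pairs_hook` instead of dropping them silently. The names `merge_duplicates`, `merge_into`, and `MERGE_LISTS` are hypothetical, and this is only one possible way to get the behavior described above, not how gallery-dl loads its config.

```python
import json

# Options whose list values are concatenated instead of replaced (assumption)
MERGE_LISTS = {"postprocessors"}


def merge_into(target, source):
    """Copy `source` into `target`; later values win except for merged lists."""
    for key, value in source.items():
        current = target.get(key)
        if (key in MERGE_LISTS and isinstance(current, list)
                and isinstance(value, list)):
            target[key] = current + value
        else:
            target[key] = value


def merge_duplicates(pairs):
    """object_pairs_hook that merges repeated keys holding objects."""
    merged = {}
    for key, value in pairs:
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            merge_into(current, value)
        else:
            merged[key] = value
    return merged


text = """
{
    "deviantart": {
        "client-id": "12345",
        "postprocessors": [{"name": "metadata"}]
    },
    "deviantart": {
        "client-id": "23456",
        "postprocessors": [{"name": "zip"}]
    }
}
"""

config = json.loads(text, object_pairs_hook=merge_duplicates)
print(config["deviantart"]["client-id"])            # 23456
print(len(config["deviantart"]["postprocessors"]))  # 2
```

Other list options such as "directory" are still replaced as a whole, because only names in `MERGE_LISTS` are concatenated.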