mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

How to handle multiple tag keys? #2106

Open AlttiRi opened 2 years ago

AlttiRi commented 2 years ago

After looking at this topic https://github.com/mikf/gallery-dl/issues/2104 I now have a question.

For example, the post with ID 29652683 has:

tags_artist = ["nikita_varb"]
tags_character = ["jessica_rabbit"]
tags_copyright = ["who_framed_roger_rabbit"]
tags_general = ["1girl", "bare_shoulders", "between_breasts", "breasts", "clavicle", "cleavage", "clothing", "cocktail_dress", "dress", "elbow_gloves", "eyes_closed", "eyeshadow", "female", "footwear", "gloves", "hair_over_one_eye", "high_heels", "large_breasts", "lipstick", "makeup", "microphone_stand", "purple_gloves", "red_dress", "red_hair", "red_lipstick", "shoes", "solo", "strapless", "strapless_dress", "thighs"]
tags_medium = ["high_resolution", "very_high_resolution", "large_filesize", "paid_reward"]

How to achieve the following:


The expected result for "artist, character, copyright, general" order: nikita_varb jessica_rabbit who_framed_roger_rabbit 1girl bare_shoulders between_breasts breasts

The expected result for "general, artist, character, copyright" order: 1girl bare_shoulders between_breasts breasts clavicle cleavage clothing cocktail_dress dress


The problem: if I use {tags_artist:?//J /} {tags_character:?//J /} {tags_copyright:?//J /} and tags_character is missing, I get a double space "  ".


JS solution:

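// Flatten the tag lists in priority order, then append every tag that still fits.
[tags_artist, tags_character, tags_copyright, tags_general]
  .flat()
  .reduce((result, tag) => {
    // The strict "<" reserves one character for the space separator.
    if (result.length + tag.length < 100) {
      return result.length ? result + " " + tag : tag;
    }
    // A tag that does not fit is skipped; shorter tags later in the list may still fit.
    return result;
  }, "");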

The result: nikita_varb jessica_rabbit who_framed_roger_rabbit 1girl bare_shoulders between_breasts breasts solo (one "bonus" tag, solo, also fit)

Maybe it would make sense to add a function for this case?

{concatTags(tags, delimiter, limit)}

It would be very easy to use.

So for me, with this function, the resulting pattern would look like this (all 8 tag types with a 130-character limit):

"filename": "[{category}] {id}—{date:%Y.%m.%d}—{concatTags([tags_artist, tags_character, tags_copyright, tags_studio, tags_general, tags_genre, tags_medium, tags_meta], ' ', 130)}—{md5}.{extension}"

With such result file name:

A few more examples:

AlttiRi commented 2 years ago

A concatTags(tags_keys, delimiter, limit) function would be significantly more user-friendly than the :?//J operator. Moreover, it would add functionality that is easy and obvious to use: it works with multiple tag keys, and its length limiter is correct in that it keeps each tag whole.

tags_keys is either one tag key or an array of tag keys. Null keys (those not present for a work) are treated as empty arrays.
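
The proposal can be sketched in JavaScript as follows. This is only an illustration of the requested behavior, not an existing gallery-dl feature: the function body, the default values, and the choice to skip a tag that does not fit while still trying later ones (as the reduce in the first post does) are all assumptions.

```javascript
// Hypothetical helper illustrating the proposed concatTags(); not part of gallery-dl.
function concatTags(tagsKeys, delimiter = " ", limit = 100) {
    // Accept either a single tag list or an array of tag lists (null entries allowed).
    const isListOfLists = Array.isArray(tagsKeys)
        && tagsKeys.every(key => key == null || Array.isArray(key));
    const lists = isListOfLists ? tagsKeys : [tagsKeys];

    let result = "";
    // A missing (null/undefined) key counts as an empty list.
    for (const tag of lists.flatMap(list => list ?? [])) {
        const candidate = result ? result + delimiter + tag : tag;
        // Only whole tags are appended; a tag that does not fit is skipped.
        if (candidate.length <= limit) {
            result = candidate;
        }
    }
    return result;
}

console.log(concatTags([["a_b"], null, ["cc", "dddddd", "e"]], " ", 8));
// → "a_b cc e"
```

Here "dddddd" is dropped because it would push the line past 8 characters, while the shorter "e" after it still fits.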

mikf commented 2 years ago
> • it has a length limit, for example, 100 characters for all tags,
> • the length limiting should not cut a tag in half.

That's just not possible with standard format strings. Your suggested solution of having a function that handles this case would certainly help, but for that I'd have to implement an entire extra parser instead of being able to just rely on a convenient function provided by the stdlib.

There is a way to use and write your own Python function where you aren't constrained by format string syntax, but you have to implement everything yourself: https://github.com/mikf/gallery-dl/blob/master/docs/formatting.md#special-type-format-strings

Replace your current filename format string with \fM module:function, create a module.py file, and define function to build filenames with everything Python has to offer.

> If I use it: {tags_artist:?//J /} {tags_character:?//J /} {tags_character:?//J /} in this case if tags_character is missed I get double space "  ",

Include the spaces as arguments for ?: {tags_artist:?/ /J /}{tags_character:?/ /J /}{tags_character:?//J /}
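
To see why this helps, here is a rough JavaScript emulation of what the ? and J specifiers appear to do in that format string, assuming ? adds its two arguments only when the value is non-empty and J joins the list with the given separator. The helper name joinIfPresent is made up for this illustration.

```javascript
// Emulates "{key:?<before>/<after>/J<sep>/}" for illustration only; not gallery-dl code.
function joinIfPresent(tags, before, after, sep) {
    // A missing or empty list produces nothing at all, not even the padding.
    if (!tags || tags.length === 0) {
        return "";
    }
    return before + tags.join(sep) + after;
}

// tags_character is absent for this post.
const meta = {
    tags_artist: ["nikita_varb"],
    tags_copyright: ["who_framed_roger_rabbit"],
};

const line =
    joinIfPresent(meta.tags_artist, "", " ", " ") +
    joinIfPresent(meta.tags_character, "", " ", " ") +
    joinIfPresent(meta.tags_copyright, "", "", " ");

console.log(JSON.stringify(line));
// → "nikita_varb who_framed_roger_rabbit"
```

Because the space belongs to the field that precedes it, a missing middle key no longer leaves a double space. Under the same assumptions, a trailing space would still remain if the last key were the missing one.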

AlttiRi commented 2 years ago

I had thought that you parse it manually and that it is your own syntax.

Well, the other way (much simpler, I assume) is to add a computed property whose parameters are defined in the config:

"computedTagsLine": {
  "tags": ["tags_artist", "tags_character", "tags_copyright", "tags_studio", "tags_general", "tags_genre", "tags_medium", "tags_meta"],
  "delimiter": " ",
  "limit": 130
},
"filename": "[{category}] {id}—{date:%Y.%m.%d}—{computedTagsLine}—{md5}.{extension}"

Also I suggest to use such default parameters:

AlttiRi commented 2 years ago

A computed property would also make it possible to resolve these issues, where one of the keys can be abnormally long in some cases: https://github.com/mikf/gallery-dl/issues/1728 https://github.com/mikf/gallery-dl/issues/1798

A quick draft:

"trimmedKey": { // "autoFitKey"
   "key": "filename",
   "base": "{user}—{id}—{title[0:20]}—.{extension}", // I think it can be skipped
   "byteLimit": 50,
// "charLimit": 50,
// "minLength": 5 // [?]
},
"filename": "{user}—{id}—{title[0:20]}—{trimmedKey}.{extension}"

Roughly how it works (a simplified example):

const globalKeys = {...};
const pattern = "...";

function trimmedKey({key, base, byteLimit}) {
  const _base = base || pattern.replace("{trimmedKey}", "");
  const _truncTo = byteLimit - _base.length;
  const truncTo = _truncTo < 0 ? 0 : _truncTo;
  return globalKeys[key].slice(0, truncTo);
}
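
The simplified example above slices by characters even though the setting is called byteLimit. A byte-aware variant could look like the sketch below. It assumes the limit refers to UTF-8 bytes and that base has already been resolved to its final text; both helper names are hypothetical.

```javascript
// Hypothetical helper: cut `text` so that its UTF-8 encoding is at most
// `byteLimit` bytes, without splitting a code point.
function truncateToBytes(text, byteLimit) {
    const encoder = new TextEncoder();
    let result = "";
    let used = 0;
    for (const char of text) {                 // iterates by code point
        const size = encoder.encode(char).length;
        if (used + size > byteLimit) {
            break;
        }
        result += char;
        used += size;
    }
    return result;
}

// Hypothetical helper: give `value` whatever byte budget `base` leaves over.
function fitKey(value, base, byteLimit) {
    const baseBytes = new TextEncoder().encode(base).length;
    return truncateToBytes(value, Math.max(0, byteLimit - baseBytes));
}

console.log(truncateToBytes("héllo", 3));      // → "hé"
console.log(fitKey("abcdefgh", "12345", 8));   // → "abc"
```

Counting bytes matters for the linked issues because file systems usually limit file names by bytes, and non-ASCII titles use more than one byte per character.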

AlttiRi commented 2 years ago

One more example.

"computed": {
    "computedTagsLine": {
        "function": "concatTags",
        "tags": ["tags_artist", "tags_character", "tags_copyright", "tags_studio", "tags_general", "tags_genre", "tags_medium", "tags_meta"],
        "delimiter": " ",
        "limit": 130
    },
    "trimmedKey": {
        "function": "trimmedKey",
        "key": "filename",
        "byteLimit": 50
    }
}
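
One way such a "computed" section could be evaluated is a small registry that maps each "function" name to an implementation. The sketch below is an assumption about how this might work, written in JavaScript like the other demos in this thread; computedFunctions and resolveComputed are made-up names, and gallery-dl has no such option.

```javascript
// Hypothetical registry of functions that a "computed" config section could refer to.
const computedFunctions = {
    // Joins whole tags from several metadata keys without exceeding `limit` characters.
    concatTags({ tags = [], delimiter = " ", limit = 100 }, metadata) {
        let line = "";
        for (const tag of tags.flatMap(key => metadata[key] ?? [])) {
            const candidate = line ? line + delimiter + tag : tag;
            if (candidate.length <= limit) {
                line = candidate;
            }
        }
        return line;
    },
    // Simplified: slices by UTF-16 code units rather than by bytes.
    trimmedKey({ key, byteLimit = 50 }, metadata) {
        return String(metadata[key] ?? "").slice(0, byteLimit);
    },
};

// Evaluates every entry of the "computed" section against one post's metadata.
function resolveComputed(computedConfig, metadata) {
    const resolved = {};
    for (const [name, { function: fn, ...params }] of Object.entries(computedConfig)) {
        const impl = computedFunctions[fn];
        if (!impl) {
            throw new Error(`unknown computed function: ${fn}`);
        }
        resolved[name] = impl(params, metadata);
    }
    return resolved;
}
```

The resolved values could then be merged into the metadata so that {computedTagsLine} and {trimmedKey} behave like ordinary keys in the filename pattern.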

I think the idea is clear.

AlttiRi commented 2 years ago

Here is what I currently use:

{tags[artist][0:3]:?/ /J /}{tags[character][0:3]:?//J /}
  1. It leaves a trailing space if there is no tags[character], and I can't trim it with !t.
  2. It is also not safe if some tags are too long: I can't trim the result with [0:60] twice, or with [0:120] once.
  3. I use only 2 tag types for safety.

AlttiRi commented 2 years ago

Well, for tags only it should be easy to implement.

I don't know Python, so here is the JS demo:

const dumpJson1 = {
    "id": 29652683,
    "md5": "6ae32b1483fc6621dcbbbdb15144885d",
    "created_at": 1639478267,
    "extension": "jpg",
    "tags_artist": ["nikita_varb"],
    "tags_character": ["jessica_rabbit"],
    "tags_copyright": ["who_framed_roger_rabbit"],
    "tags_general": [
        "1girl","bare_shoulders","between_breasts","breasts","clavicle","cleavage","clothing","cocktail_dress","dress",
        "elbow_gloves","eyes_closed","eyeshadow","female","footwear","gloves","hair_over_one_eye","high_heels",
        "large_breasts","lipstick","makeup","microphone_stand","purple_gloves","red_dress","red_hair","red_lipstick",
        "shoes","solo","steam","strapless","strapless_dress","thighs"
    ],
    "tags_medium": ["high_resolution", "very_high_resolution", "large_filesize", "paid_reward"]
};

const json = dumpJson1;
// It looks like all properties are globally available in the extractor .py files, so:
globalThis.md5            = json.md5;
globalThis.id             = json.id;
globalThis.extension      = json.extension;
globalThis.tags_artist    = json.tags_artist;
globalThis.tags_character = json.tags_character;
globalThis.tags_copyright = json.tags_copyright;
globalThis.tags_medium    = json.tags_medium;
globalThis.tags_general   = json.tags_general;

// -------------
// Assume it's in gallery-dl.conf
const computedTagsLineSetting = {
    "tags": ["tags_artist", "tags_character", "tags_copyright", "tags_general"],
    "limit": 130,
  //"byteLimit": 130,
    "separator": " ",
};
// -------------

// Here is the computed property
Object.defineProperty(globalThis, "computedTagsLine", {
    get() {
        return computedTagsLineSetting ? getComputedTagsLine(computedTagsLineSetting) : "";
    }
});

function getComputedTagsLine({tags, limit, byteLimit, separator} = {}) {
    tags = tags || [];
    // byteLimit, when set, is the limit itself and makes length() count UTF-8 bytes.
    limit = byteLimit || limit || 100;
    separator = separator || " ";

    tags = tags.map(name => globalThis[name]);

    function length(string) {
        if (byteLimit) {
            return new TextEncoder().encode(string).length;
        } else {
            return string.length;
        }
    }

    return tags
        .flat()
        .reduce((result, tag) => {
            if (length(result) + length(separator) + length(tag) <= limit) {
                return result.length ? result + separator + tag : tag;
            }
            return result;
        }, "");
}

const filenamePattern = "${id}—${computedTagsLine}—${md5}.${extension}";
console.log(filenamePattern);

const resolvedFilename = eval("`" + filenamePattern + "`");
console.log(resolvedFilename);
console.log(resolvedFilename.length);

AlttiRi commented 2 years ago

So, I suggest simply adding a computedTagsLine property to the config file for extractors that have tags.

computedTagsLine is an object with the following settings:


So, real-life usage would look like this:

        "sankaku":
        {
            "computedTagsLine": {
                "tags": ["tags_artist", "tags_character", "tags_copyright", "tags_studio", "tags_general", "tags_genre", "tags_medium", "tags_meta"],
                "limit": 130,
                "separator": " "
            },
            "directory": ["[gallery-dl]", "[{category}]"],
            "filename": "[{category}] {id}—{date:%Y.%m.%d}—{computedTagsLine}—{md5}.{extension}"
        },

AlttiRi commented 2 years ago

It also looks possible to handle each tag individually, in order to apply the transforms from formatting.md:

"computedTagsLine": {
    "...": "...",
    "formatter": "{tag}"
}

Just apply the "formatter" pattern to each tag while processing them.

Use cases:

JS simulation

```js
const dumpJson1 = {
    "id": 29652683,
    "md5": "6ae32b1483fc6621dcbbbdb15144885d",
    "created_at": 1639478267,
    "extension": "jpg",
    "tags_artist": ["nikita_varb"],
    "tags_character": ["jessica_rabbit"],
    "tags_copyright": ["who_framed_roger_rabbit"],
    "tags_general": [
        "1girl","bare_shoulders","between_breasts","breasts","clavicle","cleavage","clothing","cocktail_dress","dress",
        "elbow_gloves","eyes_closed","eyeshadow","female","footwear","gloves","hair_over_one_eye","high_heels",
        "large_breasts","lipstick","makeup","microphone_stand","purple_gloves","red_dress","red_hair","red_lipstick",
        "shoes","solo","steam","strapless","strapless_dress","thighs"
    ],
    "tags_medium": ["high_resolution", "very_high_resolution", "large_filesize", "paid_reward"]
};

const json = dumpJson1;
// It looks like all properties are globally available in the extractor .py files, so:
globalThis.md5            = json.md5;
globalThis.id             = json.id;
globalThis.extension      = json.extension;
globalThis.tags_artist    = json.tags_artist;
globalThis.tags_character = json.tags_character;
globalThis.tags_copyright = json.tags_copyright;
globalThis.tags_medium    = json.tags_medium;
globalThis.tags_general   = json.tags_general;

// -------------
// Assume it's in gallery-dl.conf
const computedTagLineSetting = {
    "tags": ["tags_artist", "tags_character", "tags_copyright", "tags_general"],
    "limit": 130,
  //"byteLimit": 120,
    "separator": " ",
    "formatter": "${tag}" // using some formatting rules, see `formatting.md`
};
// -------------

// Here is the computed property
Object.defineProperty(globalThis, "computedTagLine", {
    get() {
        return computedTagLineSetting ? getComputedTagLine(computedTagLineSetting) : "";
    }
});

function getComputedTagLine({tags, limit, byteLimit, separator, formatter} = {}) {
    tags = tags || [];
    // byteLimit, when set, is the limit itself and makes length() count UTF-8 bytes.
    limit = byteLimit || limit || 100;
    separator = separator || " ";
    formatter = formatter || "${tag}";

    tags = tags.map(name => globalThis[name]);

    function length(string) {
        if (byteLimit) {
            return new TextEncoder().encode(string).length;
        } else {
            return string.length;
        }
    }

    return tags
        .flat()
        .reduce((result, tag) => {
            tag = resolvePattern(formatter);
            function resolvePattern(pattern) { // for "Format String Syntax" "emulation"
                return eval("`" + pattern + "`");
            }

            if (length(result) + length(separator) + length(tag) <= limit) {
                return result.length ? result + separator + tag : tag;
            }
            return result;
        }, "");
}

const filenamePattern = "${id}—${computedTagLine}—${md5}.${extension}";
console.log(filenamePattern);

const resolvedFilename = resolvePattern(filenamePattern);
console.log(resolvedFilename);
console.log(resolvedFilename.length);

// JS has no advanced string formatting like Python does
function resolvePattern(pattern) {
    return eval("`" + pattern + "`");
}
```

AlttiRi commented 2 years ago

I also see that some extractors have only tag_string_* keys, not arrays of tags. In that case it makes sense to add a parameter that specifies how to split the string into an array, for example:

            "computedTagsLine": {
                "tags": ["tag_string_artist", "tag_string_character", "tag_string_general"],
                "splitter": " ",
                "limit": 130,
                "separator": " "
            },
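
A small normalization step would be enough to support both kinds of keys. The following JavaScript sketch assumes the "splitter" option from the config above; toTagList is a hypothetical name.

```javascript
// Hypothetical helper: turn a metadata value into a list of tags,
// whether the extractor provides an array or a delimited string.
function toTagList(value, splitter = " ") {
    if (Array.isArray(value)) {
        return value;
    }
    if (typeof value !== "string") {
        return [];                       // missing key: no tags
    }
    // filter(Boolean) drops the empty strings produced by repeated delimiters.
    return value.split(splitter).filter(Boolean);
}

console.log(toTagList("1girl solo  red_hair"));   // → ["1girl", "solo", "red_hair"]
console.log(toTagList(["nikita_varb"]));          // → ["nikita_varb"]
```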

AlttiRi commented 2 years ago

Additionally, an "ignored" field (not the best name) would be useful too. It should be an array of tags to ignore, with support for the * wildcard from glob syntax.
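
A minimal JavaScript sketch of such filtering, assuming only the * wildcard is supported and that a pattern must match the whole tag. The function names are hypothetical.

```javascript
// Hypothetical helper: convert a glob pattern that supports only "*" into a RegExp.
function globToRegExp(pattern) {
    // Escape every regex metacharacter except "*".
    const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    // "*" matches any run of characters; anchors force a whole-tag match.
    return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Hypothetical helper: drop every tag that matches one of the ignored patterns.
function filterIgnored(tags, ignored = []) {
    const matchers = ignored.map(globToRegExp);
    return tags.filter(tag => !matchers.some(re => re.test(tag)));
}

const tags = ["high_resolution", "very_high_resolution", "solo", "large_filesize"];
console.log(filterIgnored(tags, ["*resolution", "large_*"]));
// → ["solo"]
```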

Also

The current --filter "any('tag' == T.lower() for T in tags)" syntax is not user-friendly, for example.

In addition to the default tag lists (artist, character, general), priority/custom tag lists would be useful, for example:

"tags_important": ["third-party_edit"]