mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Better tag handling #1141

Open TestPolygon opened 4 years ago

TestPolygon commented 4 years ago

I think that

The expected usage: {tags:artist,character,general:120} or {tags:120:artist,character,general}.

For example, if {tags} is batman (series), the dark knight, dc comics, batman, bruce wayne, the joker, heath ledger, bonocho, sketch, copyright name, 2boys, bats, black bodysuit, bodysuit, brown eyes, closed mouth, collared shirt, expressionless, facepaint, glasgow smile, male, male focus, multiple boys, outline, outstretched arm, parted lips, pink shirt, shirt, smile, upper body, white outline, wing collar (a tag line this long will break file saving)

then {tags:40} should return batman (series), the dark knight (cut after the last whole tag that fits in 40 characters), not batman (series), the dark knight, dc com (cut mid-tag at exactly 40 characters)

and (if tag categories are supported) {tags:artist,character,general:40} should return bonocho, batman, bruce wayne, the joker

Also, I don't like that some extractors return tags containing spaces, separated by commas. I want the tag line to be separated by spaces, with the spaces inside each tag replaced by an underscore.

With such an option, the expected output of the {tags:artist,character,general:40} template would be bonocho batman bruce_wayne the_joker.
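The behaviour requested above can be sketched in a few lines of Python. This is only an illustration of the proposal, not gallery-dl code: the helper names `select` and `join_limited`, their parameters, and the assignment of tags to categories in `BY_CATEGORY` are all assumptions made for the example.

```python
# Hypothetical sketch of the requested behaviour; not part of gallery-dl.

def select(tags_by_category, categories):
    """Return the tags of the requested categories, in the requested order."""
    selected = []
    for category in categories:
        selected.extend(tags_by_category.get(category, ()))
    return selected


def join_limited(tags, max_len, sep=", ", space_repl=None):
    """Join tags with `sep`, stopping before the first tag that would
    push the result past `max_len` characters (never cutting mid-tag)."""
    parts = []
    length = 0
    for tag in tags:
        if space_repl is not None:
            tag = tag.replace(" ", space_repl)
        extra = len(tag) + (len(sep) if parts else 0)
        if length + extra > max_len:
            break
        parts.append(tag)
        length += extra
    return sep.join(parts)


# Abbreviated tag data from the example; the category assignment is assumed.
FLAT = ["batman (series)", "the dark knight", "dc comics", "batman"]
BY_CATEGORY = {
    "copyright": ["batman (series)", "the dark knight", "dc comics"],
    "character": ["batman", "bruce wayne", "the joker"],
    "artist": ["bonocho"],
    "general": ["2boys", "bats"],
}

wanted = select(BY_CATEGORY, ["artist", "character", "general"])
print(join_limited(FLAT, 40))
# -> batman (series), the dark knight
print(join_limited(wanted, 40))
# -> bonocho, batman, bruce wayne, the joker
print(join_limited(wanted, 40, sep=" ", space_repl="_"))
# -> bonocho batman bruce_wayne the_joker
```

Stopping at the first tag that does not fit, rather than skipping it and trying shorter ones, keeps the output a prefix of the original tag order.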

mikf commented 4 years ago

> tags should be separated by categories if possible (copyright, artist, character, medium, general, meta, genre...)

There's already a tags option, which (kind of) does this.

> it should be possible to specify the tag line format (separate tags with a comma or a space)

If tags are given as a list (and not a string, like they are for most *booru sites), you can use {tags:J<sep>}
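As a rough illustration of the list-versus-string distinction, the following Python sketch shows what joining with a separator amounts to. The helper names `tags_as_list` and `join_tags` are hypothetical and this is not how gallery-dl implements its format strings; it also assumes that string-valued tags are whitespace-separated, as is typical for *booru sites.

```python
# Hypothetical helpers; they only illustrate joining tags with a separator.

def tags_as_list(tags):
    """Accept either a list of tags or one whitespace-separated string."""
    if isinstance(tags, str):
        return tags.split()
    return list(tags)


def join_tags(tags, sep):
    """Join the tags with `sep`, whichever form they arrived in."""
    return sep.join(tags_as_list(tags))


print(join_tags(["bruce wayne", "the joker"], ", "))
# -> bruce wayne, the joker
print(join_tags("bruce_wayne the_joker", ", "))
# -> bruce_wayne, the_joker
```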

TestPolygon commented 3 years ago

It says:

Note: This requires 1 additional HTTP request for each post.

Does it make the request only when it is really needed? For example, for sankaku an image can only be downloaded via the "view" page, so it should not need any additional request.

For rule34 it is possible to cache tag categories, so if all of an image's tags are already known, no request is needed to find out which category each tag belongs to.

I have tested this. On the first run the caching is not that effective, but it still reduces the number of requests. For one artist tag with 1219 images, tag-category caching needed only 764 requests (the result: 3108 tags, 5 types), and that was the first run, starting with an empty cache.
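A minimal sketch of the caching idea described above, assuming a hypothetical `fetch_categories(post_id)` callable that stands in for the extra HTTP request and returns a tag-to-category mapping for one post. It is not the code used for the test above and does not reproduce its numbers; it only shows why posts whose tags are all known can skip the request.

```python
# Hypothetical sketch; `fetch_categories` stands in for the extra HTTP request.

class TagCategoryCache:
    def __init__(self, fetch_categories):
        self._fetch = fetch_categories  # callable: post_id -> {tag: category}
        self._known = {}                # tag -> category, shared across posts
        self.requests = 0               # number of times the fetch was needed

    def categorize(self, post_id, tags):
        """Return {tag: category} for one post, fetching only if some tag
        of this post has not been seen before."""
        if any(tag not in self._known for tag in tags):
            self.requests += 1
            self._known.update(self._fetch(post_id))
        return {tag: self._known.get(tag) for tag in tags}


# Made-up data for three posts of the same artist.
POSTS = {
    1: {"bonocho": "artist", "batman": "character"},
    2: {"bonocho": "artist", "batman": "character"},
    3: {"bonocho": "artist", "the joker": "character"},
}

cache = TagCategoryCache(lambda post_id: POSTS[post_id])
for post_id, categories in POSTS.items():
    cache.categorize(post_id, list(categories))

print(cache.requests)  # -> 2 (post 2 was served entirely from the cache)
```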

TestPolygon commented 3 years ago

The tags do not take a lot of space and tag categories are persistent so I think it would be nice if the program would cache them.
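Persisting such a cache between runs could look roughly like the sketch below, which stores the tag-to-category mapping as a JSON file. The function names and the file layout are assumptions for illustration only; gallery-dl may well prefer a different storage mechanism.

```python
# Hypothetical sketch: keep a tag -> category mapping in a JSON file.
import json
from pathlib import Path


def load_tag_cache(path):
    """Load the mapping from disk, or start empty if the file is missing."""
    path = Path(path)
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fp:
        return json.load(fp)


def save_tag_cache(path, cache):
    """Write the mapping to a temporary file first, then move it into place,
    so an interrupted run does not leave a half-written cache behind."""
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("w", encoding="utf-8") as fp:
        json.dump(cache, fp, ensure_ascii=False, sort_keys=True)
    tmp.replace(path)
```

A run would call `load_tag_cache` once at startup, update the dictionary as new tags are categorized, and call `save_tag_cache` before exiting.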

Butterfly-Dragon commented 3 years ago

I don't know if this is unresolved, but I was about to open a request and it seems this is what I was going to ask for. So... when downloading the page where the art/video/whatever is stored, we do get tag types from (most of) the various *booru sites. In this case I can clearly see "tag-type-copyright" (for example) in the classes of each link. If this is not what you meant by "HTTP request", then I apologize for wasting your time.

EDIT: Never mind, I read the "tags" part of the documentation only after writing.

thatfuckingbird commented 3 years ago

The additional request (at least in the case of gelbooru) comes from needing to get the HTML at all. gallery-dl uses their API otherwise, but for some braindead reason, tag types are not served over the API (even though tags are).
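To make the HTML side concrete, here is a small sketch of reading tag types from link classes such as "tag-type-copyright", using only Python's standard `html.parser`. The `SAMPLE` markup is invented for the example and the real pages may be structured differently; `TagTypeParser` is a hypothetical name, not a gallery-dl class.

```python
# Hypothetical sketch: collect tags by category from "tag-type-*" link classes.
from html.parser import HTMLParser


class TagTypeParser(HTMLParser):
    PREFIX = "tag-type-"

    def __init__(self):
        super().__init__()
        self.tags = {}         # category -> list of tag names
        self._category = None  # category of the <a> currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls.startswith(self.PREFIX):
                self._category = cls[len(self.PREFIX):]
                self._text = []
                break

    def handle_data(self, data):
        if self._category is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._category is not None:
            name = "".join(self._text).strip()
            if name:
                self.tags.setdefault(self._category, []).append(name)
            self._category = None


# Invented markup, only loosely modelled on a *booru tag sidebar.
SAMPLE = """
<ul>
  <li><a class="tag-type-copyright" href="#">batman (series)</a></li>
  <li><a class="tag-type-artist" href="#">bonocho</a></li>
  <li><a class="tag-type-character" href="#">the joker</a></li>
  <li><a class="tag-type-character" href="#">batman</a></li>
  <li><a href="#">unrelated link</a></li>
</ul>
"""

parser = TagTypeParser()
parser.feed(SAMPLE)
print(parser.tags)
```

Because this information lives in the HTML page rather than in the API response, obtaining it costs the one extra request per post mentioned in the documentation note.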

Butterfly-Dragon commented 3 years ago

Yes, as I said in the edit... I read the docs after writing the above. Sorry about my useless post.