picobyte / stable-diffusion-webui-wd14-tagger

Labeling extension for Automatic1111's Web UI

An option to include all the automatically excluded tags? #66

Closed HumbleStranger closed 9 months ago

HumbleStranger commented 9 months ago

As the title says.

WD14 Tagger was always one of the most vital extensions for dataset curating, and while some(?) might appreciate this somewhat forced inclusion of automatically excluded tags, it'd be nice to let the people who prefer pruning and adjusting tags by hand have an opportunity to do so as it was during the times of the original extension. The dedicated Settings tab for the extension lacks the option to disable it.

I tried to negate this unexpected addition by adding all the excluded tags to the "Keep tag" field, but it won't add all of them, probably because there are so many (the dataset is a few thousand images with a hefty number of total tags).

If there's a way to disable it somehow - please let me know.

picobyte commented 9 months ago

I don't know what you are talking about. I do not share your community. Please state clearly what you'd like to change, then we can discuss whether it's feasible or not.

For instance, in your issue text you seem to dislike the 'forced inclusion of automatically excluded tags'. I don't know what that is about. Did you maybe mean 'automatic exclusion'? Your title seems to ask for an option to include those, which is a contradiction. As said, I don't know what is 'automatically excluded'. If something seems buggy, then clearly indicate what.

Also state clearly what you did: single image or batch interrogation? Give me a reproducible example, then I can work on it. Also, why is everyone so lazy? Besides writing a proper bug report, can no one at least try to look at the code and fix an issue?

HumbleStranger commented 9 months ago

My apologies if I wasn't articulating the issue correctly, English is not my first language. Also, sorry for the possibly spiteful tone of the message, there's been a roadblock at every step of working on this dataset, so I might have expressed myself the way I did due to accumulated frustration. In no way am I ungrateful.

Okay, moving on to the issue.

Basically, what I meant is to keep the excluded tags mentioned here:

[image]

After reading through this sheet of tags multiple times just in case I was mistaken, I understood that it's a mix of both generally unneeded tags and conceptual tags that are supposed to be included but for some reason wouldn't be.

By copying all of these into the image I was able to mitigate this detrimental exclusion to an extent, but not fully.

I dug around to see what exactly was wrong and came to an interesting conclusion: single image and batch interrogations give different captions, and I have no idea why. Not in all, but in some pictures.

I can DM some examples to you. They're NSFW-ish, so I can't post them here.

RoelKluin commented 9 months ago

I can understand it's frustrating when it's not working as expected. I also have limited time due to my work, but I'll look into it in more detail later. The below is off the top of my head (I'm not at a machine with the tagger right now).

If all is well, you should be able to put .* in keep tags, and then nothing should be filtered. Also, at the top of the excluded tags tab, you can choose to send the visible tags to keep tags via the buttons on the right pane, to keep all of those. You can limit what is visible via the search interface on the right. The search input is not without issues, but that is not something I can help with. Just hit enter/return after editing the input field and it should work.

What is filtered in general is based on the thresholds and what is explicitly excluded. The exclude strings are regular expressions; multiple regular expressions actually, separated by commas. There are two sliders on the left pane, and I think there was a slider for the max count listed in the settings tab. Maybe the latter is what is limiting for you. Also, tags that the interrogator gives a weight below a certain value close to zero, a negative weight, or NA are filtered.

Upon a batch query, the overview displayed is for the entire batch, and the thresholds you set may not behave as you expect. Should they apply to the average? Or per image, with the results then averaged?

Also, I'm not 100% sure that the list displayed on the right pane is the complete list of tags that end up in the tags file per image. Some may be excluded from the overview because they exceed the max count (sorted on average), but a certain image may have that tag's weight above the thresholds, and the tag may then be included for that image.
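To illustrate the comma-separated regular expressions mentioned above, here is a minimal Python sketch. It is not the extension's code: the helper names `parse_patterns` and `matches_any` are hypothetical, and requiring a pattern to match the whole tag (`fullmatch`) is an assumption, since the thread does not say how the extension applies its patterns.

```python
import re


def parse_patterns(field: str) -> list[re.Pattern]:
    """Split a comma-separated field into compiled regular expressions."""
    return [re.compile(p.strip()) for p in field.split(",") if p.strip()]


def matches_any(tag: str, patterns: list[re.Pattern]) -> bool:
    # Assumption: a pattern must match the whole tag, not just a part of it.
    return any(p.fullmatch(tag) for p in patterns)


# Hypothetical exclude field with two patterns.
exclude = parse_patterns(r".*_hair, simple_background")
tags = ["long_hair", "blonde_hair", "simple_background", "solo"]
kept = [t for t in tags if not matches_any(t, exclude)]
print(kept)  # ['solo']
```

One consequence of splitting on commas: a pattern that itself contains a comma, such as a{1,2}, would be cut in two by a parser like this one.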
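The difference between thresholding on the batch average and thresholding per image can be shown with a small Python sketch. The weights, file names, and threshold are made-up values, and which of the two the extension actually does is exactly the open question above.

```python
# Hypothetical weights for one tag across a three-image batch.
weights = {"a.png": 0.90, "b.png": 0.10, "c.png": 0.05}
threshold = 0.5

# Option 1: compare the batch average to the threshold
# (roughly what a batch-wide overview would do).
average = sum(weights.values()) / len(weights)
listed_in_overview = average >= threshold

# Option 2: compare each image's own weight to the threshold
# (roughly what a per-image tags file would do).
written_per_image = {name: w >= threshold for name, w in weights.items()}

print(listed_in_overview)  # False: the average is about 0.35
print(written_per_image)   # only a.png passes the threshold
```

With these numbers the tag would be missing from an averaged overview yet still be written for a.png, which is the kind of mismatch described above.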

The problem was that the list on the right pane was getting huge with all reported tags for large batch queries. The interface is a trade off, meant to give an overview, and for a huge batch query, a complete overview with 60k tags is not really an option either.

If certain tags known beforehand are not desired for huge queries, it may be much easier to do the filtering by hand with scripting, using sed -i (be careful, though, backup first) on the tag results.
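As an alternative to sed -i, the same by-hand filtering can be sketched in Python. This is a sketch under assumptions: the tags files are taken to be .txt files holding comma-separated tags, and the function names `drop_tags` and `drop_tags_in_dir` are hypothetical. Each file is copied to a .bak file before it is rewritten.

```python
import re
import shutil
from pathlib import Path


def drop_tags(tag_file: Path, unwanted: re.Pattern) -> list[str]:
    """Remove tags fully matching `unwanted` from one file, keeping a .bak copy."""
    # Back up first, as advised for sed -i.
    shutil.copy2(tag_file, tag_file.with_name(tag_file.name + ".bak"))
    text = tag_file.read_text(encoding="utf-8")
    tags = [t.strip() for t in text.split(",")]
    kept = [t for t in tags if t and not unwanted.fullmatch(t)]
    tag_file.write_text(", ".join(kept), encoding="utf-8")
    return kept


def drop_tags_in_dir(directory: Path, unwanted: re.Pattern) -> None:
    """Apply drop_tags to every .txt file directly inside `directory`."""
    for tag_file in sorted(directory.glob("*.txt")):
        drop_tags(tag_file, unwanted)
```

A call such as drop_tags_in_dir(Path("dataset"), re.compile(r"simple_background|.*_background")) would then strip background tags from every tags file in a hypothetical dataset directory. Running it a second time overwrites the earlier .bak copies, so keep a separate backup of the originals.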

HumbleStranger commented 9 months ago

> If all is well, you should be able to put .* in keep tags, and then nothing should be filtered. Also, at the top of the excluded tags tab, you can choose to send the visible tags to keep tags via the buttons on the right pane, to keep all of those. You can limit what is visible via the search interface on the right. The search input is not without issues, but that is not something I can help with. Just hit enter/return after editing the input field and it should work.

Just "*" or ".*"?

> What is filtered in general is based on the thresholds and what is explicitly excluded. The exclude strings are regular expressions; multiple regular expressions actually, separated by commas. There are two sliders on the left pane, and I think there was a slider for the max count listed in the settings tab. Maybe the latter is what is limiting for you. Also, tags that the interrogator gives a weight below a certain value close to zero, a negative weight, or NA are filtered.

Yeah, when I looked into the settings I slid it all the way to the max; it didn't make much difference.

> Upon a batch query, the overview displayed is for the entire batch, and the thresholds you set may not behave as you expect. Should they apply to the average? Or per image, with the results then averaged?

I have a guess that it might be on average, hence the differing results. I honestly can't think of anything else.

> Also, I'm not 100% sure that the list displayed on the right pane is the complete list of tags that end up in the tags file per image. Some may be excluded from the overview because they exceed the max count (sorted on average), but a certain image may have that tag's weight above the thresholds, and the tag may then be included for that image.

I have a guess related to the above guess that the total tags over there might be averaged as well, hence the crippled results.

picobyte commented 9 months ago

> After reading through this sheet of tags multiple times just in case I was mistaken, I understood that it's a mix of both generally unneeded tags and conceptual tags that are supposed to be included but for some reason wouldn't be.

No, what is listed here, on this Excluded tags tab, is filtered because of the active filter settings. So I think this was the main issue, and there is no actual bug? Should we just close this?

> single image and batch interrogations give different captions, and I have no idea why

In the UI, identical output is only expected if your batch contains just one file. If you're comparing a single image interrogation in the UI with what ends up in the tags file, there may be some differences, but if you've disabled all filters, only for tags with a weight below 0.005, right?

> Just "*" or ".*"?

.*: It is a regex not a glob.
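The regex versus glob distinction can be checked with Python's standard `re` and `fnmatch` modules. This is only an illustration with made-up tags; the helper `is_valid_regex` is hypothetical and not part of the extension.

```python
import fnmatch
import re

tags = ["1girl", "solo", "long_hair"]

# As a regular expression, ".*" matches every tag in full.
keep_all = re.compile(r".*")
regex_hits = [t for t in tags if keep_all.fullmatch(t)]

# As a glob, a bare "*" matches every tag.
glob_hits = [t for t in tags if fnmatch.fnmatchcase(t, "*")]


def is_valid_regex(pattern: str) -> bool:
    """Return True if Python's re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(regex_hits == tags, glob_hits == tags)  # True True
print(is_valid_regex("*"))   # False: "*" has nothing to repeat
print(is_valid_regex(".*"))  # True
```

So a bare "*" works as a glob but is rejected as a regular expression, which is why ".*" is the pattern to use in a regex field.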

Tags that end up in tags files have to be put in here, so no continue and this branch was taken. A few are added from add_tags and finally a few are again removed dependent on your Min tag fraction in batch and interrogations slider setting (set to 0.0 and it should not filter).

The first continue is after if isinstance(tag, float), but that should have given messages on the console. It is an interrogation model oddity. Also, if you expect a certain tag but the interrogation model did not return it for reasons of its own, that's also not something I can help you with. Both cases could be an issue for the model creator; I can only wish you good luck.

RoelKluin:

> if tags have a weight below a certain value close to zero, negative or NA, by the interrogator, they are filtered.

This is actually only true for the listed tags in the ui, not for what is written to the tags file.

Next question is whether the if count < max_ct: branch is taken. You said you moved the slider to the max, which means that you receive the highest 500 tags & weights per image per interrogation. 500 tags per image should be sufficient, right?

Next skip can occur if not in keep_tags, and either the tag is excluded or its weight is below the weights threshold. If this filters, you did so yourself. Note that you can adjust filter settings and the second interrogation should be almost instant, because the second+ times the weights are read from database, the db.json file (you don't need a gpu or powerful cpu).
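The sequence of checks described so far can be summarized in a simplified Python sketch. It is not the extension's actual code: the function `keep_tag`, its signature, the use of `fullmatch`, and the exact ordering are assumptions made for illustration, while the names `max_ct` and `keep_tags` are taken from the comments above.

```python
import re


def keep_tag(tag, weight, count, *, max_ct, threshold, keep_tags, exclude_tags):
    """Return True if a tag would survive the checks described above."""
    # 1. The model occasionally returns a float where a tag name is expected.
    if isinstance(tag, float):
        return False
    # 2. Only tags within the max count per image are considered.
    if count >= max_ct:
        return False
    # 3. Tags matching keep_tags bypass the exclude list and the threshold.
    if any(p.fullmatch(tag) for p in keep_tags):
        return True
    # 4. Otherwise a tag is skipped if excluded or below the weight threshold.
    if any(p.fullmatch(tag) for p in exclude_tags):
        return False
    return weight >= threshold


keep_all = [re.compile(r".*")]
exclude = [re.compile(r"solo")]
common = dict(count=0, max_ct=500, threshold=0.35, exclude_tags=exclude)

print(keep_tag("solo", 0.01, keep_tags=keep_all, **common))  # True
print(keep_tag("solo", 0.90, keep_tags=[], **common))        # False
```

Under these assumptions, putting .* in keep tags lets even an excluded, low-weight tag through, while with an empty keep list the exclude pattern wins regardless of weight.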

Finally there is this if data[1] != '': check. If it is non-empty and this branch is taken, we actually have a tags file to write to. Please also check that the tags files are actually updated after a re-interrogation (read from the database).

So I think this should all be ok, I think it was just misinterpretation. Could have happened to anyone.