picobyte / stable-diffusion-webui-wd14-tagger

Labeling extension for Automatic1111's Web UI

Feedback and questions #38

Open a-l-e-x-d-s-9 opened 11 months ago

a-l-e-x-d-s-9 commented 11 months ago

Hi @picobyte,

Thank you for your work. I have been using the old wd14-tagger, and now I have tried your version from the master branch; there are a few things I wanted to mention:

  1. The append option is missing. A lot of times I have custom tags of my own that I want to keep, but currently the script just erases them. Would it be possible to bring back the append functionality?
  2. You moved the old configurations to the settings tab so they can now be saved, which is good, but it also requires jumping between the tagging tab and the settings tab. At first I thought you had scrapped all the old configurations completely, but then I noticed them in the settings tab.
  3. I hoped to see an improvement in the speed of the first interrogation, with multiple images being interrogated at the same time, but it looks like you haven't made changes in this regard. You obviously improved the second interrogation with db.json, but I usually do just one interrogation over a lot of files, so it isn't really useful to me personally. Also, placing db.json in the folder with the images themselves isn't a good solution in my opinion.
picobyte commented 11 months ago

Thank you for your interest. I'm fine with discussing this, but I'm not entirely sure about all points. About point 1: appending to tag text files is something I'd rather not do. Appending causes duplicates or tags in the wrong order, people then complain about that, and there would have to be code for re-reading the tag files and correcting all of it, while in the end I think this is just a file-management issue. It has nothing to do with interrogation and writing its results to files. All of this management for different individual purposes just messes up the code. You can easily do it with scripting, e.g.

# after the first interrogation, keep a copy of the tags files
cd /path/to/tags/files
mkdir first
cp *.txt ./first/
# after redoing the interrogation, merge old and new tags without duplicates
ls -1 *.txt | while read f; do
  cat first/"$f" "$f" | sed -r 's/, ?/\n/g' | sort -u | paste -sd, - > "tmp_$f" && mv "tmp_$f" "$f"
done

And people interrogate with many different requirements. There is no one solution.

  2. Which options are those that you frequently need? Maybe if I understand the requirements for interrogation better, I can also improve the UI for this purpose.

  3. The interrogation is mostly limited by, well, the interrogation itself. It may be possible to speed it up, but that will require investigation and may put constraints on what images you feed it, so it is probably not that easy. The mass tagger is an attempt, but it has some of these requirements too, and also needs a tf version which I don't have, so it's implemented without testing and probably buggy. It will require someone to do the debugging with. Placing db.json elsewhere makes less sense, and again, this is file management.

a-l-e-x-d-s-9 commented 11 months ago

Hi @picobyte,

  1. The append option was available in the original extension, and there was an option to avoid duplication of tags [screenshots of the original append and duplicate-avoidance options], so when the interrogator added tags to a file, already existing tags wouldn't be added again. It is very useful in many cases, specifically:
     a. I usually want to add style tags, names, and trigger words to the captions first - in ED2 the first tag in a caption isn't shuffled by the shuffling option I use. It's also easier to start with clean captions and fill in the custom tags first, before the WD14 interrogation, to avoid confusion. Then I would usually add the interrogation on top of that, without duplication of tags.
     b. In case multiple interrogations are needed - I can start with a higher threshold and add a lower threshold later, or combine interrogations from multiple models, one after another.
  2. Regarding the caption-merging script: I can make a Python script to combine captions without duplication and remove the original captions (a minimal sketch follows this list), but I usually have 20k+ images spread over 50 nested folders, and duplicating the captions adds a huge number of files to handle. The append option was already part of the original extension, so I don't understand why you would want to remove it.
  3. The splitting of the settings - I usually adjust the options you moved to settings before running an interrogation, so putting them in settings makes the whole process prone to forgetting to change them, and is generally inconvenient.
  4. Regarding speeding up the interrogation: obviously it's limited by the interrogation itself; with the original extension it takes around 1.125 seconds per image on my system. I adjusted the code of the original extension to be multi-threaded, and with 10 threads it improved the speed by around 30% - with a lot of images that's significant. I don't suggest you do something similar, because it can create additional problems for some people with little benefit. I don't understand the system well enough to make a proper change that would actually allow 10 concurrent models and cut the interrogation time by that factor, and I'm not sure it is possible at all. I hoped it was an option with WDMassTagger, but when I used it, I only got slower results with larger batches; I don't know what is wrong there.
  5. Regarding db.json - in case I have a lot of nested folders with images, would you place a db.json in each one, or in the root folder? It could be fine to place it in the root by default, but it would be better if the user could change the path, or store db.json in a specific folder inside the extension folder.
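To illustrate point 2, the merge script I mean would look roughly like this (the dataset path and the *.custom naming are just placeholders for my setup):

import os

dataset_root = '/path/to/dataset'  # placeholder: nested folders with *.txt captions

# merge previously saved custom captions (here assumed to be kept as *.txt.custom)
# into the freshly interrogated *.txt files: custom tags first, no duplicates
for folder, _, files in os.walk(dataset_root):
    for name in files:
        if not name.endswith('.txt'):
            continue
        tags_path = os.path.join(folder, name)
        custom_path = tags_path + '.custom'
        if not os.path.isfile(custom_path):
            continue
        merged = []
        for path in (custom_path, tags_path):
            with open(path) as f:
                for tag in f.read().split(','):
                    tag = tag.strip()
                    if tag and tag not in merged:
                        merged.append(tag)
        with open(tags_path, 'w') as f:
            f.write(', '.join(merged))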
picobyte commented 11 months ago

Thanks for your reply.

Regarding point 3: I like to keep the interface clean for general purposes. If there are general-purpose options that are now stashed away, let me know. Also note there is a state extension, which enables you to store/load settings from a file. The wd14-tagger extension also allows a presets JSON file that can be saved/loaded for the tagger-tab entries.

Regarding 4: not here, but at work I do handle large numbers of files on occasion, though more often rather large ones. Besides the interrogation there is also IO - reading from and writing to disk - and this can probably be improved, or maybe with hardware: if the files are stored on an SSD, it will be faster. Maybe also the loading strategy - reading directly from an uploaded archive into memory and processing from there - though that is not implemented. It could work if tagging is the only step. Also, on Windows there might be a virus scanner monitoring all files, which slows things down.

Regarding 5: I don't really understand the problem. If they are in the way - obviously after all interrogations - just zip them away:

find subdir/ -name 'db.json' -exec zip -m db_json_files.zip {} \+

The same could be an option for tags files.

Regarding points 1 and 2: append and prepend are just a matter of order. Ignore I can understand, but copy I found to be the wrong term - it should be replace. Then there was the fact that no comma was appended in the tags file upon combination; it wasn't perfectly functional. At first I intended to keep it, but it also became a problem: it adds requirements at the end of all processing that really should have been handled before, namely the deduplication. Then you have to go over all the files again and rewrite them just to fix that. And, as mentioned, IO is slow. So, no, I'd rather not.

The sensible workflow, as was my conclusion then and still is now:

  1. The interrogation is done once and written raw to disk, so it does not have to be redone, because it is the slow step.
  2. Filtering takes place in memory; it takes far less time, can be redone with different settings (people make errors here), and it should produce the output files in the desired format.
  3. The tags files are the end product of these filtering selections.
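As a rough sketch of what I mean by that separation (the function names and file paths are made up for illustration, not the actual extension code):

import json

def save_raw(cache_path, weights):
    # step 1 output: the raw interrogation result, written once and unfiltered
    with open(cache_path, 'w') as f:
        json.dump(weights, f)

def write_tags(tags_path, cache_path, threshold=0.35, keep_first=()):
    # steps 2 and 3: filter in memory (cheap, repeatable), then write the
    # tags file as the end product of that selection
    with open(cache_path) as f:
        weights = json.load(f)
    selected = [t for t, w in sorted(weights.items(), key=lambda x: -x[1])
                if w >= threshold and t not in keep_first]
    with open(tags_path, 'w') as f:
        f.write(', '.join(list(keep_first) + selected))

# re-running with different settings never repeats the slow interrogation step
save_raw('img_0001.json', {'1girl': 0.98, 'solo': 0.95, 'hat': 0.31})
write_tags('img_0001.txt', 'img_0001.json', threshold=0.5, keep_first=('my_trigger_word',))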

This repository suffers from a lot of code that really should not have been accepted: downloads for very specific setups, and on-the-fly module imports that cause issues on the fly as well. And then there are all the specific requirements it attempts to satisfy. The interrogator classes mostly exist to accommodate various kinds of download sources. I think the downloading should not be part of the interrogation - it's just an install requirement - and the on-the-fly installs and imports should also be handled there.

I understand that this particular change is a problem for your particular set-up, but I think deduplicating text files after they were written was just a fix for not being able to write the tags files the way you want them in the first place. So this fits into my step 2. If I understand correctly, the issue is with the ordering: you prefer an ordering based on style, then weight-based, and the first tag you want shuffled in at random?

So if tags could be written in the keep_tags order, would that solve it? Note that these are fairly easy to populate using the search. I also intended to include wildcards here, which would fit your styles, I guess; I did make an issue for that. I think that could be a sensible change, as would be random placement of additional tags, though I do think those are settings-tab options. Just use the state extension for the settings; that should save you some looking-up time as well.

So if you can use the wildcards files, something like:

cd /path/to/tags
mkdir processed
# process styles in order, these are listed in the wildcards files
cat /path/to/list_of_wild_cards_files.txt | while read w; do
  # process the tags files (you could also use gnu-parallel here)
  ls -1 *.txt | while read f; do
    of="processed/$f"
    touch "$of"
    # include the tags present in the wildcards file, except the ones already collected
    sed -r 's/, ?/\n/g' "$f" | grep -F -f "$w" | grep -v -F -f "$of" >> "$of"
  done
done
# etc.:
# adding the tags randomly in "$of"
# comma concatenation, replacing the "$f" files, and cleanup

Something like that?

I think there's something wrong currently with the tags being written, which I am investigating. It looks like exclude tags are not being applied.

a-l-e-x-d-s-9 commented 11 months ago

I'm sorry for the confusion, I mentioned tag shuffling not as something I want - it is part of the ED2 trainer. It was just a way to explain why I start with my custom tags and do the interrogation later, and why I need to append tags. Custom tags can be names of characters and trigger words that I add manually to the captions; they are not something I can add automatically. I made a script with a GUI that helps me with caption editing; it can add/remove/replace tags.

I'm using Linux, so I don't have an antivirus, and I use a fast NVMe SSD. If there is no way to run multiple interrogations in parallel, I don't think optimizations of reading/writing will be noticeable at all; the main bottleneck is the 1+ second per image of sequential interrogation.

There is no real problem with storing db.json wherever you want, but I would probably prefer to keep a single DB file for the whole dataset, so I could interrogate different sub-folders of the same dataset without creating a db.json in every folder used for interrogation. Anyway, sorry for bothering you.

picobyte commented 11 months ago

OK, I see there is already external tagging. Are those weighted? Then I could read them in and include them as an on-disk interrogation; otherwise I think they could be considered as per-file add_tags: set all weights to 1.0 or a fixed value, or alternatively a linear scale from 1.0 down to a minimum value, in order of the tags.

The one central db.json did occur to me, but I think a real database would be better, and I don't want to be responsible for that, or for one big file - I fear file corruption. But if you really want to, it is possible to merge db.json files with a bit of magic. The current precision of large interrogation indices + weights is not the best way, though; I should probably just split those after all (note to self).
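Roughly the idea, pretending for the sake of the example that each db.json were simply a flat dict (the real layout with interrogation indices and weights would need remapping, which is the "magic" part):

import json

def merge_db_json(paths, out_path):
    # naive merge: later files win on conflicting keys; this assumes a flat
    # dict per file, not the actual indices + weights layout
    merged = {}
    for p in paths:
        with open(p) as f:
            merged.update(json.load(f))
    with open(out_path, 'w') as f:
        json.dump(merged, f)

merge_db_json(['subdir_a/db.json', 'subdir_b/db.json'], 'db_merged.json')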

I believe it is possible to do interrogations in parallel, but this would require multiple GPUs, or GPU and CPU. By default the fastest GPU is chosen as gpu:0; if you have more than one, there is a gpu:1 - this is mostly hard-coded in interrogators.py. There may also be the option to run sd once on GPU and once on CPU in parallel, and maybe even not to use --cpu all but specific cores only, which might interrogate faster. Note that you would have to run sd on multiple ports, from different directories where you installed sd specifically for gpu:0, gpu:1, cpu:1, cpu:2, ... But this also means you have to split the image folders being processed, and the memory requirements are doubled, unless it is possible to configure loading the models in shared memory. But let's just start simple and try running 2.

Something like this:

import math
import os
from typing import Dict, Tuple

from PIL import Image

from tagger.interrogator import Interrogator  # base class provided by this extension (module path may differ)


class FromFileInterrogator(Interrogator):
    """ Pseudo Interrogator reading preinterrogated tags files """
    def __init__(
        self, name: str, path: os.PathLike, kind=float("NaN"), img_ext='.png'
    ) -> None:
        super().__init__(name)
        self.path = path        # directory containing one tags file per image
        self.kind = kind        # NaN: weights in file; > 0: fixed weight; < 0: linear scale down to -kind
        self.tags = None
        self.img_ext = img_ext

    def load(self) -> None:
        print(f'Loading {self.name} from {str(self.path)}')
        # self.path is a directory
        if not os.path.isdir(self.path):
            raise ValueError(f'{self.path} is not a directory')
        self.tags = {}
        for f in os.listdir(self.path):
            if f.endswith('.txt'):
                self.load_file(f)

    def load_file(self, tags_file: str) -> None:
        # key the tags on the name of the image the tags file belongs to
        image_name = os.path.splitext(tags_file)[0] + self.img_ext
        self.tags[image_name] = {}
        with open(os.path.join(self.path, tags_file), 'r') as f:
            for line in f:
                # comma-separated entries, possibly wrapped in parentheses
                entries = [t.strip().strip('()') for t in line.split(',') if t.strip()]
                if not entries:
                    continue
                if math.isnan(self.kind):
                    # entries are "tag:weight" pairs from an earlier raw interrogation
                    for entry in entries:
                        tag, val = entry.rsplit(':', 1)
                        self.tags[image_name][tag] = float(val)
                elif self.kind > 0:
                    # plain tags, all given the same fixed weight
                    for tag in entries:
                        self.tags[image_name][tag] = self.kind
                else:
                    # plain tags, weights scaled linearly from 1.0 down to -self.kind
                    minimum = -self.kind
                    decrement = (1.0 - minimum) / len(entries)
                    val = 1.0
                    for tag in entries:
                        self.tags[image_name][tag] = val
                        val -= decrement

    def unload(self) -> None:
        self.tags = {}

    def interrogate(
        self,
        image: Image.Image
    ) -> Tuple[
        Dict[str, float],  # rating confidences
        Dict[str, float]   # tag confidences
    ]:
        return {}, self.tags[os.path.basename(image.filename)]
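Purely as an illustration of how it could be used (the paths are placeholders), e.g. reading plain tags files and giving every tag a fixed weight of 1.0:

from_file = FromFileInterrogator('from_file', '/path/to/tags', kind=1.0, img_ext='.jpg')
from_file.load()
ratings, tags = from_file.interrogate(Image.open('/path/to/images/img_0001.jpg'))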
PureUnadulteratedEgo commented 9 months ago

I don't think it's just their particular workflow, though. Currently, this extension makes it impossible to enrich already existing tags that we download alongside an image, or tags we've already added manually (as far as I know). This was very easy to do with the old extension.

One of the main strengths of such datasets is the sheer number of images people have already tagged. Forcing people to discard those tags just to use the extension seems counterproductive. I understand there are considerations you have to take into account, like performance, but removing the possibility entirely for such a common use case isn't great for users.

And the worst part about this is that you apparently can't even use the old extension in newer versions of A1111.

Thanks for the work you've done on this extension, but as it is I'll have to grab an old install of the web-ui to be able to use the tagger.

a-l-e-x-d-s-9 commented 9 months ago

@PureUnadulteratedEgo I have fixed the problems the old extension had with the latest webui version, and I added a small improvement to the speed of tagging. I probably need to fork it and commit my changes...

a-l-e-x-d-s-9 commented 9 months ago

@PureUnadulteratedEgo Here is my fork: https://github.com/a-l-e-x-d-s-9/stable-diffusion-webui-wd14-tagger

picobyte commented 9 months ago

@PureUnadulteratedEgo Thanks for the feedback, and I see your point. I was indeed thinking about including this in my upcoming branch, but you say this FromFileInterrogator would not work with that workflow - why exactly? Is it just that they should go in as added tags unconditionally? I may be able to fix that. The code I wrote was just a quick and dirty brainstorm. I'm open to suggestions.

@a-l-e-x-d-s-9 OK, nice, I see you do some parallel processing. I thought that would not be possible with interrogations over one GPU. Actually, you have to be careful with the number of threads when using --cpu=all: for me 2 goes OK, 4 does not; I don't think OOM is the issue. And I have to make changes before it is thread-safe.