jhc13 / taggui

Tag manager and captioner for image datasets
GNU General Public License v3.0

alternative interrogate backend #166

Open yggdrasil75 opened 1 month ago

yggdrasil75 commented 1 month ago

My proposal here is to have three options for interrogation: WD 1.4/DeepDanbooru, local HF (e.g. BLIP instruct), and remote (Kobold, ooba, and similar).

With WD 1.4, have a list of the models that are available and known to be supported. Mark the ones that are not downloaded and show the size in parentheses, e.g. SmilingWolf/wd-v1-4-moat-tagger-v2 (327MB). Allow users to download their own models (or train with modelzoo) and put them in a folder based on type (i.e. a wd folder, a blip folder, and a deepdanbooru folder), and read from those folders just like you would from the HF cache.
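
Something like this is roughly what I have in mind for reading those folders (just a sketch with made-up folder paths and a made-up known-model table, not actual TagGUI code):

```python
# Rough sketch: build the model list for one interrogation type from a local
# folder, marking known models that have not been downloaded yet with their size.
# The folder layout and the known-model table are assumptions for illustration.
from pathlib import Path

MODEL_DIRS = {
    'wd': Path('models/wd'),
    'blip': Path('models/blip'),
    'deepdanbooru': Path('models/deepdanbooru'),
}

KNOWN_MODELS = {
    'wd': {'SmilingWolf/wd-v1-4-moat-tagger-v2': '327MB'},
}

def list_models(model_type: str) -> list[str]:
    """Local models first, then known-but-undownloaded ones with their size."""
    folder = MODEL_DIRS[model_type]
    local = {p.name for p in folder.glob('*') if p.is_dir()} if folder.is_dir() else set()
    entries = sorted(local)
    for repo_id, size in KNOWN_MODELS.get(model_type, {}).items():
        if repo_id.split('/')[-1] not in local:
            entries.append(f'{repo_id} ({size}, not downloaded)')
    return entries
```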

Local HF/instruct models should be marked similarly to the WD 1.4 models; this just allows a separate list so you know a bit more about what to expect.

Remote is different: the secondary list shows backends instead of models. You select the backend and everything else is handled like local HF/instruct, but sent to the backend via API. You could probably just support one backend style (ChatGPT/OpenAI) and allow users to set a URL (e.g. localhost:5001), since ooba, kobold.cpp, and most other backends now support OpenAI API requests.
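
For the remote option, the actual request could look something like this (a sketch assuming the `openai` Python client pointed at an OpenAI-compatible local backend; the URL, model name, and prompt are placeholders, not TagGUI settings):

```python
# Sketch: send one image plus a prompt to an OpenAI-compatible backend
# (ooba, kobold.cpp, or the real OpenAI API) using the openai client.
import base64

from openai import OpenAI

# Local backends usually ignore the API key, but the client requires one.
client = OpenAI(base_url='http://localhost:5001/v1', api_key='none')

def caption_image(image_path: str, prompt: str) -> str:
    with open(image_path, 'rb') as file:
        image_b64 = base64.b64encode(file.read()).decode()
    response = client.chat.completions.create(
        model='local-model',  # most local backends ignore or remap this name
        messages=[{
            'role': 'user',
            'content': [
                {'type': 'text', 'text': prompt},
                {'type': 'image_url',
                 'image_url': {'url': f'data:image/jpeg;base64,{image_b64}'}},
            ],
        }],
        max_tokens=256,
    )
    return response.choices[0].message.content
```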

Another benefit: there is no need to manually add new models to the list. Users can just download models and put them in the folder, or launch their own backend as needed.

jhc13 commented 1 month ago

Have a list of the models that are available and known to be supported. Mark the ones that are not downloaded and show the size in parentheses, e.g. SmilingWolf/wd-v1-4-moat-tagger-v2 (327MB). Allow users to download their own models (or train with modelzoo) and put them in a folder based on type (i.e. a wd folder, a blip folder, and a deepdanbooru folder), and read from those folders just like you would from the HF cache.

This is a separate issue that is also related to #97 and #161. I am considering doing an overhaul of the model download system at some point in the future to solve most of these issues, especially with the huggingface_hub update that will make this easier.

Local HF/instruct models should be marked similarly to the WD 1.4 models; this just allows a separate list so you know a bit more about what to expect.

I initially tried making a separate tab for the WD 1.4 models, but I decided that integrating them into the existing auto-captioning workflow was simpler and resulted in less duplicate code. I don't think the current setup is that confusing.

Remote is different: the secondary list shows backends instead of models. You select the backend and everything else is handled like local HF/instruct, but sent to the backend via API. You could probably just support one backend style (ChatGPT/OpenAI) and allow users to set a URL (e.g. localhost:5001), since ooba, kobold.cpp, and most other backends now support OpenAI API requests.

If we do add support for remote models, it would have to be just the OpenAI API (and other compatible backends), as you mentioned. One problem is that the OpenAI API does not support all of the generation parameters that TagGUI has, like the number of beams and "Discourage from caption".
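
As a rough illustration of the mismatch, any translation layer would end up silently dropping the unsupported settings (the setting names below are hypothetical stand-ins, not TagGUI's actual option names):

```python
# Sketch: map local generation settings to OpenAI chat-completion parameters
# and report which ones have no equivalent and would have to be dropped.
def to_openai_params(settings: dict) -> tuple[dict, list[str]]:
    mapping = {
        'max_new_tokens': 'max_tokens',
        'temperature': 'temperature',
        'top_p': 'top_p',
        'repetition_penalty': 'frequency_penalty',  # only a rough equivalent
    }
    params, dropped = {}, []
    for name, value in settings.items():
        if name in mapping:
            params[mapping[name]] = value
        else:
            dropped.append(name)  # e.g. number of beams, "Discourage from caption"
    return params, dropped
```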

I am also unsure just how compatible the compatible backends are. If some of them require individual support, it could get annoying.

yggdrasil75 commented 1 month ago

Yeah, Kobold's primary API supports it, as do probably most others; it's just that they all have a "torn down" version for the OpenAI v1 API. I know that Kobold works well with the OpenAI v1 API; I use it with SillyTavern all the time, with SillyTavern set to OpenAI chat completion.

as for "confusing" I dont think its confusing, I just wish that the lists were separated because I dont want to scroll through hf/transformers models to find the wd14 models (and vice-versa). if support for arbitrary local wd14 models and arbitrary local hf/transformers, then my list would be around 50 items vs 2 lists of 10 items (local llm models) and 40 items (tagging models)

The backend list could just show the default API path (e.g. api.openai.com/v1, localhost:5001/v1, localhost:8000, etc.), but they would all actually use the OpenAI API for simplicity's sake. Almost all locally run LLM tools expose the OpenAI v1 API, with all or most of its features, at the same API path.
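
The presets could be as simple as a table of base URLs, all called through the same OpenAI client (example URLs taken from this thread, not a definitive list, and the normalization is just a guess at what would be convenient):

```python
# Sketch: backend presets are just default base URLs; everything speaks the
# OpenAI v1 API, so the only per-backend difference is the URL itself.
DEFAULT_BACKENDS = {
    'OpenAI': 'https://api.openai.com/v1',
    'Local backend (e.g. kobold.cpp)': 'http://localhost:5001/v1',
    'Local backend (other)': 'http://localhost:8000/v1',
}

def normalize_base_url(url: str) -> str:
    """Add a scheme and a trailing /v1 so every backend is called the same way."""
    if not url.startswith(('http://', 'https://')):
        url = 'http://' + url
    url = url.rstrip('/')
    return url if url.endswith('/v1') else url + '/v1'
```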

The only thing I would add is that you may want a "limit to pixel count" option, because if I select all and accidentally send an image that is over 4096x4096, it would consume far too many tokens in the prompt and feeding in the text would be pointless. See the documentation at https://platform.openai.com/docs/guides/vision if you want to check it.
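
The "limit to pixel count" option could be as simple as downscaling before encoding (a sketch assuming Pillow; the 4096x4096 default just mirrors the example above):

```python
# Sketch: downscale any image above a maximum pixel count before it is
# base64-encoded and sent to the API, to keep the prompt token cost bounded.
from PIL import Image

def limit_pixel_count(image: Image.Image, max_pixels: int = 4096 * 4096) -> Image.Image:
    pixels = image.width * image.height
    if pixels <= max_pixels:
        return image
    scale = (max_pixels / pixels) ** 0.5
    new_size = (max(1, int(image.width * scale)), max(1, int(image.height * scale)))
    return image.resize(new_size, Image.LANCZOS)
```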

geroldmeisinger commented 1 month ago

This is a separate issue that is also related to https://github.com/jhc13/taggui/issues/97 and https://github.com/jhc13/taggui/issues/161. I am considering doing an overhaul of the model download system at some point in the future to solve most of these issues, especially with the huggingface_hub update that will make this easier.

If you do, please keep this in mind: #174 (an easy option to select a PR of a model that provides .safetensors instead of .py).
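
For reference, huggingface_hub can already download a specific PR revision (PRs live under refs/pr/<number>), which is roughly what that option would need under the hood; the repo id and PR number here are placeholders:

```python
# Sketch: download a model repo at a PR revision, e.g. a PR that adds
# .safetensors weights, instead of the default main branch.
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id='some-org/some-model',
    revision='refs/pr/1',  # placeholder PR number
)
```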