lllyasviel / Fooocus

Focus on prompting and generating
GNU General Public License v3.0

[Feature Request]: Add "CLIP Interrogator" image-to-prompt to Fooocus #3012

Open badraymen opened 1 month ago

badraymen commented 1 month ago

Is there an existing issue for this?

What would your feature do?

After using the Fooocus "Describe" tool, I found that the prompts it creates are missing words and sentences; the sentences are too short and not well targeted. This is a problem for me because I rely heavily on the image prompt option to create backgrounds for my photos, so I searched online for tools that generate prompts from an image. I found "CLIP Interrogator", an extension intended for SDXL, and tested it on Colab. It gave magnificent results: the words are well targeted, including photographers' names and even style names, and it recognizes brands, sometimes even writing the brand names on the products I work with correctly. I found it really practical. I used its prompts in Fooocus and the results were truly incredible: the generated image closely resembles the original image. It would be really kind of you to add this functionality to Fooocus as a tab that creates prompts and sends them directly to the text field for generation. Link:

https://github.com/pharmapsychotic/clip-interrogator

best regards

Proposed workflow

  1. Go to "Input Image"
  2. Go to "Describe"
  3. Choose a model, e.g. ViT-L/OpenAI
  4. Choose "fast" or "best"
  5. Upload the image you want described
  6. Press "Generate prompt"
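
For readers who want to try the workflow above outside Fooocus, here is a minimal sketch using the clip-interrogator package linked in this request. It is not Fooocus code. The helper names `build_interrogator` and `describe_image` are hypothetical, and the "fast"/"best" switch is assumed to map onto the package's `interrogate_fast` and `interrogate` methods. The default model name `ViT-L-14/openai` is the package's own identifier for the ViT-L OpenAI model.

```python
def build_interrogator(clip_model_name: str = "ViT-L-14/openai"):
    """Step 3: load the chosen CLIP model (weights are downloaded on first use)."""
    # Imported lazily so describe_image() can be used and tested without the package.
    from clip_interrogator import Config, Interrogator

    return Interrogator(Config(clip_model_name=clip_model_name))


def describe_image(interrogator, image, mode: str = "fast") -> str:
    """Steps 4-6: run the selected mode on the image and return the prompt text."""
    if mode == "fast":
        return interrogator.interrogate_fast(image)
    if mode == "best":
        return interrogator.interrogate(image)
    raise ValueError(f"unknown mode {mode!r}; expected 'fast' or 'best'")


# Example usage (assumes `pip install clip-interrogator pillow` and a local photo.jpg):
#   from PIL import Image
#   ci = build_interrogator()
#   print(describe_image(ci, Image.open("photo.jpg").convert("RGB"), mode="best"))
```

The returned string could then be pasted into the Fooocus prompt field by hand.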

Additional information

No response

mashb1t commented 1 month ago

@badraymen fyi I'm on it and currently testing various image captioning models in a separate project: https://github.com/mashb1t/describeiments

The intermediate result is that BLIP (1) (+ BERT) has the best integration into Fooocus and the lowest resource usage, so I'm not sure the switch is worth the effort.

A modified version of https://huggingface.co/spaces/pharmapsychotic/CLIP-Interrogator/blob/main/app.py can be found in interrogator.py.txt

One can also really overshoot in terms of VRAM with the combination of ViT-H and BLIP 2.
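
To make the VRAM concern concrete, here is a rough sketch of how peak GPU memory and run time for a captioning call could be measured. It is not the benchmark used in the project linked above. The helper name `measure_captioner` is hypothetical, and `caption_fn` stands for any function that takes an image and returns a caption. It relies on PyTorch's `torch.cuda.reset_peak_memory_stats` and `torch.cuda.max_memory_allocated`, and reports no VRAM figure when PyTorch or CUDA is unavailable.

```python
import time
from typing import Callable, Optional, Tuple


def measure_captioner(
    caption_fn: Callable[[object], str], image
) -> Tuple[str, float, Optional[int]]:
    """Return (caption, seconds, peak_vram_bytes); the VRAM value is None without CUDA."""
    try:
        import torch

        use_cuda = torch.cuda.is_available()
    except ImportError:
        torch = None
        use_cuda = False

    if use_cuda:
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()  # start a fresh peak measurement

    start = time.perf_counter()
    caption = caption_fn(image)
    if use_cuda:
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    elapsed = time.perf_counter() - start

    # Counts only memory held by PyTorch tensors, including weights already loaded.
    peak = torch.cuda.max_memory_allocated() if use_cuda else None
    return caption, elapsed, peak
```

Running this once per model combination (for example BLIP alone versus ViT-H plus BLIP 2) would give comparable numbers, provided each run starts in a fresh process.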

badraymen commented 1 month ago

Thank you so much @mashb1t for your interest and involvement. Could you give an explanation that is understandable for someone who knows nothing about coding and Python? A simplified explanation would be really kind of you. So, are you going to integrate CLIP Interrogator, or are you going to develop a new function in Fooocus for the next version?

mashb1t commented 1 month ago

@badraymen sure: no clip-interrogator until it has been fully evaluated and benchmarked. (also it's based on transformers, which Fooocus doesn't use)

badraymen commented 1 month ago

You are really kind, dear sir. Thank you again for this clarification, and good luck in what you do.