Open badraymen opened 1 month ago
@badraymen fyi I'm on it and currently testing various image captioning models in a separate project: https://github.com/mashb1t/describeiments
The intermediate result is that BLIP (1) (+ BERT) integrates best into Fooocus and has the lowest resource usage, but I'm not sure it's worth the switch and the effort.
A modified version of https://huggingface.co/spaces/pharmapsychotic/CLIP-Interrogator/blob/main/app.py can be found in interrogator.py.txt.
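According to its README, CLIP Interrogator combines BLIP (for a base caption) with CLIP (to pick extra terms that match the image). The CLIP step is what produces photographer, medium, and style names. Below is a simplified sketch of that ranking idea, not the library's actual code. The function names are hypothetical, and the 3-dimensional vectors are made-up toy values standing in for real CLIP image and text embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_prompt(caption, image_emb, candidates, top_k=2):
    """Hypothetical helper: append the top_k candidate phrases whose
    text embeddings are most similar to the image embedding."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine(image_emb, item[1]),
        reverse=True,
    )
    picked = [phrase for phrase, _ in ranked[:top_k]]
    return ", ".join([caption] + picked)


# Toy embeddings (hypothetical values, not real CLIP outputs).
image_emb = [0.9, 0.1, 0.3]
candidates = {
    "studio lighting": [0.8, 0.0, 0.4],
    "watercolor": [0.0, 1.0, 0.0],
    "product photography": [1.0, 0.2, 0.1],
}

print(build_prompt("a bottle of perfume on a table", image_emb, candidates))
# prints "a bottle of perfume on a table, studio lighting, product photography"
```

The real tool ranks much larger term lists (artists, mediums, movements, "flavors") with real CLIP embeddings, which is also why it needs noticeably more VRAM than a caption model alone.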
Combining ViT-H with BLIP 2 can also easily exceed the available VRAM.
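A rough way to see why two large models loaded side by side can exceed VRAM: weight memory alone is the parameter count times the bytes per parameter. This is a minimal, weights-only sketch. Activations, caches, and CUDA context overhead are ignored. The function names and the `usable_fraction` parameter are hypothetical. The parameter counts in the usage line are illustrative round numbers, not measured figures for ViT-H or BLIP 2, so substitute the real values from the model cards.

```python
# Bytes needed to store one parameter in common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="fp16"):
    """Weights-only memory in GiB (activations and overhead not included)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


def fits_in_vram(param_counts, vram_gib, dtype="fp16", usable_fraction=0.8):
    """Hypothetical check: sum the weight memory of models loaded together
    and compare it with the share of VRAM we allow weights to occupy."""
    total = sum(weight_memory_gib(n, dtype) for n in param_counts)
    return total, total <= vram_gib * usable_fraction


# Illustrative only: a 1B-parameter and a 4B-parameter model in fp16 on an 8 GiB card.
total, ok = fits_in_vram([1.0e9, 4.0e9], vram_gib=8)
print(f"{total:.2f} GiB of weights, fits: {ok}")
```

With these made-up numbers the weights alone take about 9.3 GiB, more than the 6.4 GiB budget, before any activations are counted.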
Thank you so much for your interest and involvement, @mashb1t. Could you explain this in simpler terms for someone who knows nothing about coding or Python? That would be really kind of you. So, are you going to integrate CLIP Interrogator, or develop a new feature in Fooocus for the next version?
@badraymen sure: no clip-interrogator until it has been fully evaluated and benchmarked. (also it's based on transformers, which Fooocus doesn't use)
You are really kind, dear sir. Thank you again for the clarification, and good luck with your work!
Is there an existing issue for this?
What would your feature do?
After using the Fooocus "Describe" tool, I found that the prompts it creates have missing sentences and missing words; the sentences are too short and not really targeted. So I looked online for websites that generate prompts from an image, since I use the image prompt option a lot to create backgrounds for my photos.

I found "CLIP Interrogator", which is an extension intended for SDXL, and tested it on Colab. The results were magnificent: the words are well targeted, including the names of photographers and/or the name of the style. It recognizes brands and sometimes even spells the brand names correctly on the products I work with. It is really practical: I used its prompts in Fooocus and got truly incredible results, with a strong resemblance between the generated image and the original.

It would be really kind of you to add this functionality to Fooocus as a tab that creates prompts and sends them directly into the text field for generation. Link:
https://github.com/pharmapsychotic/clip-interrogator
Best regards
Proposed workflow
Additional information
No response