[Feature Request]: Detect mask segmentation from the image generation prompt

vlinh128 commented 3 months ago

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits

What would your feature do?

Version 2.5.0 includes a great mask segmentation feature, but generating new images with the "inpaint outpaint" model currently takes two steps. The first step requires a "prompt_mask_text" to detect the mask segmentation, and the second requires a "prompt_image_text" to generate the images. Would it be possible to use just the "prompt_image_text" for both steps?

Proposed workflow

1. Extract the key segmentation terms from the input text.
2. Use the extracted terms to detect the segmentation in the original image.
3. Generate new images from the detected mask segmentation and the original image.
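
The three steps above can be sketched roughly as follows. This is only an illustration of the proposal, not Fooocus code: the class vocabulary, `extract_mask_keywords`, `generate_from_single_prompt`, and the `segment` and `inpaint` callables are all hypothetical names, and the keyword step is a naive substring match.

```python
from typing import Any, Callable, Iterable, List

# Hypothetical vocabulary of classes a segmentation model could detect.
SEGMENTABLE_CLASSES = ("t-shirt", "jeans", "dress", "hair", "background")


def extract_mask_keywords(
    prompt: str, classes: Iterable[str] = SEGMENTABLE_CLASSES
) -> List[str]:
    """Step 1: return the known classes mentioned in the prompt, in vocabulary order."""
    text = prompt.lower()
    # Naive substring match; a real implementation would need proper tokenization.
    return [name for name in classes if name in text]


def generate_from_single_prompt(
    image: Any,
    prompt: str,
    segment: Callable[[Any, List[str]], Any],
    inpaint: Callable[[Any, Any, str], Any],
) -> Any:
    """Run all three steps using one prompt for both masking and generation."""
    keywords = extract_mask_keywords(prompt)
    if not keywords:
        # The prompt names nothing that can be segmented, so no mask can be built.
        raise ValueError("prompt names no segmentable region")
    mask = segment(image, keywords)      # Step 2: hypothetical segmentation call
    return inpaint(image, mask, prompt)  # Step 3: hypothetical inpainting call
```

With stand-in callables, `extract_mask_keywords("change the jeans and t-shirt to leather")` returns `["t-shirt", "jeans"]`. The sketch only works when the prompt names the region to change; a prompt that mentions just the desired result gives step 1 nothing to match.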

Additional information

No response

mashb1t commented 3 months ago

No, that doesn't make a lot of sense. One normally wants to replace something with another thing, so if you only provided the AI with the target prompt, it could not properly select what to detect. What is your use case for this, and why do you think it would be beneficial to have?

https://github.com/lllyasviel/Fooocus/discussions/3345#discussioncomment-10122908

vlinh128 commented 3 months ago

Thank you for your quick response!
I agree that having only one target prompt is more likely to result in incorrect substitutions. Still, it would be great to have a mask segmentation detection layer driven by the target prompt, similar to ChatGPT-4o. For example, if the image shows a person wearing a t-shirt and jeans and the prompt is "wear a red dress", this layer could automatically select the t-shirt and jeans for replacement.
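
To make the red dress example concrete, here is a minimal sketch of how such a layer might decide what to mask. It assumes two hypothetical lookup tables (`GARMENT_COVERAGE` and `ITEM_REGION`) and a hypothetical helper `items_to_replace`; none of these exist in Fooocus, and the matching is a simple substring check rather than real language understanding.

```python
from typing import Dict, List, Set

# Hypothetical lookup tables; neither comes from Fooocus.
# Body regions that a garment named in the target prompt would cover.
GARMENT_COVERAGE: Dict[str, Set[str]] = {
    "dress": {"upper_body", "lower_body"},
    "shirt": {"upper_body"},
    "skirt": {"lower_body"},
    "hat": {"head"},
}
# Body region occupied by each item a segmentation model might detect.
ITEM_REGION: Dict[str, str] = {
    "t-shirt": "upper_body",
    "jeans": "lower_body",
    "cap": "head",
}


def items_to_replace(target_prompt: str, detected_items: List[str]) -> List[str]:
    """Return the detected items whose body region the target garment would cover."""
    text = target_prompt.lower()
    covered: Set[str] = set()
    for garment, regions in GARMENT_COVERAGE.items():
        if garment in text:
            covered |= regions
    # Items with no known region are never selected.
    return [item for item in detected_items if ITEM_REGION.get(item) in covered]


print(items_to_replace("wear a red dress", ["t-shirt", "jeans", "cap"]))
# ['t-shirt', 'jeans']
```

The cap is left alone because a dress does not cover the head. Any real version would need far broader tables or a language model to handle arbitrary prompts.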

mashb1t commented 3 months ago

@vlinh128 Fooocus doesn't support natural language input prompts, which is why "wear a dress" will not work: there is no reference for which parts of the image have to be changed. This currently can't be done.
