lllyasviel / Fooocus

Focus on prompting and generating
GNU General Public License v3.0
39.99k stars 5.52k forks

[Feature Request]: Implement RPG Diffusion #2683

Open xunkar opened 5 months ago

xunkar commented 5 months ago

Is there an existing issue for this?

What would your feature do?

https://github.com/YangLing0818/RPG-DiffusionMaster

Recaptioning, Planning, and Generating with Multimodal LLMs.

Allows finer control of diffusion, with a better understanding of attribute binding and regional placement of subjects. Its workflow also allows the user to generate very large images.

Proposed workflow

What is interesting is how invisible it is to the user, so no new UI would be required. However, a precise prompt would yield results with higher fidelity.

The only useful UI addition would be a checkbox in the advanced options to enable or disable the recaptioning part of the workflow. Then again, recaptioning is not the most important feature of the project and could be skipped, although it is useful.
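To make the proposed toggle concrete, here is a minimal sketch of how an optional recaptioning step could sit in front of regional planning. Everything in it is hypothetical and for illustration only: `Region`, `plan_regions`, `build_plan`, and the `recaptioner` callback are not part of Fooocus or RPG-DiffusionMaster, and splitting the prompt on " and " is a crude stand-in for the LLM-based planning the project actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Region:
    # Canvas fractions (left, top, right, bottom), each in [0, 1].
    box: Tuple[float, float, float, float]
    prompt: str


def plan_regions(subprompts: List[str]) -> List[Region]:
    """Hypothetical planner: give each subprompt an equal vertical column."""
    n = len(subprompts)
    return [
        Region((i / n, 0.0, (i + 1) / n, 1.0), text)
        for i, text in enumerate(subprompts)
    ]


def build_plan(
    prompt: str,
    recaption: bool,
    recaptioner: Optional[Callable[[str], str]] = None,
) -> List[Region]:
    """Split the prompt into subjects, optionally recaption, then plan regions."""
    # Assumption: " and " separates subjects. A real system would ask an LLM.
    subprompts = [s.strip() for s in prompt.split(" and ") if s.strip()]
    # The checkbox maps to `recaption`; when off, the user's words pass through.
    if recaption and recaptioner is not None:
        subprompts = [recaptioner(s) for s in subprompts]
    return plan_regions(subprompts)
```

With `recaption=False` the user's wording reaches the planner unchanged, which is the behaviour the checkbox would expose. Each subject keeps its own prompt and its own region, which is the idea behind keeping attributes bound to the right subject.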

Additional information

No response

mashb1t commented 5 months ago

One would also need either a subscription to GPT-4/Gemini-Pro or a local MLLM, which would either drastically increase system requirements, since the models have to be loaded in parallel, or slow down prompt generation. What benefit do you see compared to Fooocus-V2 (a modified GPT-2)?

xunkar commented 5 months ago

The ability to better understand the prompt is pretty powerful in my opinion. Looking at the examples provided, it seems capable of understanding things such as where subjects should be placed in relation to each other and in the composition. It also seems to annihilate attribute bleeding, which is very difficult to deal with as it is.

Recaptioning, however, is not something I'm attached to. I wonder if this kind of prompt injection, so to speak, might cause more deviation from the user's request than it helps.

tianjinghai1978 commented 5 months ago

In WildCards, you can add a custom style customization function and generate a variety of styles in batches. Namo Amitabha.

GoldMath commented 1 week ago

There is also lllyasviel's Omost project, which kind of works that way as an LLM scenery helper, though it seems on hold or inactive for now. Something similar would indeed make a nice Fooocus-V3 (I just noticed there is a thread: https://github.com/lllyasviel/Fooocus/discussions/3076).

https://github.com/lllyasviel/Omost

While Fooocus-V2 is a nice enhancer, it always feels like it's throwing in random keywords without context or additional scenery.