lllyasviel / stable-diffusion-webui-forge

GNU Affero General Public License v3.0

Long text in prompt breaks Flux dev #1830

Closed Volnovik closed 3 weeks ago

Volnovik commented 1 month ago

I tried creating an image and just could not get the text right. This made me want to compare the text capabilities of the Flux dev model, so I went to Ideogram and took the first nice-looking prompt with long text: https://ideogram.ai/g/q5QR1-OGRCOrGj9e7nFT-g/0

The prompt is: A stunning and vibrant 3D illustration of a wall art piece, featuring the word 'FOCUS' in eye-catching, colorful, and uniquely designed crocheted letters. Each letter is in a different vibrant colour. The bold white text below the main word reads, "ON YOUR DREAMS AND THEY WILL BECOME YOUR REALITY." The artwork is adorned with an array of decorative elements, including flowers, iridescent bubbles, and butterflies, all set against a gradient background that transitions from a light pinkish hue at the top to a deeper purple at the bottom. The scene is set against a plain gray floor and a white ceiling with recessed lights, creating a captivating and inspiring visual experience., photo, illustration, 3d render, typography

I pasted it into my setup with full flux.dev and t5xxlfp16. The result made me question things: 00327-2345183254

I could not believe Flux is THAT bad with text, so I headed to an online generation service to check pro, and it did well. That made me want to check the dev model there, and it also performed well. A Space on Hugging Face also gave an amazing image. I went back to tinkering with samplers, schedule types etc. and nothing worked.

Then I reduced the text in the prompt to: A stunning and vibrant 3D illustration of a wall art piece, featuring the word 'FOCUS' in eye-catching, colorful, and uniquely designed crocheted letters. Each letter is in a different vibrant colour. The bold white text below the main word reads, "**ON YOUR DREAMS**" The artwork is adorned with an array of decorative elements, including flowers, iridescent bubbles, and butterflies, all set against a gradient background that transitions from a light pinkish hue at the top to a deeper purple at the bottom. The scene is set against a plain gray floor and a white ceiling with recessed lights, creating a captivating and inspiring visual experience., photo, illustration, 3d render, typography

And it worked just fine: 00331-1825267790

No need to be an expert to see the difference.

I installed ComfyUI with SwarmUI to check whether my PC had some issue, and no, in Comfy everything is fine: 0027-A stunning and vibrant 3D illustration o-flux_dev-690774450

Moreover, the earlier prompt that started all this (good image, wrong text in Forge) generated proper text there on the second try: 0030-A high-quality image of a muscular bald-flux_dev-1041764079

I tried reloading Forge, tried the Q8 GGUF etc., no good. My Flux-related parameters are: 01-2024-09-16 004656

Forge is used inside StabilityMatrix, same for the SwarmUI install. GPU is a 4090. Last updated on 15.09.24. No LoRAs or extensions enabled. There are no errors in the console, but with the Ideogram prompt generation was significantly slower than usual: 1.66s/it vs 1.3 it/s on other prompts.

So to summarize:

Juqowel commented 1 month ago

No, it's not about the prompt length. The bug is in the part that is written in capital letters. Here is the same first prompt, but with lowercase "on your dreams and they will become your reality":

00003-1780770663

Volnovik commented 1 month ago

I meant not the prompt length but the length of the text inside the prompt. It works fine with upper case and less text. But Juqowel is also right: if I make all the text lower case, I seem to get better text (anecdotal: 9 generations, no gibberish, but sometimes missing words). Also, if I replace the second text line with upper-case gibberish, it generates the image without breaking it: "WEB WT ER WEQWERTQWERTR ERTWERT." 00020-4115921302

Juqowel commented 1 month ago

Okay, I forgot about that. It is caused by the "AND" keyword. It should be: "ON YOUR DREAMS A\ND THEY WILL BECOME YOUR REALITY."

00007-1780770663
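The escaping workaround above can be applied automatically before pasting a prompt. This is only a rough sketch: `escape_and` is a hypothetical helper, not part of Forge, and it assumes the parser reacts only to a standalone uppercase `AND` (the matching rule is an assumption here, not something confirmed in this thread).

```python
import re

# Assumption: only a standalone, uppercase AND is treated as a keyword.
# Words that merely contain the letters (BRAND, ANDROID) are left alone.
_STANDALONE_AND = re.compile(r"\bAND\b")


def escape_and(prompt: str) -> str:
    """Hypothetical helper: rewrite each standalone AND as A\\ND."""
    # A function replacement avoids backslash handling in the template string.
    return _STANDALONE_AND.sub(lambda match: "A\\ND", prompt)


if __name__ == "__main__":
    text = 'reads, "ON YOUR DREAMS AND THEY WILL BECOME YOUR REALITY."'
    print(escape_and(text))
```

Lowercase "and" is left untouched, which fits the observation that the lowercase version of the prompt renders fine.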

Volnovik commented 1 month ago

Update: As Demian Dei from Discord pointed out, this is related to the Composable Diffusion feature of A1111. AND is detected as a special keyword and slaughters the image. BREAK also does not work as intended and seems to just null everything after it. I guess we need a toggle to disable all that preprocessing for Flux.
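To illustrate why the uppercase AND hurts, here is a minimal sketch of how a composable-diffusion style parser might cut the prompt into sub-prompts. The pattern `\bAND\b` and the function `split_subprompts` are assumptions made for illustration; this is not Forge's actual code.

```python
import re

# Assumed separator rule: a standalone uppercase AND starts a new sub-prompt.
RE_AND = re.compile(r"\bAND\b")


def split_subprompts(prompt: str) -> list[str]:
    """Split a prompt the way a composable-diffusion parser might."""
    return [part.strip() for part in RE_AND.split(prompt)]


if __name__ == "__main__":
    upper = 'text reads, "ON YOUR DREAMS AND THEY WILL BECOME YOUR REALITY."'
    lower = 'text reads, "on your dreams and they will become your reality."'
    escaped = 'text reads, "ON YOUR DREAMS A\\ND THEY WILL BECOME YOUR REALITY."'

    for label, prompt in [("upper", upper), ("lower", lower), ("escaped", escaped)]:
        parts = split_subprompts(prompt)
        print(label, len(parts), parts)
```

Under that assumption the uppercase prompt is cut in the middle of the quoted text into two separate sub-prompts, while the lowercase and escaped variants stay in one piece, which matches what was seen in the images above.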