invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

[enhancement]: Embiggen questionable mechanic!? #1498

Closed Neosettler closed 8 months ago

Neosettler commented 1 year ago

What should this feature add?

It seems that the Embiggen mechanic has been misused, or it is not really practical. The current feature does the following:

Required first: generate a txt2image image A from the prompt (this step should not be necessary).

Then Embiggen will:

1. Generate a scaled img2img image B based on A.
2. If the scale is 2x, generate 4 img2img images Cn (as it would for a 2x2 grid) and blend the overlapping portions with B.

While it can produce cool results, this technique is merely a collage, and it feels gimmicky imho.

(Image: Arcana-2022-11-13-233915-2592304768)

What it should be doing is:

1. Generate a txt2image image A (generative prompt).
2. Upscale A to generate B.
3. Generate img2img tiles Cn based on B plus a detail prompt.
4. Blend Cn with B. Options should also be available to control the blend, e.g. Photoshop-style blending modes, where a blend of none means Cn completely overrides B.

https://github.com/jquesnelle/txt2imghd/blob/master/gallery/00005ud.png 00005ud

Important: Can't use the same prompt to generate C tiles as the original prompt. "Detail" prompt may be something like an art style applied to B.
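
The blend control proposed above can be pictured with a minimal sketch. It assumes the simplest possible mode, a plain opacity mix between the upscaled image B and a detail tile C, working on flat lists of pixel values. The function name `blend` and the `opacity` parameter are hypothetical and are not InvokeAI or txt2imghd options.

```python
def blend(base, detail, opacity=1.0):
    """Mix a detail tile C into the matching region of the upscaled image B.

    base and detail are equal-length lists of pixel values (0-255).
    opacity=1.0 means the tile completely overrides the base;
    opacity=0.0 leaves the base untouched.
    """
    if len(base) != len(detail):
        raise ValueError("base and detail must cover the same pixels")
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    # Weighted average per pixel, rounded back to an integer value.
    return [round((1.0 - opacity) * b + opacity * c) for b, c in zip(base, detail)]


b_region = [100, 100, 100, 100]   # hypothetical pixels from B
c_tile = [200, 180, 160, 140]     # hypothetical pixels from one tile Cn

print(blend(b_region, c_tile, opacity=1.0))  # [200, 180, 160, 140]
print(blend(b_region, c_tile, opacity=0.5))  # [150, 140, 130, 120]
```

With `opacity=1.0` the tile replaces B outright, which corresponds to the "Cn completely overrides B" case; lower values keep more of the upscaled image.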

Alternatives

https://github.com/jquesnelle/txt2imghd

Additional Content

No response

Neosettler commented 1 year ago

Also, an overlap of 0 is not supported and causes the invoke.py script to exit.

whosawhatsis commented 1 year ago

Have you tried lowering the embiggen_strength value? Your first image looks like what you would get when that is set way too high.

Neosettler commented 1 year ago

@whosawhatsis yes, of course. The point I'm making here is that the Invoke way is, in this case, 4 images of the same prompt stitched together. The real deal is to have 1 unified image. There are at least 7 figures when there should be only one.

whosawhatsis commented 1 year ago

Yes, but I'm saying that if you turn that setting down, you WON'T get those extra subjects, because it will be doing less redrawing of the image, which means it doesn't create new subjects. You can always use it with larger width/height settings if you have the VRAM for it, or just use --hires_fix instead to do the process on the whole image at once, but the whole point of the feature is to process it in smaller chunks because most people CAN'T process an image that size all at once.

You can also remove parts of your prompt when using embiggen that have a big effect on composition, and limit it to a style sub-prompt, to help prevent the introduction of new subjects. On discord, we've been discussing possibilities for improving this feature by removing terms from the prompt when embiggening, and possibly even creating programmatic ways to detect areas where certain sub-prompts are relevant so that they can be programmatically added back in (using face detection and/or clip interrogation, for example).

A final tip, I've found that there's a lot of optimization to be found by arranging your tiles so that they don't split important parts of the subject (faces, in particular). For example, I've been generating tall images, then using tiles that are short and wide enough to stretch across the entire final image, so that it only tiles vertically. This helps with symmetry, by doing things like ensuring that both of the subject's eyes are in the same tile, but it also has the benefit of ensuring that there are no tiles that don't contain a recognizable piece of the image's subject, and thus make the software less prone to thinking it needs to create a new one.
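
The tile-arrangement tip can be checked with a little arithmetic. This is a minimal sketch, assuming the overlap is given as a fraction of the tile size and that tiles are laid out on a regular grid; `tiles_along_axis` is a hypothetical helper, not an InvokeAI function, and the image and tile sizes are made-up examples.

```python
import math


def tiles_along_axis(image_px, tile_px, overlap_ratio):
    """Number of tiles needed to cover one axis when neighbours overlap."""
    if tile_px >= image_px:
        return 1  # a single tile already spans the whole axis
    overlap_px = int(tile_px * overlap_ratio)
    step = tile_px - overlap_px  # fresh pixels contributed by each extra tile
    if step <= 0:
        raise ValueError("overlap must be smaller than the tile")
    return 1 + math.ceil((image_px - tile_px) / step)


final_w, final_h = 1024, 2048  # a tall final image

# Square tiles: the grid splits the image horizontally as well as vertically.
print(tiles_along_axis(final_w, 512, 0.25), tiles_along_axis(final_h, 512, 0.25))   # 3 5

# Tiles as wide as the final image: one column, so it only tiles vertically.
print(tiles_along_axis(final_w, 1024, 0.25), tiles_along_axis(final_h, 512, 0.25))  # 1 5
```

In the second layout every tile spans the full width, so no vertical seam can cut through a face, at the cost of larger tiles to process.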

Neosettler commented 1 year ago

Thank you for your feedback @whosawhatsis,

Original/input image: stable-diffusion-2022-12-09-154531-2592304768

Here is the command line for the image posted earlier:

--karras_max 0 --iterations 1 --sampler \"k_lms\" --height 640 --width 512 --cfg_scale 7 --steps 50 --perlin 0 --threshold 0 --log_tokenization --individual --strength 0.75 --embiggen 2 0.75 0.25

Changing to: --embiggen 2 0.1 0.25 has no visible effect!!! What am I missing? EDIT: Using Invoke AI 2.0
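
A small sketch of what actually differs between the two command lines. It assumes, without confirmation in this thread, that the three values after --embiggen are, in order, the scale factor, the ESRGAN upscaling strength, and the tile overlap ratio. Under that assumption the second value is not the img2img strength at all. `read_embiggen` and the dictionary keys are hypothetical names used only for illustration.

```python
import argparse
import shlex


def read_embiggen(command_line):
    """Pull the strength-related values out of an invoke-style argument string."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--strength", type=float, default=None)
    parser.add_argument("--embiggen", type=float, nargs="+", default=None)
    # Every other switch is irrelevant here, so unknown arguments are ignored.
    args, _ignored = parser.parse_known_args(shlex.split(command_line))

    values = args.embiggen or []
    # ASSUMED order of the values after --embiggen (not confirmed by the thread).
    names = ["scale", "upscaler_strength", "overlap"]
    result = dict(zip(names, values))
    result["strength"] = args.strength
    return result


before = read_embiggen("--steps 50 --strength 0.75 --embiggen 2 0.75 0.25")
after = read_embiggen("--steps 50 --strength 0.75 --embiggen 2 0.1 0.25")

# List the settings that differ between the two runs.
changed = sorted(key for key in before if before[key] != after[key])
print(changed)  # ['upscaler_strength']
```

If the assumption holds, the edit only touches the upscaler setting, while --strength stays at 0.75 in both runs.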

whosawhatsis commented 1 year ago

My guess is that --strength 0.75 is overriding it...

I'd be willing to do some tests, but I'd also need a prompt, and the image in your post does not appear to have its metadata intact.

Neosettler commented 1 year ago

> My guess is that --strength 0.75 is overriding it...

I think you're right, it does conflict!

whosawhatsis commented 1 year ago

So does removing/changing that fix your issue, or do you still want me to test it?

Neosettler commented 1 year ago

Yes, it's a fix for embiggen_strength not having any effect, thank you!

I'll have to run more tests on the overall feature, but it does yield much better results right off the bat.

stable-diffusion-2022-12-09-164840-2592304768

Would you say it does the same thing as txt2imghd? https://github.com/jquesnelle/txt2imghd

txt2imghd is a port of the GOBIG mode from progrockdiffusion applied to Stable Diffusion, with Real-ESRGAN as the upscaler. It creates detailed, higher-resolution images by first generating an image from a prompt, upscaling it, and then running img2img on smaller pieces of the upscaled image, and blending the result back into the original image.

txt2imghd with default settings has the same VRAM requirements as regular Stable Diffusion, although generation of the detailed images will take longer.

whosawhatsis commented 1 year ago

Sounds like that's doing exactly the same thing. Btw, 0.1 might be a little low. I've used strengths from 0.2-0.3 with good results.

Neosettler commented 1 year ago

Beautiful! Thank you @whosawhatsis

Are you planning to do something about the --strength override? Maybe --embiggen_strength is not needed after all?

whosawhatsis commented 1 year ago

I'm not planning anything, but I'm not one of the developers, just another user. I assume the options will be a little more obvious/less redundant in the web ui, once the embiggen functionality is available there.

Neosettler commented 1 year ago

> I'm not planning anything

That made me smile :) Thank you for your support @whosawhatsis. Much appreciated!

JPPhoto commented 1 year ago

Hi, @Neosettler.

To clarify how embiggen works, it takes your original image, enlarges it using ESRGAN, then divides that into tiles of the width and height you specify with an overlapping region between tiles. It then runs an img2img pass over each tile using the ddim sampler and blends them together using that overlap percentage. You can alter the scale, tile size and overlap percentage, change the strength, and even specify the prompt you want to use. You can also use some of the regular options for image generation like changing the CFG strength.
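
The blending step described above can be pictured as a crossfade across the overlapping region. This is a minimal one-dimensional sketch, assuming a linear ramp between two neighbouring tiles; the actual blend used by embiggen may differ, and `crossfade` is a hypothetical helper rather than InvokeAI code.

```python
def crossfade(left, right, overlap):
    """Join two 1-D pixel strips whose last/first `overlap` pixels cover the same area."""
    if overlap <= 0 or overlap > min(len(left), len(right)):
        # With no shared pixels there is nothing to fade between.
        raise ValueError("overlap must be between 1 and the shorter strip length")
    out = list(left[:-overlap])  # pixels only the left tile covers
    for i in range(overlap):
        weight = (i + 1) / (overlap + 1)  # share of the right tile, rising across the seam
        l_px = left[len(left) - overlap + i]
        r_px = right[i]
        out.append(round((1 - weight) * l_px + weight * r_px))
    out.extend(right[overlap:])  # pixels only the right tile covers
    return out


left_tile = [10, 10, 10, 10]
right_tile = [50, 50, 50, 50]
print(crossfade(left_tile, right_tile, overlap=3))  # [10, 20, 30, 40, 50]
```

The wider the overlap, the more gradual the transition between tiles, which is one reason the overlap percentage affects how visible the seams are.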

I was responsible for implementing --embiggen_strength. If you look at the code (https://github.com/invoke-ai/InvokeAI/blob/main/ldm/generate.py#L665), you can see that the default value is 0.40 for img2img strength. The --strength option is ignored with --embiggen; rather, --embiggen_strength is the only way to change the img2img strength.
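
The rule described above can be summarised in a few lines. This is only a sketch of the behaviour as stated in this comment, not a copy of the code in ldm/generate.py; the function and parameter names are hypothetical.

```python
DEFAULT_EMBIGGEN_STRENGTH = 0.40  # default quoted in the comment above


def img2img_strength(embiggen_requested, strength=None, embiggen_strength=None):
    """Return the img2img strength that would apply, per the description above."""
    if embiggen_requested:
        # --strength is ignored whenever --embiggen is used.
        if embiggen_strength is None:
            return DEFAULT_EMBIGGEN_STRENGTH
        return embiggen_strength
    # Without --embiggen, the ordinary --strength value applies.
    return strength


print(img2img_strength(True, strength=0.75))                         # 0.4
print(img2img_strength(True, strength=0.75, embiggen_strength=0.2))  # 0.2
print(img2img_strength(False, strength=0.75))                        # 0.75
```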

If you're having trouble with coherence in your final image (e.g. duplication), you can take some of the following steps:

Neosettler commented 1 year ago

@JPPhoto Last time I tried, --new_prompt had no effect on the resulting image. Would you mind confirming this?

JPPhoto commented 1 year ago

@Neosettler The way to test this is to use embiggen without resizing the image. I'm all set up for main development and not in a position to test this at the moment; can you try that and use a high strength to see what happens and report back?

Neosettler commented 1 year ago

@JPPhoto, thank you for your support. A few notes:

Without resizing (scale set to 1), I get this error: ERROR: Based on the requested dimensions of 512x640 and tiles of 512x640 you don't need to Embiggen! Check your arguments.

Having --esrgan_strength, --strength and --embiggen_strength all in the mix is confusing, to say the least.

--new_prompt red has no effect on the result whatsoever.

> It then runs an img2img pass over each tile using the ddim

Maybe use --sampler, or add --embiggen_sampler?

psychedelicious commented 8 months ago

We no longer have embiggen as a feature, but now have tiled upscaling as a beta workflow to do the same thing.