My custom implemetation in Automatic1111's WebUI

Dear authors,

I have implemented your algorithm to Automatic1111's WebUI with the following optimization:

Cropping views in a more symmetric way to get a better result.
Pre-calculate weights to save time (as weights won't change once the views are determined.
Batched latent view processing for acceleration.

Some WebUI related stuffs:

Compatibility with all samplers.
Compatibility with ControlNet.

Here is the link:

https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111

Great thanks to your fantastic work especially in img2img and panorama generation! We are working on text prompt now.

But the uncontrolled large image generation is not ideal at all, as repeated patterns always appears and the image is mostly unusable.

Would you please give us some insights, if we can generate large images without a user-specified prompt mask?

For example, I have an idea (without proof): we may generate a small reference image first, obtain the prompt attention map, scale it to a larger resolution, and finally we automatically locate the prompt to its correct views during multi-diffusion.

Thank you very much!

omerbt / MultiDiffusion

My custom implemetation in Automatic1111's WebUI #5