omerbt / MultiDiffusion

Official Pytorch Implementation for "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" presenting "MultiDiffusion" (ICML 2023)
https://multidiffusion.github.io/

My custom implementation in Automatic1111's WebUI #5

Open pkuliyi2015 opened 1 year ago

pkuliyi2015 commented 1 year ago

Dear authors,

I have implemented your algorithm in Automatic1111's WebUI with the following optimizations:

Some WebUI-related additions:

Here is the link:

Many thanks for your fantastic work, especially on img2img and panorama generation! We are working on text prompts now.

However, uncontrolled large-image generation is far from ideal: repeated patterns always appear, and the images are mostly unusable.

Could you give us some insight into whether we can generate large images without a user-specified prompt mask?

For example, I have an idea (without proof): we could generate a small reference image first, obtain the prompt's attention map, scale it up to the target resolution, and then automatically locate the prompt in the correct views during MultiDiffusion.
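To make the idea concrete, here is a minimal sketch of the proposed pipeline's masking step. All names here are hypothetical (not from the MultiDiffusion codebase), and it operates on a plain numpy array standing in for a cross-attention map of one prompt token: upscale the low-resolution map, then threshold it inside each sliding-window view to decide where that token's prompt should apply.

```python
import numpy as np

def upscale_attention_map(attn, out_h, out_w):
    """Nearest-neighbor upscale of a low-res per-token attention map.

    attn: (h, w) array of attention weights for one prompt token.
    """
    h, w = attn.shape
    ys = (np.arange(out_h) * h) // out_h
    xs = (np.arange(out_w) * w) // out_w
    return attn[np.ix_(ys, xs)]

def view_prompt_masks(attn_hi, views, thresh=0.5):
    """For each view (y0, y1, x0, x1), return a boolean mask marking
    where the token's attention exceeds `thresh` inside that view --
    a stand-in for auto-locating the prompt per view."""
    return [(attn_hi[y0:y1, x0:x1] > thresh) for (y0, y1, x0, x1) in views]

# Toy 4x4 map: the "object" token attends to the top-left corner.
attn = np.zeros((4, 4))
attn[:2, :2] = 1.0
attn_hi = upscale_attention_map(attn, 8, 8)
views = [(0, 4, 0, 4), (4, 8, 4, 8)]  # two non-overlapping crops
masks = view_prompt_masks(attn_hi, views)
# The first view contains the object region; the second does not.
```

A real implementation would extract the maps from the UNet's cross-attention layers and use smoother interpolation, but the upscale-then-assign-per-view logic is the core of the proposal.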

Thank you very much!

omerbt commented 1 year ago

Thank you for implementing MultiDiffusion with the WebUI -- looks great!

Regarding larger images -- in the simplest setting, where every view shares the same prompt, the method may be unsuitable almost by definition for certain prompts/resolutions (e.g., when generating a single object that should not appear in every view). I think a coarse-to-fine generation approach could help with this.
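For readers following the thread, the shared-prompt setting being discussed is the basic MultiDiffusion fusion step: each sliding-window view is denoised independently with the same prompt, and overlapping predictions are averaged. A minimal numpy sketch (window and stride values are illustrative, not the repo's defaults):

```python
import numpy as np

def get_views(H, W, window=64, stride=48):
    """Sliding-window crops (y0, y1, x0, x1) covering an (H, W) latent."""
    views = []
    for y0 in range(0, max(H - window, 0) + 1, stride):
        for x0 in range(0, max(W - window, 0) + 1, stride):
            views.append((y0, y0 + window, x0, x0 + window))
    return views

def multidiffusion_step(latent, denoise_view):
    """One fusion step: denoise each crop independently, then average
    overlapping results -- the closed-form MultiDiffusion update."""
    out = np.zeros_like(latent)
    count = np.zeros_like(latent)
    for (y0, y1, x0, x1) in get_views(*latent.shape):
        out[y0:y1, x0:x1] += denoise_view(latent[y0:y1, x0:x1])
        count[y0:y1, x0:x1] += 1
    return out / count
```

Because every view sees the identical prompt, a "single object" prompt tends to be reproduced in each window, which is exactly the repeated-pattern failure mode described above; a coarse-to-fine scheme would fix the global layout at low resolution before the windowed refinement.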