lllyasviel / ControlNet-v1-1-nightly

Nightly release of ControlNet 1.1

Is it possible to enhance the straight-line conditioning? #50

Closed xarthurx closed 1 year ago

xarthurx commented 1 year ago

Original Title: Is it possible to enhance the straight-line conditioning?

Hello, thank you for the great work. CN + SD really changed the design field a lot.

I have a background in both architecture and computer science, and am currently investigating how far we can go in this direction for the conceptual design phase.

There's one issue that we've been trying to improve for a while but cannot solve:

SD w/o CN image

SD with CN image

If you look at the image above, the mullions and window frames are not straight; the lines are wobbly. We used a screenshot of a 3D model for the conditioning, but regardless of the preprocessor used, the generated images always show this problem to some degree.

We suspect the cause might be one of the following:

  1. The preprocessed image has a resolution of only 512, so the processed lines are already wobbly (some lines are very faint after processing).
  2. This is a shortcoming of SD itself.

We also tried using a screenshot of the plain volume, without the mullions, but the results are similar:

SD with CN image

Question:

At this point, we'd like to seek advice from the developers on how this issue can be improved:

  1. Should we train a diffusion model (DreamBooth or LoRA approach) on more architecture-related data? (We've tried a few architecture models from Civitai, but the improvements are limited.)
  2. Should we train our own CN (for instance, on pairs of "non-perfect canny-style" images and perfect architecture renderings, so that the CN learns that these facades need straight mullions)?
  3. Or what else should we do at this point?

lllyasviel commented 1 year ago

Just Use Automatic 1111

The results below all use default parameters and the same simple prompts shown in my screenshot. A1111 is just magic.

image image image

lllyasviel commented 1 year ago

Edit: Frequently asked questions are edited and pinned to help more people. Edit2: Closed since solution found. Edited title restored.

xarthurx commented 1 year ago

@lllyasviel First, thank you very much for your time on this topic.

For the image you generated, I'd like to provide an architectural perspective:

As we're professionals, we evaluate the quality of the specific architecture seriously (geometry, space quality, etc.), and not based on the "general feeling" or the "style" of the image.

So if you look at the facade in the image, you'll see that the mullions and windows have strange shapes. We've run into this effect a lot and cannot completely overcome it by training DreamBooth or LoRA. That's why we're here: we'd like your advice on whether ControlNet can help.

image

lllyasviel commented 1 year ago

You can solve these to some extent using cnet 1.1 tile (v11f1e), but this is again another a1111-only feature and requires learning some a1111 knowledge.

image

(And you can try m**j*****y and compare which solution is better.) (And if you want to burn your GPU, you can try running this image through tile again. Tile is almost infinite for images with buildings like this, but this will really burn the GPU.)

xarthurx commented 1 year ago

Really helpful input!

  1. We turned from MJ to SD+ControlNet because we need to control the geometry more strictly in the later part of the design process, so MJ is not an option for non-conceptual design.
  2. The results help to some extent (yes, we're indeed using a1111), but do not fully resolve the problem (though burning the GPU very hard might). It seems my naive proposal of training a cnet did not strike you as a good idea. Theoretically, do you think it is possible to resolve the issue, even if the solution is not quick or end-user friendly?

lllyasviel commented 1 year ago

It seems that, if we just consider these examples, the best solution is to use scripts to progressively upscale the image with tile until each window in those buildings has a 512x512 resolution. I estimated it, and the resolution needed to solve this image is about 52,428 x 39,322. We do not need to change the prompt; we can always use "beautiful city with buildings, 4k, 8k, balabalabala". Generating a perfect image may take many hours on a 4090.
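The estimate above can be roughly reproduced with a few lines of arithmetic. The sketch below is only an illustration: the 512x384 base image and the 5-pixel window width are assumptions that are not stated in the thread (they were picked because they land close to the quoted figure), and `plan_upscale` is a hypothetical helper name. It also assumes the progressive upscale is done in 2x passes and that the final canvas is processed in 512x512 tiles.

```python
import math

def plan_upscale(base_w, base_h, feature_px, target_px=512, tile=512):
    """Estimate the canvas needed so that a feature `feature_px` pixels wide
    in the base image ends up `target_px` pixels wide after upscaling."""
    scale = target_px / feature_px
    out_w = round(base_w * scale)
    out_h = round(base_h * scale)
    # Number of 2x passes needed to reach at least `scale` (assumed 2x steps).
    passes = max(0, math.ceil(math.log2(scale)))
    # Number of tile-sized patches covering the final canvas.
    tiles = math.ceil(out_w / tile) * math.ceil(out_h / tile)
    return {"scale": scale, "width": out_w, "height": out_h,
            "passes": passes, "tiles": tiles}

# Assumed inputs: 512x384 base image, windows about 5 px wide.
plan = plan_upscale(512, 384, 5)
print(plan)
```

Under these assumptions the canvas comes out at roughly 52k x 39k pixels, which matches the order of magnitude quoted above and makes it clear why the run would take hours: the final pass alone has to cover several thousand tiles.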

lllyasviel commented 1 year ago

Unfortunately, it seems that at that resolution webui's Gradio HTML crashes before ControlNet fails. The good news is that ControlNet still works at that scale; the bad news is that your browser does not support it. Perhaps try Firefox.

xarthurx commented 1 year ago

This is definitely a "theoretical" solution (though different from what I expected), and, unexpectedly, I now kind of understand how "tile" works. 🤣

I guess then for practical use (we need ~2k resolution in < 5 min), this is still an "unresolved" problem... Since I originally and incorrectly assumed this could be fixed by a special type of cnet, it seems I need to wait for a more "vector-based" style plugin to control such things...

But anyway, thank you for your time and input. Really appreciate it.

xarthurx commented 1 year ago

It just came to my mind after posting the comment above: could we use a region-based script to "upscale and then downscale" only the area of the facade?

This would save GPU time and could probably spare the browser, too.

lllyasviel commented 1 year ago

LDM learns specific patterns at specific conv layer levels. If you want the learned pattern to draw something like a window on a wall, you need to give that thing a 512x512 space to occupy, so that the patterns learned in the corresponding conv layers can be triggered. So you cannot downscale it, unfortunately. But perhaps you can try slicing only the tiles along the MLSD lines to save computation. But we have already begun to burn the GPU, so perhaps just burn it without unnecessary mercy.
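The idea of processing only the tiles that lie along the MLSD lines could be sketched as follows. This is a minimal illustration, not anything from ControlNet or a1111: it assumes the detected line segments are already available as `(x1, y1, x2, y2)` tuples in pixel coordinates of the upscaled canvas, and `tiles_along_lines` is a hypothetical helper. It samples points along each segment, so a tile that a line only clips at a corner may be missed.

```python
import math

def tiles_along_lines(segments, tile=512, step=16):
    """Return the set of (col, row) tile indices hit by points sampled
    every `step` pixels (or finer) along each line segment."""
    selected = set()
    for x1, y1, x2, y2 in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        n = max(1, math.ceil(length / step))
        for i in range(n + 1):
            t = i / n
            x = x1 + t * (x2 - x1)
            y = y1 + t * (y2 - y1)
            selected.add((int(x // tile), int(y // tile)))
    return selected

# Two assumed segments on a 2048x2048 canvas (4x4 grid of 512 px tiles).
segments = [(0, 0, 2047, 0), (600, 0, 600, 1023)]
selected = tiles_along_lines(segments)
print(sorted(selected), len(selected) / 16)
```

In this toy case only 5 of the 16 tiles contain a line, so the remaining tiles could be skipped. Whether skipping them still gives a consistent image is an open question that the thread does not answer.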

xiaohaipeng commented 1 year ago

@lllyasviel

Oh god, this picture is perfect and has great detail with the ControlNet tile model. How did you set the parameters, in detail?

daizhuo commented 1 year ago

You can solve these to some extent using cnet 1.1 tile (v11f1e), but this is again another a1111-only feature and requires learning some a1111 knowledge. image (And you can try m**j*****y and compare which solution is better.) (And if you want to burn your GPU, you can try running this image through tile again. Tile is almost infinite for images with buildings like this, but this will really burn the GPU.)

How did you make this? Could you describe the process in detail? This process is very important for architectural design. I tried with no luck! Thank you so much!