lllyasviel / ControlNet-v1-1-nightly

Nightly release of ControlNet 1.1

SD21 models? #56

Closed bghira closed 1 year ago

bghira commented 1 year ago

Pardon me if I missed something, but the example filename format lists sd21 and sd21 768 as hints that those models exist, yet they're not in the models folder. Is it just the config that sets up the difference? I would think the model has to actually be trained for 2.1.
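For reference, the split seems to be mostly visible in the configs. A minimal sketch comparing the two, assuming the cldm_v15.yaml / cldm_v21.yaml files from the original ControlNet repo are available locally (paths and field names are assumptions based on those files):

```python
# Minimal sketch (not from this repo's scripts): print the fields that differ
# between the SD 1.5 and SD 2.1 ControlNet configs. Assumes the cldm_v15.yaml /
# cldm_v21.yaml files from the original ControlNet repo sit in ./models/.
from omegaconf import OmegaConf

for path in ["./models/cldm_v15.yaml", "./models/cldm_v21.yaml"]:
    cfg = OmegaConf.load(path)
    unet = cfg.model.params.unet_config.params
    cond = cfg.model.params.cond_stage_config
    print(path)
    print("  context_dim :", unet.context_dim)  # 768 for SD 1.5, 1024 for SD 2.1
    print("  text encoder:", cond.target)       # FrozenCLIPEmbedder vs FrozenOpenCLIPEmbedder
```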

SGKino commented 1 year ago

Same question.

lllyasviel commented 1 year ago

Right now, 2.1 models are only available from community implementations.

xarthurx commented 1 year ago

@lllyasviel

I would like to follow up on this post and share some of my experiments here. Based on my initial tests, DreamBooth trained on SD v1.5 vs. v2.1 produces hugely different results with the same set of parameters (within each architecture's constraints). I guess this is partially because of the switch of CLIP models, but I'm not sure.

AFAIK, the quality of CN also depends on the base model, so I would highly recommend the CN team consider providing an official version of CN for the SD v2 series.


Some of the results (DreamBooth trained from 30 high-quality images)

DreamBooth on SD v1.5 (from underfit to overfit): [image: img_trained_v1-2_1400-3500]

DreamBooth on SD v2.1 (from underfit to overfit): [image]

DreamBooth on SD v1.5 (high-rise): [image]

DreamBooth on SD v2.1 (high-rise): [image]

I can provide more results if you want. But the general quality of the SD-v2.1-based model is much better than that of the SD-v1.5-based one.

bghira commented 1 year ago

yes! I have stopped using 1.5 and no longer train on it or support users that wish to fix issues in it.

I am currently training on 1.2 million images with 2.1 as the base.

if you need help making the 2.1 models, please help us figure out how to help you.

if you don't want to make 2.1 models, i would really like to know why.

1.5 is a dead model, and needs to be upgraded to 2.1.

bghira commented 1 year ago

please reopen this so we can track the issue and other users can come along and add a +1.

lllyasviel commented 1 year ago

@xarthurx It looks like the training failed. What scripts are you using?

xarthurx commented 1 year ago

@lllyasviel what do you mean by "failed"?

For "box-like" buildings, default generation by SD without CN always have this "zig-zag" line issue for the facade -- that's way I asked if CN can help with this issue in another post, and you recommended "tile".

I'm using the default script from the diffusers examples folder (dreambooth), feeding in my own data.
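For reference, a DreamBooth output directory from that example script can be loaded back with diffusers for side-by-side checks. A minimal sketch; the output directory, prompt, and seed below are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# Minimal sketch: load a DreamBooth output directory (wherever --output_dir pointed)
# and generate with a fixed seed so different checkpoints can be compared fairly.
pipe = StableDiffusionPipeline.from_pretrained(
    "./dreambooth-sd21-output",  # placeholder path
    torch_dtype=torch.float16,
).to("cuda")

generator = torch.Generator("cuda").manual_seed(42)  # fixed seed for fair comparison
image = pipe("a high-rise building, photo", generator=generator).images[0]
image.save("test.png")
```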

lllyasviel commented 1 year ago

@xarthurx Right now I believe the most technically correct training method is kohya_ss: https://github.com/bmaltais/kohya_ss

Feel free to try it. The resulting quality can be much higher, and it is as optimized as A1111.

xarthurx commented 1 year ago

> @xarthurx Right now I believe the most technically correct training method is kohya_ss: https://github.com/bmaltais/kohya_ss
>
> Feel free to try it. The resulting quality can be much higher, and it is as optimized as A1111.

This one is for LoRA, isn't it? I haven't tried training LoRA, only DreamBooth, since from what I read on the Internet a fully trained base model has more influence than additional networks/LoRA. I guess I need to try it now.

But how does it relate to "as optimized as A1111"? I understand A1111 to be a webUI with additional scripts that simplify passing different parameters... Is that wrong?

For all the experiments I have run, I'm using a simple prompt without any modification -- I can make the images look nicer with prompt engineering, but that is not the purpose at the moment.

I'm targeting a better base model (or perhaps a LoRA) with better generalization ability. Prompt engineering is the next step for the users/students.

And from the above examples, I assume SD v2.1 is doing better than SD v1.5 at this stage.

Hope the explanation makes it clear.

lllyasviel commented 1 year ago
1. kohya_ss has DreamBooth.
2. Yes, you can also try LoRA, since a LoRA can be applied to any base model, so getting visually beautiful images can be cheap (see the sketch below).
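As a rough illustration of point 2, a LoRA can be stacked on top of any compatible base pipeline in diffusers. A minimal sketch; the base model id and the LoRA path are placeholders, and the exact loading call depends on your diffusers version and LoRA format:

```python
import torch
from diffusers import StableDiffusionPipeline

# Minimal sketch: apply a LoRA on top of an arbitrary base model.
# "stabilityai/stable-diffusion-2-1" is just one choice of base;
# "./my_lora" is a placeholder for LoRA weights trained with kohya_ss or diffusers.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("./my_lora")  # supported formats depend on the diffusers version

image = pipe("a modern high-rise building, photo").images[0]
image.save("lora_test.png")
```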

A1111/kohya_ss and diffusers are different in many details, and A1111/kohya_ss usually have higher visual quality. A1111/kohya_ss have lots of people tuning every line of code and every pixel to maximize the visual quality, so the results can be greatly different from other implementations.

And sometimes I think A1111's default method, Euler A, is an unpublished algorithm, and someone should write a paper about which detailed modifications make the result quality so high.
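For anyone who wants to experiment with this outside the webui, Euler Ancestral is exposed in diffusers as EulerAncestralDiscreteScheduler. A minimal sketch; note this is diffusers' own implementation, which may differ in small details from A1111's sampler -- which is exactly the open question above:

```python
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

# Minimal sketch: swap the default scheduler for Euler Ancestral ("Euler a" in A1111).
# This is the diffusers implementation; it is not guaranteed to match A1111's
# k-diffusion-based sampler detail for detail.
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
```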

xarthurx commented 1 year ago

@lllyasviel

Thanks, I'll run several tests/benchmarks with this and update here.

Just to follow up on the topic of this issue: is there any plan to release official ControlNet models for the SD v2 series in the near future? Or should we rely on the community for that?

bghira commented 1 year ago

> A1111’s default method Euler A is an unpublished algorithm

probably best to stay away from anything of A1111's if it cannot be verified where it originated.

mh-nyris commented 1 year ago

Hi all, where can I find "trusted" community models using SD 2.1 with ControlNet 1.1? (I found this on Hugging Face: https://huggingface.co/thibaud/controlnet-sd21/resolve/main/control_v11p_sd21_canny.ckpt, and the .yaml file looks legit: https://huggingface.co/thibaud/controlnet-sd21/raw/main/control_v11p_sd21_canny.yaml). Any help is much appreciated. Best, mike
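In case it helps, here is a rough, unverified sketch of how such a checkpoint might be loaded with this repo's own helpers. It assumes the downloaded .yaml/.ckpt pair is compatible with the cldm-style configs used here; depending on how the checkpoint was packaged, the SD 2.1 base weights may need to be loaded separately first, hence strict=False:

```python
# Rough sketch (unverified): load a community SD 2.1 ControlNet checkpoint with
# the helpers from this repo. File names match the huggingface links above and
# are assumed to be placed in ./models/.
from cldm.model import create_model, load_state_dict

model = create_model("./models/control_v11p_sd21_canny.yaml").cpu()
model.load_state_dict(
    load_state_dict("./models/control_v11p_sd21_canny.ckpt", location="cuda"),
    strict=False,  # the checkpoint may or may not bundle the SD 2.1 base weights
)
model = model.cuda()
```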