ControlLoRA Version 3 is a neural network structure, extended from ControlNet, that controls diffusion models by adding extra conditions.
Inspired by control-lora (StabilityAI), ControlLoRA, control-lora-v2, and the train_controlnet.py script from diffusers, control-lora-v3 adds no new features but provides a PEFT implementation of ControlLoRA.
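The core idea behind a PEFT/LoRA implementation is to freeze each base weight and train only a low-rank update alongside it. A minimal sketch of that pattern (an illustration only, not this repo's actual code; the class name `LoRALinear` is made up here):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical sketch: a frozen base linear layer plus a trainable
    low-rank update, the pattern PEFT applies under the hood."""
    def __init__(self, base: nn.Linear, r: int = 32, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the LoRA factors are trained
        self.down = nn.Linear(base.in_features, r, bias=False)   # A: d -> r
        self.up = nn.Linear(r, base.out_features, bias=False)    # B: r -> d
        nn.init.zeros_(self.up.weight)  # start as a zero update
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

layer = LoRALinear(nn.Linear(64, 64), r=8)
x = torch.randn(2, 64)
out = layer(x)
```

Because the up-projection starts at zero, the wrapped layer initially behaves exactly like the frozen base layer; training then moves only the small `down`/`up` factors.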
To train ControlLoRA, you need a dataset of (image, conditioning image, text) triplets. Of course, you can hardly train directly on the LAION-5B dataset the way Stable Diffusion was trained. Here are some options:
Stable Diffusion v1-5 is the base model.
Stable Diffusion v1-4, Stable Diffusion v2-1, and Stable Diffusion XL still need to be verified.
You can train either ControlNet or ControlLoRA with the train_control_lora.py script.
Empirically, training for 50,000 steps with a batch size of 4 strikes a good balance between image quality, control ability, and training time.
accelerate launch train_control_lora.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--output_dir="controlnet-model" \
--dataset_name="fusing/fill50k" \
--resolution=512 \
--learning_rate=1e-5 \
--train_batch_size=4 \
--max_train_steps=100000 \
--tracker_project_name="controlnet" \
--checkpointing_steps=5000 \
--validation_steps=5000 \
--report_to wandb
To train ControlLoRA, add the --use_lora flag to the launch command.
accelerate launch train_control_lora.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--output_dir="control-lora-model" \
--dataset_name="fusing/fill50k" \
--resolution=512 \
--learning_rate=1e-4 \
--train_batch_size=4 \
--max_train_steps=100000 \
--tracker_project_name="control-lora" \
--checkpointing_steps=5000 \
--validation_steps=5000 \
--report_to wandb \
--use_lora \
--lora_r=32 \
--lora_bias="all"
You can also train ControlLoRA / ControlNet with your own dataset.
accelerate launch train_control_lora.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--output_dir="control-lora-model" \
--conditioning_image_column="hint" \
--image_column="jpg" \
--caption_column="txt" \
--resolution=512 \
--learning_rate=1e-4 \
--train_batch_size=4 \
--num_train_epochs=3 \
--max_train_steps=100000 \
--tracker_project_name="control-lora" \
--checkpointing_steps=5000 \
--validation_steps=5000 \
--report_to wandb \
--use_lora \
--lora_r=32 \
--lora_bias="all" \
--custom_dataset="custom_datasets.tutorial.MyDataset"
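The exact dataset interface lives in custom_datasets/tutorial.py; a plausible minimal sketch, assuming each item is a dict keyed by the column names passed on the command line above ("jpg", "hint", "txt"), might look like:

```python
import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    """Hypothetical sketch: yields dicts keyed by the --image_column,
    --conditioning_image_column, and --caption_column names."""
    def __init__(self, size: int = 8, resolution: int = 512):
        self.size = size
        self.resolution = resolution

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        # Real code would load and preprocess image pairs from disk here;
        # zeros are placeholders for this illustration.
        return {
            "jpg": torch.zeros(3, self.resolution, self.resolution),   # target image
            "hint": torch.zeros(3, self.resolution, self.resolution),  # conditioning image
            "txt": "a caption describing the image",
        }

ds = MyDataset(size=4)
item = ds[0]
```

Check the tutorial module in the repo for the authoritative interface; the key point is that the returned keys must match the column flags you pass to the training script.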
If you want to merge ControlLoRA into a ControlNet, use the merge_lora.py script.
python merge_lora.py --base_model runwayml/stable-diffusion-v1-5 --control_lora /path/to/control-lora --output_dir /path/to/save/ControlNet
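Conceptually, merging folds each low-rank update into its base weight, W_merged = W + (alpha/r) * B A, after which the LoRA layers can be dropped. A small numerical sketch of that identity (not the script's actual implementation):

```python
import torch

r, alpha, d = 8, 8, 64
W = torch.randn(d, d)   # frozen base weight
A = torch.randn(r, d)   # LoRA down projection
B = torch.randn(d, r)   # LoRA up projection
scale = alpha / r

# Fold the low-rank update into the base weight.
W_merged = W + scale * (B @ A)

x = torch.randn(d)
# Applying the merged weight equals base path + scaled low-rank path.
separate = W @ x + scale * (B @ (A @ x))
```

After merging, inference uses a single weight per layer, so the result loads like an ordinary ControlNet with no LoRA-specific code paths.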
Now you can convert ControlLoRA weights from the Hugging Face diffusers format to the Stable Diffusion format. The converted model can be used in AUTOMATIC1111's Stable Diffusion web UI and in ComfyUI.
Note: the ControlLoRA must have been trained with --lora_bias="all" for conversion to work.
python convert_diffusers.py --adapter_model /path/to/adapter/model --output_model /path/to/output/model
Original image:
Output:
@software{lavinal7122024controllorav3,
author = {lavinal712},
month = {5},
title = {{ControlLoRA Version 3: A Lightweight Neural Network To Control Stable Diffusion Spatial Information Version 3}},
url = {https://github.com/lavinal712/control-lora-v3},
version = {1.0.0},
year = {2024}
}