Closed: brycegoh closed this issue 3 months ago.
Did you use the same VAE during training as well? And did you use FP16 during training?
Could you share your training command so that I can reproduce this?
Also, did you observe similar things when using the toy example shown in the README?
accelerate launch diffusers/examples/controlnet/train_controlnet_sdxl.py \
--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0 \
--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix \
--conditioning_image_column=conditioning_image \
--image_column=image \
--caption_column=text \
--dataset_name=$DATASET_NAME \
--mixed_precision="fp16" \
--resolution=1024 \
--learning_rate=1e-5 \
--lr_scheduler=cosine \
--num_train_epochs=2 \
--validation_image=$VALIDATION_IMG \
--validation_prompt="$VALIDATION_PROMPT" \
--validation_steps=$VALIDATION_STEPS \
--train_batch_size=7 \
--gradient_accumulation_steps=10 \
--hub_model_id=$HF_HUB_REPO_ID \
--report_to="wandb" \
--tracker_project_name=sdxl_cn \
--push_to_hub
Could you try 3, i.e., the toy example from the README?
Just tried the example command as listed here and I encountered the same issue. I trained for a total of 300 steps for debugging purposes. Here are the final weights and the checkpoint weights as well.
Training command:
accelerate launch diffusers/examples/controlnet/train_controlnet_sdxl.py \
--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0 \
--dataset_name=fusing/fill50k \
--mixed_precision="fp16" \
--resolution=1024 \
--learning_rate=1e-5 \
--max_train_steps=300 \
--checkpointing_steps=100 \
--validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
--validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
--validation_steps=100 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--hub_model_id=brycegoh/sdxl-cn-example \
--seed=42 \
--report_to="wandb" \
--tracker_project_name=sdxl_cn_example \
--push_to_hub
Inference code:
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, UniPCMultistepScheduler, AutoencoderKL
from diffusers.utils import load_image
import torch
base_model_path = "stabilityai/stable-diffusion-xl-base-1.0"
controlnet_path = "brycegoh/sdxl-cn-example"
controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    base_model_path, controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe.enable_model_cpu_offload()
control_image = load_image("./conditioning_image_1.png").convert('RGB')
prompt = "red circle with blue background"
image = pipe(
    prompt, num_inference_steps=100, image=control_image
).images[0]
Inference output:
Wandb output:
Very weird thing.
I just created this PR wherein we additionally call the log_validation() function after serializing the ControlNet checkpoint. This mirrors the inference code exactly and helps validate whether the trained checkpoint is effective enough. I am happy to be proven wrong if that's not the case.
I have also added a note for the ControlNet SDXL training script mentioning that you should ensure that you're using the same VAE you used during training.
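For instance, a minimal inference sketch under that assumption (the ControlNet repo id is a placeholder; the VAE id is the one passed during training below):
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, AutoencoderKL
import torch

controlnet = ControlNetModel.from_pretrained("your-username/your-controlnet-sdxl", torch_dtype=torch.float16)
# reuse the exact VAE that was passed via --pretrained_vae_model_name_or_path during training
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, vae=vae, torch_dtype=torch.float16
)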
SDXL ControlNet
Command:
accelerate launch train_controlnet_sdxl.py \
--pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
--pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
--output_dir="controlnet-sdxl" \
--dataset_name=fusing/fill50k \
--max_train_samples=100 \
--mixed_precision="fp16" \
--resolution=1024 \
--learning_rate=1e-5 \
--max_train_steps=150 \
--validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
--validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
--validation_steps=50 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--report_to="wandb" \
--seed=42 \
--push_to_hub
Results:
- Checkpoint: https://huggingface.co/sayakpaul/controlnet-sdxl
- WandB: https://wandb.ai/sayakpaul/sd_xl_train_controlnet/runs/v6vkby93
SD ControlNet
Command:
accelerate launch train_controlnet.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--output_dir="controlnet-sd" \
--dataset_name=fusing/fill50k \
--max_train_samples=100 \
--mixed_precision="fp16" \
--resolution=1024 \
--learning_rate=1e-5 \
--max_train_steps=150 \
--validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
--validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
--validation_steps=50 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--report_to="wandb" \
--seed=42 \
--push_to_hub
Results:
- Checkpoint: https://huggingface.co/sayakpaul/controlnet-sd
- WandB: https://wandb.ai/sayakpaul/train_controlnet/runs/d6k17epg
Hopefully, that helps?
@sayakpaul sorry, I don't quite understand what steps to take from here. Should we try working from your branch, or is there something else?
Thanks @sayakpaul for the quick reply. I just tried your final output weights in my inference script and it seems to be having the same issue.
I posted my inference notebook that I ran on Kaggle here (Public Repo Link). Please advise if I am getting something wrong.
Inference:
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, UniPCMultistepScheduler, AutoencoderKL
from diffusers.utils import load_image
import torch
base_model_path = "stabilityai/stable-diffusion-xl-base-1.0"
controlnet_path = "sayakpaul/controlnet-sdxl"
controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    base_model_path, vae=vae, controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe.enable_model_cpu_offload()
control_image = load_image("/kaggle/input/base-images/conditioning_image_1.png").convert('RGB')
prompt = "red circle with blue background"
image = pipe(
    prompt, num_inference_steps=100, image=control_image
).images[0]
Output:
Expected output based on your wandb report:
Could you test the latest changes, i.e., https://github.com/huggingface/diffusers/pull/7096/commits/937c66bf23d91a95113f4da1d1836da757551066?
Additionally, I would suggest matching the inference logic as closely as possible to the one used during logging:
from diffusers import UniPCMultistepScheduler, ControlNetModel, AutoencoderKL, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image, make_image_grid
import torch
pipeline_id = "stabilityai/stable-diffusion-xl-base-1.0"
vae_id = "madebyollin/sdxl-vae-fp16-fix"
controlnet_id = "sayakpaul/controlnet-sdxl"
controlnet = ControlNetModel.from_pretrained(controlnet_id, torch_dtype=torch.float16)
vae = AutoencoderKL.from_pretrained(vae_id, torch_dtype=torch.float16)
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    pipeline_id, controlnet=controlnet, vae=vae, torch_dtype=torch.float16
).to("cuda")
pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config)
control_image = load_image("conditioning_image_1.png").convert("RGB").resize((1024, 1024))
prompt = "red circle with blue background"
generator = torch.Generator(device="cuda").manual_seed(42)
images = pipeline(
    prompt=prompt, image=control_image, num_images_per_prompt=4, num_inference_steps=20, generator=generator
).images
make_image_grid([control_image] + images, 1, 5).save("image_grid.png")
The only issue here is that the control_image isn't resized to 1024x1024.
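That is, the earlier inference snippet needs a one-line change so the conditioning image matches the 1024x1024 training resolution:
from diffusers.utils import load_image

control_image = load_image("./conditioning_image_1.png").convert("RGB").resize((1024, 1024))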
There is nothing wrong with the training script; let's update the inference code in the README @sayakpaul.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Met the same problem: even with only 16 samples and 500 iterations it cannot overfit and produce good results: https://github.com/huggingface/diffusers/issues/9179
Describe the bug
I am training a controlnet using the diffusers script. I have set it to save a checkpoint every 200 steps. However, when I try to use the safetensor file for inference, the output is completely different from the one reported in wandb.
The inference code is the same as the one in the README for the SDXL ControlNet training script.
Please advise, thanks!
Reproduction
Both the wandb and inference outputs use the same prompt and conditioning image.
Inference output:
Wandb reporting output:
This is the inference code:
The subfolder checkpoint-200/controlnet contains the diffusion_pytorch_model.safetensors and config.json.
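A minimal sketch of loading such an intermediate checkpoint (the local path is an example; point it at your own output directory):
from diffusers import ControlNetModel
import torch

# load the ControlNet weights written by --checkpointing_steps
controlnet = ControlNetModel.from_pretrained(
    "./checkpoint-200/controlnet", torch_dtype=torch.float16
)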
Logs
No response
System Info
Training on Runpod with an A40 GPU
Who can help?
@sayakpaul @patrickvonplaten