huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

ControlNet Training failed on validation using the default Tensorboard report_to option #2695

Closed takuma104 closed 1 year ago

takuma104 commented 1 year ago

Describe the bug

I tried the new ControlNet training script on the main branch right away. With the default --report_to option (tensorboard), it raises a ValueError and stops the process after generating the validation images. As a workaround, using wandb did not cause this issue.

Reproduction

Using this script (the 16GB example from README.md; I added the mandatory --tracker_project_name option):

#!/bin/bash

export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="./checkpoints"

accelerate launch ../examples/controlnet/train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4 \
 --gradient_checkpointing \
 --use_8bit_adam \
 --tracker_project_name fill50k

Adding the --report_to wandb option should prevent the issue.

Logs

{'dynamic_thresholding_ratio', 'lower_order_final', 'predict_x0', 'solver_order', 'sample_max_value', 'solver_p', 'solver_type', 'disable_corrector', 'thresholding'} was not found in config. Values will be initialized to default values.
/home/takuma/miniconda3/envs/torch1.13.1/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
  File "/home/takuma/Documents/co/diffusers/train/../examples/controlnet/train_controlnet.py", line 1063, in <module>
    main(args)
  File "/home/takuma/Documents/co/diffusers/train/../examples/controlnet/train_controlnet.py", line 1030, in main
    log_validation(
  File "/home/takuma/Documents/co/diffusers/train/../examples/controlnet/train_controlnet.py", line 139, in log_validation
    formatted_images = np.stack(formatted_images)
  File "<__array_function__ internals>", line 180, in stack
  File "/home/takuma/miniconda3/envs/torch1.13.1/lib/python3.10/site-packages/numpy/core/shape_base.py", line 426, in stack
    raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape


System Info

- `diffusers` version: 0.15.0.dev0 (commit 16ea3b5379c1e78a4bc8e3fc9cae8d65c42511b1)
- Platform: Linux-5.19.0-32-generic-x86_64-with-glibc2.35
- Python version: 3.10.9
- PyTorch version (GPU?): 1.13.1 (True)
- Huggingface_hub version: 0.12.1
- Transformers version: 4.26.1
- Accelerate version: 0.17.0.dev0
- xFormers version: 0.0.17.dev473
- Using GPU in script?: Yes. RTX3090
- Using distributed or parallel set-up in script?: N
offchan42 commented 1 year ago

I got the same error. It's also odd that tracker_project_name is mandatory even though it has a default value in the code. I think someone needs to remove the required=True argument from the following code:

    parser.add_argument(
        "--tracker_project_name",
        type=str,
        default="train_controlnet",
        required=True,
        help=(
            "The `project_name` argument passed to Accelerator.init_trackers for"
            " more information see https://huggingface.co/docs/accelerate/v0.17.0/en/package_reference/accelerator#accelerate.Accelerator"
        ),
    )
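For reference, a minimal sketch of the proposed change: the same definition with required=True removed, so the default applies whenever the flag is omitted:

    parser.add_argument(
        "--tracker_project_name",
        type=str,
        default="train_controlnet",  # now takes effect when the flag is omitted
        help=(
            "The `project_name` argument passed to Accelerator.init_trackers for"
            " more information see https://huggingface.co/docs/accelerate/v0.17.0/en/package_reference/accelerator#accelerate.Accelerator"
        ),
    )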
patrickvonplaten commented 1 year ago

cc @williamberman can you check here?

sayakpaul commented 1 year ago

I think this is surfacing because np.stack() requires the underlying arrays to be of a uniform shape. WandB doesn't raise it because we log the images there individually.

https://github.com/huggingface/diffusers/blob/73bdad08a1fae592c73e211b55b83745c487b5ea/examples/controlnet/train_controlnet.py#L154

A relatively easy fix would be to just ensure the validation images are of the same shape when logging to TensorBoard.
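For illustration, here is a minimal standalone repro of the np.stack() constraint (the shapes are hypothetical, e.g. a 4-channel RGBA conditioning image next to 3-channel RGB outputs):

import numpy as np

# Two images that differ only in channel count (RGB vs. RGBA).
rgb = np.zeros((512, 512, 3), dtype=np.uint8)
rgba = np.zeros((512, 512, 4), dtype=np.uint8)

np.stack([rgb, rgba])  # ValueError: all input arrays must have the same shape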

williamberman commented 1 year ago

Hi, sorry for the delay here. Will try to take a look sometime this week.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

patrickvonplaten commented 1 year ago

Gentle ping @williamberman here

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

patrickvonplaten commented 1 year ago

Gentle ping here @williamberman

williamberman commented 1 year ago

Sorry for missing this! This was fixed by https://github.com/huggingface/diffusers/pull/2945: the input images had 4 channels while the output images had 3.

I double-checked that there wasn't a lingering issue with different-resolution input images. Those work because the pipeline output resolutions match the inputs, and we don't call np.stack across different sets of validation inputs/outputs.
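As a sketch of that kind of channel normalization (an illustration, not necessarily the exact change in the PR), converting every validation image to RGB up front guarantees a uniform channel count before stacking:

from PIL import Image
import numpy as np

# Hypothetical example: a PNG conditioning image may load as RGBA (4 channels),
# while pipeline outputs are RGB (3 channels). Converting to RGB up front
# makes the resulting arrays stackable.
image = Image.open("conditioning_image_1.png").convert("RGB")
array = np.asarray(image)  # shape is now (H, W, 3)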

kajc10 commented 6 months ago

If anyone still hits this problem because of mismatched image sizes and wants to log to TensorBoard, padding can solve it.

Replace formatted_images = np.stack(formatted_images) with:

# Find the largest width and height among the validation images.
max_width = max(image.shape[1] for image in formatted_images)
max_height = max(image.shape[0] for image in formatted_images)

# Zero-pad each image at the bottom and right so every array has the same
# (max_height, max_width, channels) shape before stacking.
padded_images = []
for image in formatted_images:
    pad_width = max_width - image.shape[1]
    pad_height = max_height - image.shape[0]
    padded_image = np.pad(image, ((0, pad_height), (0, pad_width), (0, 0)), mode='constant', constant_values=0)
    padded_images.append(padded_image)
formatted_images = np.stack(padded_images)
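Note that this only equalizes height and width: np.stack still requires every array to have the same channel count, so 4-channel (RGBA) inputs would still need converting to RGB first, as in the fix above.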