huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

train_controlnet.py broken local data loading #7267

Open morrisalp opened 7 months ago

morrisalp commented 7 months ago

Describe the bug

The script examples/controlnet/train_controlnet.py is designed to take either a HF dataset via --dataset_name or local data via --train_data_dir, but the latter fails out of the box.

Reproduction

Run the script with the flags --train_data_dir, --image_column, --conditioning_image_column and --caption_column pointing to a dataset in the standard HF format, with a metadata.csv file in the data directory. It fails for two reasons:

  1. metadata.csv must have a column named file_name containing the filenames of the target images, but the flag must be set to --image_column=image, not --image_column=file_name. This behavior is undocumented.
  2. The script does not load the images in the conditioning image column; it treats the path strings as PIL Images, which raises an error. For the script to run, I had to change the current line 702 from:
       conditioning_images = [image.convert("RGB") for image in examples[conditioning_image_column]]
     to:
       conditioning_images = [Image.open(image).convert("RGB") for image in examples[conditioning_image_column]]
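
The workaround in point 2 could be made a little more defensive, so the same line works whether the column holds path strings (the local metadata.csv case) or already-decoded PIL Images (the --dataset_name case). This is only a sketch, not the script's actual code: load_rgb is a hypothetical helper name, and it assumes any non-Image value is a path that can be resolved from the current working directory.

```python
from pathlib import Path

from PIL import Image


def load_rgb(item):
    """Return an RGB PIL image from either a PIL image or a file path.

    Hypothetical helper, not part of train_controlnet.py.
    """
    if isinstance(item, Image.Image):
        # Already decoded (e.g. a HF dataset image column): just normalise the mode.
        return item.convert("RGB")
    # Assumption: anything else is a path-like value pointing at an image file.
    with Image.open(Path(item)) as img:
        # convert() loads the pixel data and returns a new image,
        # so the result stays valid after the file is closed.
        return img.convert("RGB")
```

With such a helper, the changed line would read: conditioning_images = [load_rgb(image) for image in examples[conditioning_image_column]].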

Logs

No response

System Info

Who can help?

@sayakpaul @yiyixuxu @DN6

sayakpaul commented 7 months ago

Feel free to send over a PR to fix :)

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Awj2021 commented 4 months ago

Same issue. But after I modified the code to read the images, I got a channel-dimension error like this (batch_size=2):

RuntimeError: Given groups=1, weight of size [16, 3, 3, 3], expected input[2, 1, 512, 512] to have 3 channels
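
One plausible cause, not confirmed anywhere in this thread, is that the images were opened without .convert("RGB"): a grayscale file decodes to a single-channel image, which would produce an input of shape [2, 1, 512, 512] for a layer expecting 3 channels. The sketch below only illustrates that mode difference with Pillow; channel_count is a hypothetical helper, and the 512x512 size is taken from the error message.

```python
from PIL import Image


def channel_count(img):
    """Number of bands (channels) in a PIL image. Hypothetical helper."""
    return len(img.getbands())


# A grayscale ("L" mode) image has a single channel.
gray = Image.new("L", (512, 512))

# Converting to RGB replicates it into three channels,
# which matches what a 3-channel conv layer expects.
rgb = gray.convert("RGB")

print(channel_count(gray), channel_count(rgb))
```

If this is the cause, checking image.mode on a few conditioning images before training would confirm it.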
