RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead

TwinkyW commented 11 months ago

Describe the bug

I'm trying to follow the DreamBooth training example and I'm getting this error.

This is the full console output:

accelerate launch train_dreambooth.py --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" --instance_data_dir=".\dog" --output_dir=".\output" --instance_prompt="a photo of sks dog" --resolution=512 --train_batch_size=1 --gradient_accumulation_steps=1 --learning_rate=5e-6 --lr_scheduler="constant" --lr_warmup_steps=0 --max_train_steps=400

11/25/2023 22:59:47 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: no

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'clip_sample_range', 'timestep_spacing', 'prediction_type', 'variance_type', 'thresholding', 'sample_max_value', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
{'mid_block_only_cross_attention', 'conv_in_kernel', 'dual_cross_attention', 'resnet_time_scale_shift', 'time_embedding_type', 'transformer_layers_per_block', 'class_embed_type', 'use_linear_projection', 'cross_attention_norm', 'num_attention_heads', 'conv_out_kernel', 'encoder_hid_dim', 'projection_class_embeddings_input_dim', 'time_embedding_dim', 'addition_embed_type_num_heads', 'mid_block_type', 'class_embeddings_concat', 'upcast_attention', 'addition_time_embed_dim', 'only_cross_attention', 'encoder_hid_dim_type', 'resnet_out_scale_factor', 'dropout', 'time_embedding_act_fn', 'timestep_post_act', 'reverse_transformer_layers_per_block', 'resnet_skip_time_act', 'time_cond_proj_dim', 'attention_type', 'addition_embed_type', 'num_class_embeds'} was not found in config. Values will be initialized to default values.
11/25/2023 22:59:51 - INFO - __main__ - Running training
11/25/2023 22:59:51 - INFO - __main__ - Num examples = 5
11/25/2023 22:59:51 - INFO - __main__ - Num batches each epoch = 5
11/25/2023 22:59:51 - INFO - __main__ - Num Epochs = 80
11/25/2023 22:59:51 - INFO - __main__ - Instantaneous batch size per device = 1
11/25/2023 22:59:51 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
11/25/2023 22:59:51 - INFO - __main__ - Gradient Accumulation steps = 1
11/25/2023 22:59:51 - INFO - __main__ - Total optimization steps = 400
Steps: 0%| | 0/400 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "G:\dreambooth\diffusers\examples\dreambooth\train_dreambooth.py", line 1422, in
    main(args)
  File "G:\dreambooth\diffusers\examples\dreambooth\train_dreambooth.py", line 1253, in main
    model_pred = unet(
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\dreambooth\diffusers\src\diffusers\models\unet_2d_condition.py", line 1035, in forward
    sample = self.conv_in(sample)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead
Steps: 0%| | 0/400 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\Scripts\accelerate.exe\__main__.py", line 7, in
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 994, in launch_command
    simple_launcher(args)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 636, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\python.exe', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=.\dog', '--output_dir=.\output', '--instance_prompt=a photo of sks dog', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=400']' returned non-zero exit status 1.
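How to read the error: a PyTorch `Conv2d` weight has shape `[out_channels, in_channels / groups, kernel_h, kernel_w]`, and its input has shape `[N, C, H, W]`. With `groups=1`, a weight of size `[320, 4, 3, 3]` means the UNet's `conv_in` expects 4 input channels, while `input[1, 3, 512, 512]` is a single 3-channel, 512x512 tensor. The pure-Python sketch below repeats that channel comparison without needing torch; the helper names `expected_in_channels` and `check_conv2d_input` are hypothetical and are not a PyTorch API.

```python
def expected_in_channels(weight_shape, groups=1):
    """Input channels a Conv2d weight of shape [out, in/groups, kH, kW] expects."""
    _out_channels, in_per_group, _kh, _kw = weight_shape
    return in_per_group * groups


def check_conv2d_input(weight_shape, input_shape, groups=1):
    """Raise ValueError if an [N, C, H, W] input does not match the weight's channels."""
    expected = expected_in_channels(weight_shape, groups)
    _n, channels, _h, _w = input_shape
    if channels != expected:
        raise ValueError(
            f"weight {list(weight_shape)} expects {expected} input channels, "
            f"got {channels} (input shape {list(input_shape)})"
        )


# The shapes from the traceback: conv_in expects 4 channels but receives 3.
try:
    check_conv2d_input((320, 4, 3, 3), (1, 3, 512, 512))
except ValueError as err:
    print(err)
```

A 4-channel input such as `(1, 4, 64, 64)` passes the same check, which is why the channel dimension (not the spatial size) is what the message complains about.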

Reproduction

accelerate launch train_dreambooth.py --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" --instance_data_dir=".\dog" --output_dir=".\output" --instance_prompt="a photo of sks dog" --resolution=512 --train_batch_size=1 --gradient_accumulation_steps=1 --learning_rate=5e-6 --lr_scheduler="constant" --lr_warmup_steps=0 --max_train_steps=400

Logs

No response

System Info

diffusers version: 0.24.0.dev0
Platform: Windows-10-10.0.22631-SP0
Python version: 3.10.6
PyTorch version (GPU?): 2.1.1+cu121 (True)
Huggingface_hub version: 0.19.4
Transformers version: 4.35.2
Accelerate version: 0.24.1
xFormers version: not installed
Using GPU in script?: Yes
Using distributed or parallel set-up in script?: No

GPU: 3090TI 24GB VRAM

Who can help?

@sayakpaul @patrickvonplaten

TwinkyW commented 11 months ago

I fully re-created my environment, re-installed CUDA, etc. from scratch, and I'm still getting the same error. Running train_dreambooth_lora.py instead works fine.

CUDA version is 12.1.

Full pip list:

Package                 Version       Editable project location
absl-py                 2.0.0
accelerate              0.24.1
bitsandbytes            0.41.0
cachetools              5.3.2
certifi                 2022.12.7
charset-normalizer      2.1.1
chex                    0.1.85
colorama                0.4.6
diffusers               0.24.0.dev0   G:\HuggingFace\diffusers
etils                   1.5.2
filelock                3.9.0
flax                    0.7.5
fsspec                  2023.10.0
ftfy                    6.1.3
google-auth             2.23.4
google-auth-oauthlib    1.1.0
grpcio                  1.59.3
huggingface-hub         0.19.4
idna                    3.4
importlib-metadata      6.8.0
importlib-resources     6.1.1
jax                     0.4.20
jaxlib                  0.4.20
Jinja2                  3.1.2
Markdown                3.5.1
markdown-it-py          3.0.0
MarkupSafe              2.1.3
mdurl                   0.1.2
ml-dtypes               0.3.1
mpmath                  1.3.0
msgpack                 1.0.7
nest-asyncio            1.5.8
networkx                3.0
numpy                   1.24.1
oauthlib                3.2.2
opt-einsum              3.3.0
optax                   0.1.7
orbax-checkpoint        0.4.3
packaging               23.2
Pillow                  9.3.0
pip                     23.3.1
protobuf                4.23.4
psutil                  5.9.6
pyasn1                  0.5.1
pyasn1-modules          0.3.0
Pygments                2.17.2
PyYAML                  6.0.1
regex                   2023.10.3
requests                2.28.1
requests-oauthlib       1.3.1
rich                    13.7.0
rsa                     4.9
safetensors             0.4.0
scipy                   1.11.4
setuptools              65.5.0
six                     1.16.0
sympy                   1.12
tensorboard             2.15.1
tensorboard-data-server 0.7.2
tensorstore             0.1.50
tokenizers              0.15.0
toolz                   0.12.0
torch                   2.1.1+cu121
torchaudio              2.1.1+cu121
torchvision             0.16.1+cu121
tqdm                    4.66.1
transformers            4.35.2
typing_extensions       4.4.0
urllib3                 1.26.13
wcwidth                 0.2.12
Werkzeug                3.0.1
zipp                    3.17.0

sayakpaul commented 11 months ago

Could it be the case that the dog directory for the instance images is corrupted?

TwinkyW commented 11 months ago

No. I first tried with my own pictures and got the same error, so I tested with the dog images to make sure my images weren't corrupted.

sayakpaul commented 11 months ago

Unable to reproduce the error: https://colab.research.google.com/gist/sayakpaul/85416d709a2f721a8ee8eb24bc676ab6/scratchpad.ipynb.

TwinkyW commented 11 months ago

Any pointers on what I should double-check in my setup for this error? Since train_dreambooth_lora.py runs fine, I might be missing something that only train_dreambooth.py uses.

sayakpaul commented 11 months ago

I provided a Colab notebook above that runs the script you're reporting as failing, and (as you'd notice) it runs fine there. So I am not sure what's going wrong on your end.

Utkarsh-quytech commented 10 months ago

I'm getting the same error when running train_dreambooth.py on my local computer (there is no issue when running it in Colab):

12/12/2023 14:02:49 - INFO - __main__ - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'prediction_type', 'sample_max_value', 'thresholding', 'timestep_spacing', 'dynamic_thresholding_ratio', 'variance_type', 'clip_sample_range'} was not found in config. Values will be initialized to default values.
{'upcast_attention', 'encoder_hid_dim', 'resnet_skip_time_act', 'dual_cross_attention', 'addition_embed_type', 'use_linear_projection', 'attention_type', 'encoder_hid_dim_type', 'time_embedding_dim', 'resnet_time_scale_shift', 'resnet_out_scale_factor', 'timestep_post_act', 'time_embedding_act_fn', 'cross_attention_norm', 'only_cross_attention', 'addition_time_embed_dim', 'time_embedding_type', 'reverse_transformer_layers_per_block', 'conv_in_kernel', 'projection_class_embeddings_input_dim', 'num_class_embeds', 'class_embeddings_concat', 'mid_block_only_cross_attention', 'transformer_layers_per_block', 'num_attention_heads', 'class_embed_type', 'conv_out_kernel', 'time_cond_proj_dim', 'mid_block_type', 'dropout', 'addition_embed_type_num_heads'} was not found in config. Values will be initialized to default values.
12/12/2023 14:02:58 - INFO - __main__ - Running training
12/12/2023 14:02:58 - INFO - __main__ - Num examples = 200
12/12/2023 14:02:58 - INFO - __main__ - Num batches each epoch = 200
12/12/2023 14:02:58 - INFO - __main__ - Num Epochs = 4
12/12/2023 14:02:58 - INFO - __main__ - Instantaneous batch size per device = 1
12/12/2023 14:02:58 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
12/12/2023 14:02:58 - INFO - __main__ - Gradient Accumulation steps = 1
12/12/2023 14:02:58 - INFO - __main__ - Total optimization steps = 800
Steps: 0%| | 0/800 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\Utkarsh.Singh\Documents\dreambooth\diffusers\examples\dreambooth\train_dreambooth.py", line 1428, in
    main(args)
  File "C:\Users\Utkarsh.Singh\Documents\dreambooth\diffusers\examples\dreambooth\train_dreambooth.py", line 1258, in main
    model_pred = unet(
                 ^^^^^
  File "C:\Users\Utkarsh.Singh\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Utkarsh.Singh\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\utils\operations.py", line 680, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Utkarsh.Singh\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\utils\operations.py", line 668, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Utkarsh.Singh\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\amp\autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Utkarsh.Singh\Documents\dreambooth\diffusers\src\diffusers\models\unet_2d_condition.py", line 1072, in forward
    sample = self.conv_in(sample)
             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Utkarsh.Singh\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Utkarsh.Singh\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Utkarsh.Singh\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[2, 3, 512, 512] to have 4 channels, but got 3 channels instead
Steps: 0%| | 0/800 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "C:\Users\Utkarsh.Singh\AppData\Local\Programs\Python\Python311\Scripts\accelerate.exe\__main__.py", line 7, in
  File "C:\Users\Utkarsh.Singh\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Users\Utkarsh.Singh\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "C:\Users\Utkarsh.Singh\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\Utkarsh.Singh\AppData\Local\Programs\Python\Python311\python.exe', 'C:/Users/Utkarsh.Singh/Documents/dreambooth/diffusers/examples/dreambooth/train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=images', '--class_data_dir=class_images', '--output_dir=model', '--with_prior_preservation', '--prior_loss_weight=1.0', "--instance_prompt='a photo Utkarsh'", "--class_prompt='Person'", '--resolution=512', '--train_batch_size=1', '--sample_batch_size=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--max_train_steps=800', '--push_to_hub']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "c:\Users\Utkarsh.Singh\Documents\dreambooth\diffusers\examples\dreambooth\train_test.py", line 37, in
    subprocess.run(command, check=True)
  File "C:\Users\Utkarsh.Singh\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['accelerate', 'launch', '--mixed_precision=fp16', 'C:/Users/Utkarsh.Singh/Documents/dreambooth/diffusers/examples/dreambooth/train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=images', '--class_data_dir=class_images', '--output_dir=model', '--with_prior_preservation', '--prior_loss_weight=1.0', "--instance_prompt='a photo Utkarsh'", "--class_prompt='Person'", '--resolution=512', '--train_batch_size=1', '--sample_batch_size=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--max_train_steps=800', '--push_to_hub']' returned non-zero exit status 1.

If someone finds a solution, please help!
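The two reports differ only in batch size: `input[1, 3, 512, 512]` versus `input[2, 3, 512, 512]`. Assuming, as in the DreamBooth example's data collation, that `--with_prior_preservation` concatenates each instance image with a class image in the same batch, `--train_batch_size=1` yields a batch of 2. Separately, what `conv_in` should receive is a VAE latent: for Stable Diffusion v1 the VAE downsamples by a factor of 8 into 4 latent channels, so a 512x512 image becomes a `[N, 4, 64, 64]` latent. A 3-channel, full-resolution tensor therefore suggests pixel-space images reached the UNet without being encoded. The sketch below computes both shapes; the helper names and the defaults `latent_channels=4`, `vae_scale_factor=8` are illustrative assumptions, not values read from the script.

```python
def pixel_batch_shape(train_batch_size, resolution, with_prior_preservation=False, channels=3):
    """Shape of the image batch the dataloader yields, before VAE encoding."""
    # Prior preservation appends one class image per instance image to the same batch.
    batch = train_batch_size * (2 if with_prior_preservation else 1)
    return (batch, channels, resolution, resolution)


def expected_unet_input_shape(pixel_shape, latent_channels=4, vae_scale_factor=8):
    """Shape of the VAE latent the UNet's conv_in expects (SD v1 defaults assumed)."""
    batch, _channels, height, width = pixel_shape
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("resolution must be divisible by the VAE scale factor")
    return (batch, latent_channels, height // vae_scale_factor, width // vae_scale_factor)


# First report: no prior preservation, batch size 1.
first = pixel_batch_shape(1, 512)
# Second report: --with_prior_preservation doubles the batch.
second = pixel_batch_shape(1, 512, with_prior_preservation=True)
print(first, "->", expected_unet_input_shape(first))
print(second, "->", expected_unet_input_shape(second))
```

If the tensor passed to `unet(...)` has the pixel shape on the left rather than the latent shape on the right, the VAE encoding step did not run for that batch.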

andrewssdd commented 10 months ago

The issue is similar to the one in the LoRA training script that was fixed by the pull request below.

https://github.com/huggingface/diffusers/pull/3462

I created a PR with a similar fix to the dreambooth checkpoint script.

github-actions[bot] commented 9 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
