TwinkyW closed this issue 9 months ago.
I recreated my environment from scratch (including reinstalling CUDA) and I'm still getting the same error. Running train_dreambooth_lora.py in the same environment works fine.
CUDA version: 12.1
Full pip list:
Package                 Version      Editable project location
absl-py                 2.0.0
accelerate              0.24.1
bitsandbytes            0.41.0
cachetools              5.3.2
certifi                 2022.12.7
charset-normalizer      2.1.1
chex                    0.1.85
colorama                0.4.6
diffusers               0.24.0.dev0  G:\HuggingFace\diffusers
etils                   1.5.2
filelock                3.9.0
flax                    0.7.5
fsspec                  2023.10.0
ftfy                    6.1.3
google-auth             2.23.4
google-auth-oauthlib    1.1.0
grpcio                  1.59.3
huggingface-hub         0.19.4
idna                    3.4
importlib-metadata      6.8.0
importlib-resources     6.1.1
jax                     0.4.20
jaxlib                  0.4.20
Jinja2                  3.1.2
Markdown                3.5.1
markdown-it-py          3.0.0
MarkupSafe              2.1.3
mdurl                   0.1.2
ml-dtypes               0.3.1
mpmath                  1.3.0
msgpack                 1.0.7
nest-asyncio            1.5.8
networkx                3.0
numpy                   1.24.1
oauthlib                3.2.2
opt-einsum              3.3.0
optax                   0.1.7
orbax-checkpoint        0.4.3
packaging               23.2
Pillow                  9.3.0
pip                     23.3.1
protobuf                4.23.4
psutil                  5.9.6
pyasn1                  0.5.1
pyasn1-modules          0.3.0
Pygments                2.17.2
PyYAML                  6.0.1
regex                   2023.10.3
requests                2.28.1
requests-oauthlib       1.3.1
rich                    13.7.0
rsa                     4.9
safetensors             0.4.0
scipy                   1.11.4
setuptools              65.5.0
six                     1.16.0
sympy                   1.12
tensorboard             2.15.1
tensorboard-data-server 0.7.2
tensorstore             0.1.50
tokenizers              0.15.0
toolz                   0.12.0
torch                   2.1.1+cu121
torchaudio              2.1.1+cu121
torchvision             0.16.1+cu121
tqdm                    4.66.1
transformers            4.35.2
typing_extensions       4.4.0
urllib3                 1.26.13
wcwidth                 0.2.12
Werkzeug                3.0.1
zipp                    3.17.0
Could the dog directory holding the instance images be corrupted?
No. I first trained on my own pictures and got the same error, so I switched to the dog example images to rule out corrupted images.
Unable to reproduce the error: https://colab.research.google.com/gist/sayakpaul/85416d709a2f721a8ee8eb24bc676ab6/scratchpad.ipynb.
Any pointers on what I should double-check in my setup for this error? Since the same setup works with train_dreambooth_lora.py, I might be missing something that only train_dreambooth.py uses.
I provided a Colab notebook above that runs the script you're reporting errors with, and as you can see, it runs fine there. So I'm not sure what's going wrong on your end.
I'm getting the same error with train_dreambooth.py on my local computer (it runs without issues in Colab):
2/12/2023 14:02:49 - INFO - main - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'prediction_type', 'sample_max_value', 'thresholding', 'timestep_spacing', 'dynamic_thresholding_ratio', 'variance_type', 'clip_sample_range'} was not found in config. Values will be initialized to default values.
{'upcast_attention', 'encoder_hid_dim', 'resnet_skip_time_act', 'dual_cross_attention', 'addition_embed_type', 'use_linear_projection', 'attention_type', 'encoder_hid_dim_type', 'time_embedding_dim', 'resnet_time_scale_shift', 'resnet_out_scale_factor', 'timestep_post_act', 'time_embedding_act_fn', 'cross_attention_norm', 'only_cross_attention', 'addition_time_embed_dim', 'time_embedding_type', 'reverse_transformer_layers_per_block', 'conv_in_kernel', 'projection_class_embeddings_input_dim', 'num_class_embeds', 'class_embeddings_concat', 'mid_block_only_cross_attention', 'transformer_layers_per_block', 'num_attention_heads', 'class_embed_type', 'conv_out_kernel', 'time_cond_proj_dim', 'mid_block_type', 'dropout', 'addition_embed_type_num_heads'} was not found in config. Values will be initialized to default values.
12/12/2023 14:02:58 - INFO - main - Running training
12/12/2023 14:02:58 - INFO - main - Num examples = 200
12/12/2023 14:02:58 - INFO - main - Num batches each epoch = 200
12/12/2023 14:02:58 - INFO - main - Num Epochs = 4
12/12/2023 14:02:58 - INFO - main - Instantaneous batch size per device = 1
12/12/2023 14:02:58 - INFO - main - Total train batch size (w. parallel, distributed & accumulation) = 1
12/12/2023 14:02:58 - INFO - main - Gradient Accumulation steps = 1
12/12/2023 14:02:58 - INFO - main - Total optimization steps = 800
Steps: 0%| | 0/800 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\Utkarsh.Singh\Documents\dreambooth\diffusers\examples\dreambooth\train_dreambooth.py", line 1428, in
If anyone finds a solution, please help!
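The training summary in this log is internally consistent, which helps rule out a misconfigured run. As a minimal sketch, assuming the ceil-division step/epoch bookkeeping common to the diffusers example scripts (the helper name epoch_count is hypothetical), the epoch count follows from the batches per epoch, the gradient accumulation steps, and the step cap:

```python
import math


def epoch_count(num_batches: int, grad_accum_steps: int, max_train_steps: int) -> tuple[int, int]:
    """Return (update steps per epoch, epochs needed) for a step-capped run.

    Hypothetical helper; assumes ceil division as in many diffusers example scripts.
    """
    # One optimizer update happens every `grad_accum_steps` batches.
    steps_per_epoch = math.ceil(num_batches / grad_accum_steps)
    # Enough epochs to reach the step cap; the last epoch may stop early.
    epochs = math.ceil(max_train_steps / steps_per_epoch)
    return steps_per_epoch, epochs


print(epoch_count(200, 1, 800))  # local run above: (200, 4)
print(epoch_count(5, 1, 400))    # original report in the issue description: (5, 80)
```

Both results match the "Num Epochs" lines in the two logs, so the data loading and step settings look as expected; the failure happens on the very first forward pass.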
The issue is similar to the one in the LoRA training script that was fixed by the pull request below:
https://github.com/huggingface/diffusers/pull/3462
I created a PR with a similar fix for the full-checkpoint DreamBooth script (train_dreambooth.py).
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Describe the bug
I'm trying to follow the DreamBooth training example and I'm getting this error:
RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead
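The error message encodes both tensor shapes. PyTorch stores a Conv2d weight as (out_channels, in_channels / groups, kH, kW), so [320, 4, 3, 3] is a layer expecting 4 input channels, which matches the 4-channel latent input the Stable Diffusion v1 UNet's conv_in is built for. The input [1, 3, 512, 512] has the shape of a raw RGB image batch at --resolution=512, which suggests pixel values reached the UNet instead of VAE latents (an inference from the shapes, not something confirmed in this thread). A plain-Python sketch that decodes the shapes, with hypothetical helper names and assuming the SD v1 VAE's 4 latent channels and 8x spatial downsampling:

```python
def conv2d_channel_check(weight_shape, input_shape, groups=1):
    """Decode a conv2d channel mismatch from the shapes in a PyTorch error message.

    Returns (channels the layer expects, channels it received, whether they match).
    """
    # PyTorch Conv2d weight layout: (out_channels, in_channels // groups, kH, kW).
    _, in_per_group, _, _ = weight_shape
    # Input layout is NCHW: (batch, channels, height, width).
    _, received, _, _ = input_shape
    expected = in_per_group * groups
    return expected, received, expected == received


def sd_latent_shape(pixel_shape, latent_channels=4, downsample=8):
    """Latent shape an SD v1-style VAE would produce for an NCHW image batch (assumed values)."""
    batch, _, height, width = pixel_shape
    return (batch, latent_channels, height // downsample, width // downsample)


expected, got, ok = conv2d_channel_check((320, 4, 3, 3), (1, 3, 512, 512))
print(expected, got, ok)                  # 4 3 False
print(sd_latent_shape((1, 3, 512, 512)))  # (1, 4, 64, 64)
```

A 512x512 image encoded by the VAE would arrive at conv_in as (1, 4, 64, 64), which passes the channel check.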
Full console output:
accelerate launch train_dreambooth.py --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" --instance_data_dir=".\dog" --output_dir=".\output" --instance_prompt="a photo of sks dog" --resolution=512 --train_batch_size=1 --gradient_accumulation_steps=1 --learning_rate=5e-6 --lr_scheduler="constant" --lr_warmup_steps=0 --max_train_steps=400
11/25/2023 22:59:47 - INFO - main - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: no
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'clip_sample_range', 'timestep_spacing', 'prediction_type', 'variance_type', 'thresholding', 'sample_max_value', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
{'mid_block_only_cross_attention', 'conv_in_kernel', 'dual_cross_attention', 'resnet_time_scale_shift', 'time_embedding_type', 'transformer_layers_per_block', 'class_embed_type', 'use_linear_projection', 'cross_attention_norm', 'num_attention_heads', 'conv_out_kernel', 'encoder_hid_dim', 'projection_class_embeddings_input_dim', 'time_embedding_dim', 'addition_embed_type_num_heads', 'mid_block_type', 'class_embeddings_concat', 'upcast_attention', 'addition_time_embed_dim', 'only_cross_attention', 'encoder_hid_dim_type', 'resnet_out_scale_factor', 'dropout', 'time_embedding_act_fn', 'timestep_post_act', 'reverse_transformer_layers_per_block', 'resnet_skip_time_act', 'time_cond_proj_dim', 'attention_type', 'addition_embed_type', 'num_class_embeds'} was not found in config. Values will be initialized to default values.
11/25/2023 22:59:51 - INFO - main - Running training
11/25/2023 22:59:51 - INFO - main - Num examples = 5
11/25/2023 22:59:51 - INFO - main - Num batches each epoch = 5
11/25/2023 22:59:51 - INFO - main - Num Epochs = 80
11/25/2023 22:59:51 - INFO - main - Instantaneous batch size per device = 1
11/25/2023 22:59:51 - INFO - main - Total train batch size (w. parallel, distributed & accumulation) = 1
11/25/2023 22:59:51 - INFO - main - Gradient Accumulation steps = 1
11/25/2023 22:59:51 - INFO - main - Total optimization steps = 400
Steps: 0%| | 0/400 [00:00<?, ?it/s]
Traceback (most recent call last):
File "G:\dreambooth\diffusers\examples\dreambooth\train_dreambooth.py", line 1422, in
main(args)
File "G:\dreambooth\diffusers\examples\dreambooth\train_dreambooth.py", line 1253, in main
model_pred = unet(
File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "G:\dreambooth\diffusers\src\diffusers\models\unet_2d_condition.py", line 1035, in forward
sample = self.conv_in(sample)
File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead
Steps: 0%| | 0/400 [00:01<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\Scripts\accelerate.exe\__main__.py", line 7, in
File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
args.func(args)
File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 994, in launch_command
simple_launcher(args)
File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 636, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\python.exe', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=.\dog', '--output_dir=.\output', '--instance_prompt=a photo of sks dog', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=400']' returned non-zero exit status 1.
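This second traceback comes from the launcher, not from training. As the frames above show, accelerate's simple_launcher runs train_dreambooth.py as a child process and raises subprocess.CalledProcessError when that child exits with a non-zero status, so the actionable error is the RuntimeError in the first traceback. A stdlib-only sketch of the pattern (run_child is a hypothetical helper, not part of accelerate): an uncaught exception in the child yields exit status 1, and the child's real error is only visible in its stderr.

```python
import subprocess
import sys


def run_child(code: str) -> tuple[int, str]:
    """Run `code` in a child Python process, the way a launcher wraps a training script.

    Returns (exit status, last line of the child's stderr).
    """
    try:
        subprocess.run([sys.executable, "-c", code], check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        # The launcher only sees the exit status; the real cause is in the child's stderr.
        lines = err.stderr.strip().splitlines()
        return err.returncode, lines[-1] if lines else ""
    return 0, ""


status, last_line = run_child("raise RuntimeError('expected 4 channels, but got 3 channels instead')")
print(status)     # 1
print(last_line)  # RuntimeError: expected 4 channels, but got 3 channels instead
```

When reading a log like this, scroll up past the CalledProcessError to the first traceback.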
Reproduction
accelerate launch train_dreambooth.py --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" --instance_data_dir=".\dog" --output_dir=".\output" --instance_prompt="a photo of sks dog" --resolution=512 --train_batch_size=1 --gradient_accumulation_steps=1 --learning_rate=5e-6 --lr_scheduler="constant" --lr_warmup_steps=0 --max_train_steps=400
Logs
No response
System Info
diffusers version: 0.24.0.dev0
GPU: 3090 Ti, 24 GB VRAM
Who can help?
@sayakpaul @patrickvonplaten