cc @sayakpaul
I am unable to reproduce this.
My environment is as follows:
- `diffusers` version: 0.18.0.dev0
- Platform: Linux-4.19.0-24-cloud-amd64-x86_64-with-glibc2.10
- Python version: 3.8.16
- PyTorch version (GPU?): 1.13.1+cu116 (True)
- Huggingface_hub version: 0.13.2
- Transformers version: 4.26.1
- Accelerate version: 0.18.0
- xFormers version: 0.0.16
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
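(For reference, a report in this shape can be regenerated with the CLI bundled with `diffusers`:)

```bash
# Prints diffusers/platform/PyTorch/transformers versions for bug reports
diffusers-cli env
```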
`diffusers` was installed like so:
pip install git+https://github.com/huggingface/diffusers
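(A quick sanity check that the source build is the one actually being imported:)

```bash
python -c "import diffusers; print(diffusers.__version__)"
```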
I used the following commands to launch training:
export DATASET_NAME="lambdalabs/pokemon-blip-captions"
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME --caption_column="text" \
--resolution=512 --random_flip \
--train_batch_size=1 \
--num_train_epochs=100 --checkpointing_steps=5000 \
--learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
--seed=42 \
--output_dir="sd-pokemon-model-lora"
What am I missing out on?
I did the following steps:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
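To sanity-check the driver and toolkit after these steps (the `nvcc` path assumes the default install location for the CUDA 11.8 deb, so adjust if needed):

```bash
nvidia-smi                                # confirms the driver loaded and sees the GPU
/usr/local/cuda-11.8/bin/nvcc --version   # confirms the toolkit version
```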
After installation:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:04.0 Off | 0 |
| N/A 33C P0 24W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
3. Cloned diffusers, created a Python virtual environment, and installed torch, diffusers, etc.:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install git+https://github.com/huggingface/diffusers
cd "$HOME/diffusers/examples/research_projects/lora" || exit
pip install -r requirements.txt
pip install safetensors
pip install omegaconf
pip install accelerate
accelerate config
Do you wish to use FP16 or BF16 (mixed precision)? fp16
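When the prompts are answered this way, `accelerate config` writes a `default_config.yaml` roughly like the sketch below (the exact keys vary across `accelerate` versions; this is an illustration, not the file from this machine):

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: 'NO'
mixed_precision: fp16
num_processes: 1
use_cpu: false
```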
accelerate configuration saved at /home/shun/.cache/huggingface/accelerate/default_config.yaml
4. Trained using exactly the same command:
(.env) shun@instance-1:~/diffusers/examples/research_projects/lora$ export DATASET_NAME="lambdalabs/pokemon-blip-captions"
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
(.env) shun@instance-1:~/diffusers/examples/research_projects/lora$ accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME --caption_column="text" \
--resolution=512 --random_flip \
--train_batch_size=1 \
--num_train_epochs=100 --checkpointing_steps=5000 \
--learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
--seed=42 \
--output_dir="sd-pokemon-model-lora"
06/29/2023 14:39:54 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
Downloading (…)cheduler_config.json: 100%|██████████| 308/308 [00:00<00:00, 2.00MB/s]
{'dynamic_thresholding_ratio', 'variance_type', 'clip_sample_range', 'prediction_type', 'sample_max_value', 'thresholding'} was not found in config. Values will be initialized to default values.
Downloading (…)tokenizer/vocab.json: 100%|██████████| 1.06M/1.06M [00:00<00:00, 20.6MB/s]
Downloading (…)tokenizer/merges.txt: 100%|██████████| 525k/525k [00:00<00:00, 146MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 472/472 [00:00<00:00, 2.74MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 806/806 [00:00<00:00, 4.10MB/s]
Downloading (…)_encoder/config.json: 100%|██████████| 617/617 [00:00<00:00, 3.11MB/s]
Downloading model.safetensors: 100%|██████████| 492M/492M [00:06<00:00, 77.1MB/s]
Downloading (…)main/vae/config.json: 100%|██████████| 547/547 [00:00<00:00, 3.19MB/s]
Downloading (…)ch_model.safetensors: 100%|██████████| 335M/335M [00:04<00:00, 73.0MB/s]
{'scaling_factor'} was not found in config. Values will be initialized to default values.
Downloading (…)ain/unet/config.json: 100%|██████████| 743/743 [00:00<00:00, 5.00MB/s]
Downloading (…)ch_model.safetensors: 100%|██████████| 3.44G/3.44G [00:44<00:00, 77.2MB/s]
{'use_linear_projection', 'time_embedding_act_fn', 'class_embeddings_concat', 'num_attention_heads', 'cross_attention_norm', 'only_cross_attention', 'conv_in_kernel', 'time_cond_proj_dim', 'conv_out_kernel', 'upcast_attention', 'mid_block_only_cross_attention', 'resnet_time_scale_shift', 'time_embedding_dim', 'projection_class_embeddings_input_dim', 'mid_block_type', 'addition_embed_type', 'resnet_skip_time_act', 'num_class_embeds', 'dual_cross_attention', 'class_embed_type', 'resnet_out_scale_factor', 'time_embedding_type', 'timestep_post_act', 'addition_embed_type_num_heads', 'encoder_hid_dim_type', 'encoder_hid_dim'} was not found in config. Values will be initialized to default values.
Downloading metadata: 100%|██████████| 731/731 [00:00<00:00, 825kB/s]
Downloading readme: 100%|██████████| 1.80k/1.80k [00:00<00:00, 1.63MB/s]
Downloading and preparing dataset imagefolder/pokemon (download: 95.05 MiB, generated: 113.89 MiB, post-processed: Unknown size, total: 208.94 MiB) to /home/shun/.cache/huggingface/datasets/lambdalabs___parquet/lambdalabs--pokemon-blip-captions-10e3527a764857bd/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7...
Downloading data: 100%|██████████| 99.7M/99.7M [00:00<00:00, 124MB/s]
Downloading data files: 100%|██████████| 1/1 [00:01<00:00, 1.39s/it]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 1693.30it/s]
Dataset parquet downloaded and prepared to /home/shun/.cache/huggingface/datasets/lambdalabs___parquet/lambdalabs--pokemon-blip-captions-10e3527a764857bd/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7. Subsequent calls will reuse this data.
100%|██████████| 1/1 [00:00<00:00, 737.27it/s]
06/29/2023 14:40:59 - INFO - __main__ - ***** Running training *****
06/29/2023 14:40:59 - INFO - __main__ - Num examples = 833
06/29/2023 14:40:59 - INFO - __main__ - Num Epochs = 100
06/29/2023 14:40:59 - INFO - __main__ - Instantaneous batch size per device = 1
06/29/2023 14:40:59 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
06/29/2023 14:40:59 - INFO - __main__ - Gradient Accumulation steps = 1
06/29/2023 14:40:59 - INFO - __main__ - Total optimization steps = 83300
Steps: 0%| | 0/83300 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/shun/diffusers/examples/research_projects/lora/train_text_to_image_lora.py", line 1014, in
-------
- Python version: 3.10.6
- Linux instance-1 5.19.0-1026-gcp #28~22.04.1-Ubuntu SMP Tue Jun 6 07:24:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- transformers version: 4.30.2
- diffusers version: 0.18.0.dev0
- torch version: 2.0.1+cu118
- huggingface_hub version: 0.15.1
- accelerate version: 0.20.3
- xformers: not installed
------
So could this be a problem with the torch version?
@sayakpaul
Ah, so you're using Torch 2.0. Cool, will investigate. Thanks so much for being so detailed.
Tested with torch 1.13.1 and CUDA 11.6, still the same error.
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.13.1+cu116'
>>> torch.cuda.is_available()
True
>>>
Error:
accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
> --pretrained_model_name_or_path=$MODEL_NAME \
> --dataset_name=$DATASET_NAME --caption_column="text" \
> --resolution=512 --random_flip \
> --train_batch_size=1 \
> --num_train_epochs=100 --checkpointing_steps=5000 \
> --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
> --seed=42 \
> --output_dir="sd-pokemon-model-lora"
06/30/2023 05:43:43 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
Downloading (…)cheduler_config.json: 100%|██████████| 308/308 [00:00<00:00, 34.7kB/s]
{'dynamic_thresholding_ratio', 'thresholding', 'prediction_type', 'variance_type', 'sample_max_value', 'clip_sample_range'} was not found in config. Values will be initialized to default values.
Downloading (…)tokenizer/vocab.json: 100%|██████████| 1.06M/1.06M [00:00<00:00, 18.2MB/s]
Downloading (…)tokenizer/merges.txt: 100%|██████████| 525k/525k [00:00<00:00, 89.0MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 472/472 [00:00<00:00, 273kB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 806/806 [00:00<00:00, 477kB/s]
Downloading (…)_encoder/config.json: 100%|██████████| 617/617 [00:00<00:00, 73.4kB/s]
Downloading model.safetensors: 100%|██████████| 492M/492M [00:02<00:00, 195MB/s]
Downloading (…)main/vae/config.json: 100%|██████████| 547/547 [00:00<00:00, 67.4kB/s]
Downloading (…)ch_model.safetensors: 100%|██████████| 335M/335M [00:01<00:00, 201MB/s]
{'scaling_factor'} was not found in config. Values will be initialized to default values.
Downloading (…)ain/unet/config.json: 100%|██████████| 743/743 [00:00<00:00, 328kB/s]
Downloading (…)ch_model.safetensors: 100%|██████████| 3.44G/3.44G [00:39<00:00, 86.9MB/s]
{'time_cond_proj_dim', 'conv_out_kernel', 'addition_embed_type_num_heads', 'projection_class_embeddings_input_dim', 'resnet_out_scale_factor', 'encoder_hid_dim', 'mid_block_type', 'cross_attention_norm', 'timestep_post_act', 'conv_in_kernel', 'time_embedding_act_fn', 'dual_cross_attention', 'only_cross_attention', 'upcast_attention', 'class_embed_type', 'resnet_skip_time_act', 'use_linear_projection', 'num_attention_heads', 'time_embedding_type', 'num_class_embeds', 'time_embedding_dim', 'class_embeddings_concat', 'encoder_hid_dim_type', 'mid_block_only_cross_attention', 'resnet_time_scale_shift', 'addition_embed_type'} was not found in config. Values will be initialized to default values.
Downloading metadata: 100%|██████████| 731/731 [00:00<00:00, 4.82MB/s]
Downloading readme: 100%|██████████| 1.80k/1.80k [00:00<00:00, 11.1MB/s]
Downloading and preparing dataset imagefolder/pokemon (download: 95.05 MiB, generated: 113.89 MiB, post-processed: Unknown size, total: 208.94 MiB) to /home/shun/.cache/huggingface/datasets/lambdalabs___parquet/lambdalabs--pokemon-blip-captions-10e3527a764857bd/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7...
Downloading data: 100%|██████████| 99.7M/99.7M [00:01<00:00, 68.2MB/s]
Downloading data files: 100%|██████████| 1/1 [00:02<00:00, 2.10s/it]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 1600.88it/s]
Dataset parquet downloaded and prepared to /home/shun/.cache/huggingface/datasets/lambdalabs___parquet/lambdalabs--pokemon-blip-captions-10e3527a764857bd/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7. Subsequent calls will reuse this data.
100%|██████████| 1/1 [00:00<00:00, 713.32it/s]
06/30/2023 05:44:37 - INFO - __main__ - ***** Running training *****
06/30/2023 05:44:37 - INFO - __main__ - Num examples = 833
06/30/2023 05:44:37 - INFO - __main__ - Num Epochs = 100
06/30/2023 05:44:37 - INFO - __main__ - Instantaneous batch size per device = 1
06/30/2023 05:44:37 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
06/30/2023 05:44:37 - INFO - __main__ - Gradient Accumulation steps = 1
06/30/2023 05:44:37 - INFO - __main__ - Total optimization steps = 83300
Steps: 0%| | 0/83300 [00:00<?, ?it/s]Traceback (most recent call last):
File "train_text_to_image_lora.py", line 1014, in <module>
main()
File "train_text_to_image_lora.py", line 817, in main
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/diffusers/models/unet_2d_condition.py", line 765, in forward
emb = self.time_embedding(t_emb, timestep_cond)
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/diffusers/models/embeddings.py", line 192, in forward
sample = self.linear_1(sample)
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)
Steps: 0%| | 0/83300 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/shun/diffusers/.env/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/accelerate/commands/launch.py", line 941, in launch_command
simple_launcher(args)
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/accelerate/commands/launch.py", line 603, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/shun/diffusers/.env/bin/python3', 'train_text_to_image_lora.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--dataset_name=lambdalabs/pokemon-blip-captions', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--num_train_epochs=100', '--checkpointing_steps=5000', '--learning_rate=1e-04', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--seed=42', '--output_dir=sd-pokemon-model-lora']' returned non-zero exit status 1.
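The traceback shows `F.linear` receiving an input on `cuda:0` while the layer's weight is still on the CPU (or vice versa). A minimal sketch of that failure mode, purely for illustration and not taken from the script:

```python
import torch

lin = torch.nn.Linear(320, 1280)        # module (and its weight) stays on the CPU
x = torch.randn(1, 320, device="cuda")  # input lives on the GPU

try:
    lin(x)
except RuntimeError as e:
    print(e)  # Expected all tensors to be on the same device ...

lin.to("cuda")       # moving the module onto the GPU resolves the mismatch
print(lin(x).shape)  # torch.Size([1, 1280])
```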
Tried without accelerate (the `CalledProcessError` above is just the launcher surfacing the script's non-zero exit):
python3 train_text_to_image_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME --caption_column="text" \
--resolution=512 --random_flip \
--train_batch_size=1 \
--num_train_epochs=100 --checkpointing_steps=5000 \
--learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
--seed=42 \
--output_dir="sd-pokemon-model-lora"
06/30/2023 05:50:24 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: no
{'sample_max_value', 'variance_type', 'thresholding', 'clip_sample_range', 'dynamic_thresholding_ratio', 'prediction_type'} was not found in config. Values will be initialized to default values.
{'scaling_factor'} was not found in config. Values will be initialized to default values.
{'class_embeddings_concat', 'class_embed_type', 'resnet_skip_time_act', 'use_linear_projection', 'time_embedding_act_fn', 'encoder_hid_dim', 'addition_embed_type', 'conv_out_kernel', 'resnet_time_scale_shift', 'time_cond_proj_dim', 'addition_embed_type_num_heads', 'time_embedding_type', 'resnet_out_scale_factor', 'num_class_embeds', 'conv_in_kernel', 'encoder_hid_dim_type', 'dual_cross_attention', 'timestep_post_act', 'time_embedding_dim', 'projection_class_embeddings_input_dim', 'cross_attention_norm', 'mid_block_only_cross_attention', 'upcast_attention', 'mid_block_type', 'only_cross_attention', 'num_attention_heads'} was not found in config. Values will be initialized to default values.
06/30/2023 05:50:29 - WARNING - datasets.builder - Found cached dataset parquet (/home/shun/.cache/huggingface/datasets/lambdalabs___parquet/lambdalabs--pokemon-blip-captions-10e3527a764857bd/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7)
100%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 1/1 [00:00<00:00, 661.04it/s]
06/30/2023 05:50:30 - INFO - __main__ - ***** Running training *****
06/30/2023 05:50:30 - INFO - __main__ - Num examples = 833
06/30/2023 05:50:30 - INFO - __main__ - Num Epochs = 100
06/30/2023 05:50:30 - INFO - __main__ - Instantaneous batch size per device = 1
06/30/2023 05:50:30 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
06/30/2023 05:50:30 - INFO - __main__ - Gradient Accumulation steps = 1
06/30/2023 05:50:30 - INFO - __main__ - Total optimization steps = 83300
Steps: 0%| | 0/83300 [00:00<?, ?it/s]Traceback (most recent call last):
File "train_text_to_image_lora.py", line 1014, in <module>
main()
File "train_text_to_image_lora.py", line 817, in main
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/diffusers/models/unet_2d_condition.py", line 765, in forward
emb = self.time_embedding(t_emb, timestep_cond)
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/diffusers/models/embeddings.py", line 192, in forward
sample = self.linear_1(sample)
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/shun/diffusers/.env/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)
Steps: 0%| | 0/83300 [00:01<?, ?it/s]
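Since the error also occurs with mixed precision off, dtype casting is unlikely to be the culprit; module placement is the more likely suspect. The mainline `examples/text_to_image/train_text_to_image_lora.py` moves the frozen components onto the accelerator device with `.to(accelerator.device, dtype=weight_dtype)` before the training loop; if the research-project copy lacks an equivalent step, the weights would stay on the CPU. A hedged, self-contained reconstruction of that placement step (the model id and dtype handling here are assumptions for illustration):

```python
import torch
from accelerate import Accelerator
from diffusers import UNet2DConditionModel

# Mirror of the device-placement step in the mainline LoRA example.
accelerator = Accelerator(mixed_precision="fp16")
weight_dtype = torch.float16 if accelerator.mixed_precision == "fp16" else torch.float32

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
unet.to(accelerator.device, dtype=weight_dtype)  # frozen weights onto cuda:0
print(next(unet.parameters()).device)            # expect cuda:0
```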
Nvidia-smi output:
Fri Jun 30 05:53:01 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:04.0 Off | 0 |
| N/A 35C P0 22W / 300W | 108MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1025 G /usr/lib/xorg/Xorg 95MiB |
| 0 N/A N/A 1132 G /usr/bin/gnome-shell 12MiB |
+-----------------------------------------------------------------------------+
@sayakpaul
Now this I cannot confirm, as mentioned in https://github.com/huggingface/diffusers/issues/3884#issuecomment-1613007671. I would suggest seeing if this error persists when you reinstall diffusers from source (`pip install git+https://github.com/huggingface/diffusers`).
Meanwhile, I am looking into whether this fails with PT 2.0.
I am still unable to reproduce the bug even on PT 2.0. Check out this Colab Gist: https://colab.research.google.com/gist/sayakpaul/065dd9dd92bf41af954c5a18470e64eb/scratchpad.ipynb.
I set up the environment there and started the training from a Colab Terminal (requires a pro subscription). It went as expected.
@pcuenca if you have time, could you maybe see if you're able to reproduce the reported bug? This is just to confirm I am not missing anything obvious.
Can you check this Colab link to see what I did wrong? https://colab.research.google.com/drive/1qfugOTGcpg9RJDZjdwrgPnaYWObUeJUQ?usp=sharing
Thanks for your time, @sayakpaul.
Ah, you're using the LoRA script from the research projects directory. Unfortunately, we don't maintain that directory, so I am pinging @haofanwang to check what they have to say.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Describe the bug
Got the following error when trying to use train_text_to_image_lora.py.
Here is the full log: https://pastebin.com/Mjy5yKHe
Reproduction
Just run the train_text_to_image_lora.py script; see the log.
Logs
System Info
- `diffusers` version: installed from source
- System: Ubuntu 22.04
- CUDA: 11.8
Who can help?
@williamberman, @sayakpaul, @yiyixuxu