huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0
25.02k stars 5.17k forks

examples/text_to_image RuntimeError: "slow_conv2d_cpu" not implemented for 'Half' #3432

Closed zdwolfe closed 1 year ago

zdwolfe commented 1 year ago

Describe the bug

Following the examples/text_to_image README leads to a reproducible error: RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'.

https://github.com/huggingface/diffusers/tree/main/examples/text_to_image

Reproduction

git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
cd examples/text_to_image/
accelerate config
cat ~/.cache/huggingface/accelerate/default_config.yaml 
compute_environment: LOCAL_MACHINE
distributed_type: 'NO'
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
huggingface-cli login
...
Login successful
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions"

accelerate launch --mixed_precision="fp16"  train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$dataset_name \
  --use_ema \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model" 

Results in

RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
Steps:   0%|                                                                                                                                                                                                                                                                        | 0/15000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code

Logs

C:\Users\wolfe\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\accelerator.py:258: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
  warnings.warn(
05/14/2023 16:15:20 - INFO - __main__ - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cpu

Mixed precision type: fp16

{'dynamic_thresholding_ratio', 'prediction_type', 'thresholding', 'sample_max_value', 'variance_type', 'clip_sample_range'} was not found in config. Values will be initialized to default values.
{'norm_num_groups'} was not found in config. Values will be initialized to default values.
{'num_class_embeds', 'conv_out_kernel', 'mid_block_only_cross_attention', 'time_embedding_dim', 'timestep_post_act', 'resnet_time_scale_shift', 'use_linear_projection', 'projection_class_embeddings_input_dim', 'upcast_attention', 'cross_attention_norm', 'time_cond_proj_dim', 'addition_embed_type_num_heads', 'dual_cross_attention', 'class_embed_type', 'resnet_skip_time_act', 'time_embedding_type', 'resnet_out_scale_factor', 'only_cross_attention', 'class_embeddings_concat', 'conv_in_kernel', 'encoder_hid_dim', 'mid_block_type', 'time_embedding_act_fn', 'addition_embed_type'} was not found in config. Values will be initialized to default values.
{'num_class_embeds', 'conv_out_kernel', 'mid_block_only_cross_attention', 'time_embedding_dim', 'timestep_post_act', 'resnet_time_scale_shift', 'use_linear_projection', 'projection_class_embeddings_input_dim', 'upcast_attention', 'cross_attention_norm', 'time_cond_proj_dim', 'addition_embed_type_num_heads', 'dual_cross_attention', 'class_embed_type', 'resnet_skip_time_act', 'time_embedding_type', 'resnet_out_scale_factor', 'only_cross_attention', 'class_embeddings_concat', 'conv_in_kernel', 'encoder_hid_dim', 'mid_block_type', 'time_embedding_act_fn', 'addition_embed_type'} was not found in config. Values will be initialized to default values.
05/14/2023 16:15:24 - WARNING - datasets.builder - Found cached dataset parquet (C:/Users/wolfe/.cache/huggingface/datasets/lambdalabs___parquet/lambdalabs--pokemon-blip-captions-10e3527a764857bd/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
100%|██████████| 1/1 [00:00<00:00, 250.03it/s]
05/14/2023 16:15:25 - INFO - __main__ - ***** Running training *****
05/14/2023 16:15:25 - INFO - __main__ -   Num examples = 833
05/14/2023 16:15:25 - INFO - __main__ -   Num Epochs = 72
05/14/2023 16:15:25 - INFO - __main__ -   Instantaneous batch size per device = 1
05/14/2023 16:15:25 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
05/14/2023 16:15:25 - INFO - __main__ -   Gradient Accumulation steps = 4
05/14/2023 16:15:25 - INFO - __main__ -   Total optimization steps = 15000
Steps:   0%|                                                                                                                                                                                                                                                                        | 0/15000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "C:\Users\wolfe\src\genai\diffusers\examples\text_to_image\train_text_to_image.py", line 959, in <module>
    main()
  File "C:\Users\wolfe\src\genai\diffusers\examples\text_to_image\train_text_to_image.py", line 823, in main
    latents = vae.encode(batch["pixel_values"].to(weight_dtype)).latent_dist.sample()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\wolfe\AppData\Local\Programs\Python\Python311\Lib\site-packages\diffusers\utils\accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\wolfe\AppData\Local\Programs\Python\Python311\Lib\site-packages\diffusers\models\autoencoder_kl.py", line 164, in encode
    h = self.encoder(x)
        ^^^^^^^^^^^^^^^
  File "C:\Users\wolfe\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\wolfe\AppData\Local\Programs\Python\Python311\Lib\site-packages\diffusers\models\vae.py", line 109, in forward
    sample = self.conv_in(sample)
             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\wolfe\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\wolfe\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\wolfe\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
Steps:   0%|                                                                                                                                                                                                                                                                        | 0/15000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\wolfe\AppData\Local\Programs\Python\Python311\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\wolfe\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Users\wolfe\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\commands\launch.py", line 918, in launch_command
    simple_launcher(args)
  File "C:\Users\wolfe\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\commands\launch.py", line 580, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\wolfe\\AppData\\Local\\Programs\\Python\\Python311\\python.exe', 'train_text_to_image.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--dataset_name=lambdalabs/pokemon-blip-captions', '--use_ema', '--resolution=512', '--center_crop', '--random_flip', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--gradient_checkpointing', '--max_train_steps=15000', '--learning_rate=1e-05', '--max_grad_norm=1', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--output_dir=sd-pokemon-model']' returned non-zero exit status 1.   

System Info

- `diffusers` version: 0.17.0.dev0
- Platform: Windows-10-10.0.22621-SP0
- Python version: 3.11.3
- PyTorch version (GPU?): 2.0.1+cpu (False)
- Huggingface_hub version: 0.14.1
- Transformers version: 4.29.1
- Accelerate version: 0.19.0
- xFormers version: not installed
- Using GPU in script?: all (RTX 4090)
- Using distributed or parallel set-up in script?: no

tim-tmds commented 1 year ago

https://github.com/ultralytics/yolov5/issues/10379#issuecomment-1335865572 Your issue might be similar to the one linked above.

zdwolfe commented 1 year ago

Thank you @tim-tmds. Looking at https://github.com/ultralytics/yolov5/issues/10379#issuecomment-1335865572 I don't believe that's the case. I followed the README instructions for installation and am not intending to use a CPU for training.

sayakpaul commented 1 year ago

From your diffusers-cli env:

PyTorch version (GPU?): 2.0.1+cpu (False)

Could you ensure PyTorch has been installed correctly?
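
One way to check this is sketched below. It assumes only that CPU-only PyTorch wheels carry a `+cpu` local version tag, as in the `2.0.1+cpu` reported above; the helper names `is_cpu_only_build` and `describe_torch_install` are made up for illustration and are not part of PyTorch or diffusers.

```python
import importlib.util


def is_cpu_only_build(version: str) -> bool:
    """Return True if a PyTorch version string carries the '+cpu' local tag."""
    # "2.0.1+cpu" -> local tag "cpu"; "2.0.1+cu118" -> "cu118"; "2.0.1" -> ""
    _, _, local = version.partition("+")
    return local == "cpu"


def describe_torch_install() -> str:
    """Summarise the installed PyTorch build, or note that it is missing."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    return (
        f"torch {torch.__version__}, "
        f"cpu-only wheel: {is_cpu_only_build(torch.__version__)}, "
        f"CUDA available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(describe_torch_install())
```

If this reports a CPU-only wheel or `CUDA available: False`, the GPU cannot be used regardless of the `accelerate` configuration. The usual remedy is to reinstall PyTorch with a CUDA-enabled build, using the command given by the install selector on pytorch.org.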

zdwolfe commented 1 year ago

I followed the instructions on the README (pip install -r requirements.txt). Is there something else I should have done? Thanks!

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

woniesong92 commented 10 months ago

Experiencing the same problem

sayakpaul commented 10 months ago

You're probably using FP16 on a CPU.
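
A rough sketch of guarding against that before launching: in the traceback above, the script casts the input batch to `weight_dtype` before `vae.encode`, so with `--mixed_precision="fp16"` on a CPU device the half-precision convolution fails. The helper `effective_mixed_precision` below is hypothetical and not part of accelerate or diffusers. It only decides which value to pass to `--mixed_precision`.

```python
import importlib.util


def effective_mixed_precision(requested: str, cuda_available: bool) -> str:
    """Fall back to full precision when fp16 is requested without a CUDA device.

    `requested` is the value intended for `--mixed_precision`
    ("no", "fp16" or "bf16"). Only the fp16-without-CUDA case is changed.
    """
    if requested == "fp16" and not cuda_available:
        return "no"
    return requested


if __name__ == "__main__":
    cuda_ok = False
    if importlib.util.find_spec("torch") is not None:
        import torch

        cuda_ok = torch.cuda.is_available()
    # Prints the value to use for --mixed_precision on this machine.
    print(effective_mixed_precision("fp16", cuda_ok))
```

Falling back to `"no"` only avoids the crash; training Stable Diffusion on a CPU in float32 will be very slow, so fixing the PyTorch install is the real solution when a GPU is present.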

lskckkvvks commented 6 months ago

Following the tutorial.ipynb from https://github.com/ultralytics/yolov5, I ran into a similar problem.