huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0
25.25k stars 5.23k forks

Dreambooth Flux training error: RuntimeError: mat2 must be a matrix, got 1-D tensor #9497

Open kopyl opened 3 hours ago

kopyl commented 3 hours ago

Describe the bug

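Training starts, but the run crashes during the first validation pass with RuntimeError: mat2 must be a matrix, got 1-D tensor (full traceback under Logs).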

Reproduction

Run accelerate config with the following settings:

compute_environment: LOCAL_MACHINE
debug: true
distributed_type: FSDP
downcast_bf16: 'no'
enable_cpu_affinity: false
fsdp_config:
  fsdp_activation_checkpointing: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: true
  fsdp_offload_params: true
  fsdp_sharding_strategy: HYBRID_SHARD_ZERO2
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

Then run the following in a notebook:

!git clone https://github.com/huggingface/diffusers
%cd diffusers

!pip install -e .
!pip install -r examples/dreambooth/requirements_flux.txt
!pip install prodigyopt

import huggingface_hub
huggingface_hub.notebook_login()

MODEL_NAME="black-forest-labs/FLUX.1-dev"
INSTANCE_DIR="/dreambooth-datasets/yaremovaa"
OUTPUT_DIR="/flux-dreambooth-outputs/dreamboot-yaremovaa"

!accelerate launch examples/dreambooth/train_dreambooth_flux.py \
  --pretrained_model_name_or_path={MODEL_NAME}  \
  --instance_data_dir={INSTANCE_DIR} \
  --output_dir={OUTPUT_DIR} \
  --mixed_precision="bf16" \
  --instance_prompt="a photo of sks girl" \
  --resolution=512 \
  --train_batch_size=1 \
  --guidance_scale=1 \
  --gradient_accumulation_steps=4 \
  --optimizer="prodigy" \
  --learning_rate=1. \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=1000 \
  --validation_prompt="A photo of sks girl posing in a photo studio" \
  --validation_epochs=25 \
  --seed="0"

Logs

09/23/2024 13:15:53 - INFO - __main__ - Distributed environment: FSDP  Backend: nccl
Num processes: 2
Process index: 0
Local process index: 0
Device: cuda:0

Mixed precision type: bf16

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
09/23/2024 13:15:53 - INFO - __main__ - Distributed environment: FSDP  Backend: nccl
Num processes: 2
Process index: 1
Local process index: 1
Device: cuda:1

Mixed precision type: bf16

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type t5 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
Downloading shards: 100%|██████████████████████| 2/2 [00:00<00:00, 18436.50it/s]
Downloading shards: 100%|██████████████████████| 2/2 [00:00<00:00, 16946.68it/s]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:00<00:00, 12.08it/s]
Fetching 3 files: 100%|████████████████████████| 3/3 [00:00<00:00, 10433.59it/s]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00,  1.06s/it]
Fetching 3 files: 100%|████████████████████████| 3/3 [00:00<00:00, 10903.74it/s]
{'axes_dims_rope'} was not found in config. Values will be initialized to default values.
Using decoupled weight decay
Using decoupled weight decay
09/23/2024 13:16:15 - INFO - __main__ - ***** Running training *****
09/23/2024 13:16:15 - INFO - __main__ -   Num examples = 10
09/23/2024 13:16:15 - INFO - __main__ -   Num batches each epoch = 5
09/23/2024 13:16:15 - INFO - __main__ -   Num Epochs = 500
09/23/2024 13:16:15 - INFO - __main__ -   Instantaneous batch size per device = 1
09/23/2024 13:16:15 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 8
09/23/2024 13:16:15 - INFO - __main__ -   Gradient Accumulation steps = 4
09/23/2024 13:16:15 - INFO - __main__ -   Total optimization steps = 1000
Steps:   0%|                                           | 0/1000 [00:00<?, ?it/s]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
  with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context:  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
  with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context:  # type: ignore[attr-defined]
Steps:   0%|                           | 0/1000 [00:29<?, ?it/s, loss=0.4, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|                         | 0/1000 [00:31<?, ?it/s, loss=0.416, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|                         | 0/1000 [00:32<?, ?it/s, loss=0.327, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|              | 1/1000 [01:30<25:01:16, 90.17s/it, loss=0.592, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|                | 2/1000 [02:31<20:13:45, 72.97s/it, loss=nan, lr=1]
Downloading shards: 100%|██████████████████████| 2/2 [00:00<00:00, 20068.44it/s]

Loading checkpoint shards:   0%|                          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████████         | 1/2 [00:02<00:02,  2.53s/it]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:04<00:00,  2.43s/it]

Loading pipeline components...:   0%|                     | 0/7 [00:00<?, ?it/s]Loaded tokenizer_2 as T5TokenizerFast from `tokenizer_2` subfolder of black-forest-labs/FLUX.1-dev.

Loading pipeline components...:  43%|█████▌       | 3/7 [00:00<00:00, 18.63it/s]Loaded scheduler as FlowMatchEulerDiscreteScheduler from `scheduler` subfolder of black-forest-labs/FLUX.1-dev.
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of black-forest-labs/FLUX.1-dev.
Loading pipeline components...: 100%|█████████████| 7/7 [00:00<00:00, 34.91it/s]
09/23/2024 13:18:51 - INFO - __main__ - Running validation... 
 Generating 4 images with prompt: A photo of sks girl posing in a photo studio.
[rank0]: Traceback (most recent call last):
[rank0]:   File "examples/dreambooth/train_dreambooth_flux.py", line 1791, in <module>
[rank0]:     main(args)
[rank0]:   File "examples/dreambooth/train_dreambooth_flux.py", line 1715, in main
[rank0]:     images = log_validation(
[rank0]:   File "examples/dreambooth/train_dreambooth_flux.py", line 173, in log_validation
[rank0]:     images = [pipeline(**pipeline_args, generator=generator).images[0] for _ in range(args.num_validation_images)]
[rank0]:   File "examples/dreambooth/train_dreambooth_flux.py", line 173, in <listcomp>
[rank0]:     images = [pipeline(**pipeline_args, generator=generator).images[0] for _ in range(args.num_validation_images)]
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 719, in __call__
[rank0]:     noise_pred = self.transformer(
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/accelerate/utils/operations.py", line 820, in forward
[rank0]:     return model_forward(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/accelerate/utils/operations.py", line 808, in __call__
[rank0]:     return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/diffusers/src/diffusers/models/transformers/transformer_flux.py", line 442, in forward
[rank0]:     hidden_states = self.x_embedder(hidden_states)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 117, in forward
[rank0]:     return F.linear(input, self.weight, self.bias)
[rank0]: RuntimeError: mat2 must be a matrix, got 1-D tensor
Steps:   0%|                | 2/1000 [02:57<24:35:32, 88.71s/it, loss=nan, lr=1]
[rank0]:[W923 13:19:13.356135687 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
W0923 13:19:21.013317 140632834377536 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 13534 closing signal SIGTERM
E0923 13:19:28.343540 140632834377536 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 13533) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1161, in launch_command
    multi_gpu_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 799, in multi_gpu_launcher
    distrib_run.run(args)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
examples/dreambooth/train_dreambooth_flux.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-09-23_13:19:21
  host      : x2-h100.internal.cloudapp.net
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 13533)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
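
As a sanity check, the figures in the "Running training" banner in the log above can be reproduced from the launch arguments, and they also explain why the crash appears at step 2. This is only a minimal sketch: it assumes the script uses ceiling division for steps per epoch and epoch count, and that validation runs whenever epoch % validation_epochs == 0. Both are assumptions about the script's logic, not something this report confirms, and all variable names are illustrative.

```python
import math

# Values taken from the accelerate config, launch command, and log above.
num_processes = 2
train_batch_size = 1
gradient_accumulation_steps = 4
max_train_steps = 1000
num_examples = 10
validation_epochs = 25

# Effective batch size per optimizer step across all devices.
total_batch_size = train_batch_size * num_processes * gradient_accumulation_steps

# Each process iterates over its own share of the dataset.
batches_per_epoch = math.ceil(num_examples / (train_batch_size * num_processes))

# Assumption: one optimizer step every `gradient_accumulation_steps` batches.
update_steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
num_epochs = math.ceil(max_train_steps / update_steps_per_epoch)

# Assumption: validation runs when epoch % validation_epochs == 0, so epoch 0 qualifies.
first_validation_epoch = next(e for e in range(num_epochs) if e % validation_epochs == 0)
steps_before_first_validation = (first_validation_epoch + 1) * update_steps_per_epoch

print(total_batch_size, batches_per_epoch, num_epochs, steps_before_first_validation)
```

Under these assumptions the script prints 8 5 500 2, which matches the banner (total batch size 8, 5 batches per epoch, 500 epochs) and the progress bar stopping at 2/1000 right before "Running validation...".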

System Info

Ubuntu 20.04
2x NVIDIA H100
CUDA 12.2
torch==2.4.1
torchvision==0.19.1
Diffusers commit: https://github.com/huggingface/diffusers/commit/ba5af5aebbac0cc18168076a18836f175753d1c7

Who can help?

No response

kopyl commented 2 hours ago

If you remove these args:

--validation_prompt
--validation_epochs

then training no longer crashes, so the failure is specific to the validation step.
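
For anyone who builds the launch command programmatically, the workaround above can be scripted. This is a rough sketch only: drop_validation_args is a hypothetical helper, not part of diffusers or accelerate, and the argument list is a shortened version of the command from the report.

```python
from typing import List

# Flags that enable the validation pass in this report.
VALIDATION_FLAGS = ("--validation_prompt", "--validation_epochs")


def drop_validation_args(args: List[str]) -> List[str]:
    """Return a copy of args without the validation flags (hypothetical helper).

    Handles both the `--flag=value` and the `--flag value` forms.
    """
    kept = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This token is the value of a space-separated validation flag.
            skip_next = False
            continue
        name = arg.split("=", 1)[0]
        if name in VALIDATION_FLAGS:
            # In the space-separated form, the value is the next token.
            skip_next = "=" not in arg
            continue
        kept.append(arg)
    return kept


launch_args = [
    "--max_train_steps=1000",
    "--validation_prompt=A photo of sks girl posing in a photo studio",
    "--validation_epochs=25",
    "--seed=0",
]
print(drop_validation_args(launch_args))
```

This only avoids the crash by skipping validation. It does not address why the transformer's weights are 1-D when the pipeline is called under FSDP.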