09/23/2024 13:15:53 - INFO - __main__ - Distributed environment: FSDP Backend: nccl
Num processes: 2
Process index: 0
Local process index: 0
Device: cuda:0
Mixed precision type: bf16
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
09/23/2024 13:15:53 - INFO - __main__ - Distributed environment: FSDP Backend: nccl
Num processes: 2
Process index: 1
Local process index: 1
Device: cuda:1
Mixed precision type: bf16
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type t5 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
Downloading shards: 100%|██████████████████████| 2/2 [00:00<00:00, 18436.50it/s]
Downloading shards: 100%|██████████████████████| 2/2 [00:00<00:00, 16946.68it/s]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:00<00:00, 12.08it/s]
Fetching 3 files: 100%|████████████████████████| 3/3 [00:00<00:00, 10433.59it/s]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.06s/it]
Fetching 3 files: 100%|████████████████████████| 3/3 [00:00<00:00, 10903.74it/s]
{'axes_dims_rope'} was not found in config. Values will be initialized to default values.
Using decoupled weight decay
Using decoupled weight decay
09/23/2024 13:16:15 - INFO - __main__ - ***** Running training *****
09/23/2024 13:16:15 - INFO - __main__ - Num examples = 10
09/23/2024 13:16:15 - INFO - __main__ - Num batches each epoch = 5
09/23/2024 13:16:15 - INFO - __main__ - Num Epochs = 500
09/23/2024 13:16:15 - INFO - __main__ - Instantaneous batch size per device = 1
09/23/2024 13:16:15 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 8
09/23/2024 13:16:15 - INFO - __main__ - Gradient Accumulation steps = 4
09/23/2024 13:16:15 - INFO - __main__ - Total optimization steps = 1000
Steps: 0%| | 0/1000 [00:00<?, ?it/s]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined]
Steps: 0%| | 0/1000 [00:29<?, ?it/s, loss=0.4, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps: 0%| | 0/1000 [00:31<?, ?it/s, loss=0.416, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps: 0%| | 0/1000 [00:32<?, ?it/s, loss=0.327, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps: 0%| | 1/1000 [01:30<25:01:16, 90.17s/it, loss=0.592, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps: 0%| | 2/1000 [02:31<20:13:45, 72.97s/it, loss=nan, lr=1]
Downloading shards: 100%|██████████████████████| 2/2 [00:00<00:00, 20068.44it/s]
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████████ | 1/2 [00:02<00:02, 2.53s/it]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:04<00:00, 2.43s/it]
Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s]Loaded tokenizer_2 as T5TokenizerFast from `tokenizer_2` subfolder of black-forest-labs/FLUX.1-dev.
Loading pipeline components...: 43%|██████ | 3/7 [00:00<00:00, 18.63it/s]Loaded scheduler as FlowMatchEulerDiscreteScheduler from `scheduler` subfolder of black-forest-labs/FLUX.1-dev.
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of black-forest-labs/FLUX.1-dev.
Loading pipeline components...: 100%|█████████████| 7/7 [00:00<00:00, 34.91it/s]
09/23/2024 13:18:51 - INFO - __main__ - Running validation...
Generating 4 images with prompt: A photo of sks girl posing in a photo studio.
[rank0]: Traceback (most recent call last):
[rank0]: File "examples/dreambooth/train_dreambooth_flux.py", line 1791, in <module>
[rank0]: main(args)
[rank0]: File "examples/dreambooth/train_dreambooth_flux.py", line 1715, in main
[rank0]: images = log_validation(
[rank0]: File "examples/dreambooth/train_dreambooth_flux.py", line 173, in log_validation
[rank0]: images = [pipeline(**pipeline_args, generator=generator).images[0] for _ in range(args.num_validation_images)]
[rank0]: File "examples/dreambooth/train_dreambooth_flux.py", line 173, in <listcomp>
[rank0]: images = [pipeline(**pipeline_args, generator=generator).images[0] for _ in range(args.num_validation_images)]
[rank0]: File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 719, in __call__
[rank0]: noise_pred = self.transformer(
[rank0]: File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.8/dist-packages/accelerate/utils/operations.py", line 820, in forward
[rank0]: return model_forward(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.8/dist-packages/accelerate/utils/operations.py", line 808, in __call__
[rank0]: return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]: File "/usr/local/lib/python3.8/dist-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
[rank0]: return func(*args, **kwargs)
[rank0]: File "/diffusers/src/diffusers/models/transformers/transformer_flux.py", line 442, in forward
[rank0]: hidden_states = self.x_embedder(hidden_states)
[rank0]: File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 117, in forward
[rank0]: return F.linear(input, self.weight, self.bias)
[rank0]: RuntimeError: mat2 must be a matrix, got 1-D tensor
Steps: 0%| | 2/1000 [02:57<24:35:32, 88.71s/it, loss=nan, lr=1]
[rank0]:[W923 13:19:13.356135687 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
W0923 13:19:21.013317 140632834377536 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 13534 closing signal SIGTERM
E0923 13:19:28.343540 140632834377536 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 13533) of binary: /usr/bin/python
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1161, in launch_command
multi_gpu_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 799, in multi_gpu_launcher
distrib_run.run(args)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 892, in run
elastic_launch(
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 133, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
examples/dreambooth/train_dreambooth_flux.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-09-23_13:19:21
host : x2-h100.internal.cloudapp.net
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 13533)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
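The `RuntimeError: mat2 must be a matrix, got 1-D tensor` raised from `F.linear` in the traceback above generally means the layer's `weight` tensor was 1-D at call time. One plausible cause in this FSDP run (an assumption on my part, not confirmed by the logs) is that validation invokes the transformer outside FSDP's managed forward path, while its parameters are still flattened into FSDP's 1-D flat buffer. A minimal sketch of that failure mode, with illustrative shapes:

```python
import torch
import torch.nn.functional as F

# Simulate a Linear layer whose weight has been flattened to 1-D, which is
# roughly what an FSDP-wrapped module's parameters look like outside of its
# own forward pass. Shapes here are illustrative, not taken from the model.
x = torch.randn(2, 64)               # (batch, in_features)
flat_weight = torch.randn(64 * 64)   # 1-D instead of (out_features, in_features)
bias = torch.randn(64)

try:
    # F.linear expects a 2-D weight matrix; a 1-D weight trips the same
    # addmm check seen in the traceback above.
    F.linear(x, flat_weight, bias)
except RuntimeError as e:
    print(f"RuntimeError: {e}")
```

If this is indeed the cause, the usual workarounds are to run validation on the unwrapped model (e.g. via `accelerator.unwrap_model`) or inside a `FullyShardedDataParallel.summon_full_params` context so the 2-D parameter views are materialized.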
Describe the bug
I run the training, but it crashes during the validation step with `RuntimeError: mat2 must be a matrix, got 1-D tensor` (full log above). The training loss also becomes `nan` by step 2.
Reproduction
Run `accelerate config`
Logs
System Info
- Ubuntu 20.04
- 2x NVIDIA H100
- CUDA 12.2
- torch==2.4.1
- torchvision==0.19.1
- Diffusers commit: https://github.com/huggingface/diffusers/commit/ba5af5aebbac0cc18168076a18836f175753d1c7
Who can help?
No response