Closed. lonngxiang closed this issue 1 month ago.
It seems that you don't have enough memory to open the model. Can you run the model with diffusers?
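For reference, a minimal single-GPU diffusers sketch for this checkpoint (the model path, prompt, and step count are taken from the commands below; everything else is standard PixArtAlphaPipeline usage, shown here only as an illustration):

import torch
from diffusers import PixArtAlphaPipeline

# Load the same local checkpoint used with xDiT below.
pipe = PixArtAlphaPipeline.from_pretrained(
    "/ai/PixArt-XL-2-1024-MS", torch_dtype=torch.float16
)
pipe.to("cuda")

image = pipe("A small dog", num_inference_steps=20).images[0]
image.save("small_dog.png")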
Yes, it runs with diffusers on a 4090.
Now this error happens:
torchrun --nproc_per_node=2 examples/pixartalpha_example.py --model /ai/PixArt-XL-2-1024-MS --pipefusion_parallel_degree 2 --ulysses_degree 2 --num_inference_steps 20 --warmup_steps 0 --prompt "A small dog" --use_cfg_parallel
W0813 04:41:10.994219 139702199338816 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
/home/anaconda3/envs/llm/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
/home/anaconda3/envs/llm/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
WARNING 08-13 04:41:13 [args.py:143] Distributed environment is not initialized. Initializing...
DEBUG 08-13 04:41:13 [parallel_state.py:141] world_size=-1 rank=-1 local_rank=-1 distributed_init_method=env:// backend=nccl
WARNING 08-13 04:41:14 [args.py:143] Distributed environment is not initialized. Initializing...
DEBUG 08-13 04:41:14 [parallel_state.py:141] world_size=-1 rank=-1 local_rank=-1 distributed_init_method=env:// backend=nccl
INFO 08-13 04:41:14 [config.py:90] Ring degree not set, using default value 1
INFO 08-13 04:41:14 [config.py:126] Pipeline patch number not set, using default value 2
rank0: Traceback (most recent call last):
rank0: File "/ai/xDiT/examples/pixartalpha_example.py", line 69, in
rank0: File "/ai/xDiT/examples/pixartalpha_example.py", line 19, in main
rank0: engine_config, input_config = engine_args.create_config()
rank0: File "/home/anaconda3/envs/llm/lib/python3.10/site-packages/xfuser/config/args.py", line 160, in create_config
rank0: parallel_config = ParallelConfig(
rank0: File "
Failures:
You must make sure cfg x pipefusion x ulysses x ring = gpu_num.
The following cmd is valid:
torchrun --nproc_per_node=2 examples/pixartalpha_example.py --model /ai/PixArt-XL-2-1024-MS --pipefusion_parallel_degree 1 --ulysses_degree 1 --num_inference_steps 20 --warmup_steps 0 --prompt "A small dog" --use_cfg_parallel
torchrun --nproc_per_node=2 examples/pixartalpha_example.py --model /ai/PixArt-XL-2-1024-MS --pipefusion_parallel_degree 2 --ulysses_degree 1 --num_inference_steps 20 --warmup_steps 0 --prompt "A small dog"
torchrun --nproc_per_node=2 examples/pixartalpha_example.py --model /ai/PixArt-XL-2-1024-MS --pipefusion_parallel_degree 1 --ulysses_degree 2 --num_inference_steps 20 --warmup_steps 0 --prompt "A small dog"
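In other words, the product of the parallel degrees must equal the number of processes torchrun launches. A minimal sketch of that constraint (is_valid is a hypothetical helper for illustration; the real validation lives in xfuser/config/args.py):

def is_valid(gpu_num: int, pipefusion: int, ulysses: int,
             ring: int = 1, use_cfg_parallel: bool = False) -> bool:
    # cfg degree is 2 when --use_cfg_parallel is set, otherwise 1.
    cfg = 2 if use_cfg_parallel else 1
    return cfg * pipefusion * ulysses * ring == gpu_num

# The failing command: 2 (cfg) x 2 (pipefusion) x 2 (ulysses) x 1 (ring) = 8 != 2
print(is_valid(2, pipefusion=2, ulysses=2, use_cfg_parallel=True))  # False
print(is_valid(2, pipefusion=2, ulysses=1))                         # True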
Thanks, it works. But what are cfg and ring?
--use_cfg_parallel and --ring_degree.
Ring degree defaults to 1. If --use_cfg_parallel is set, cfg is 2; otherwise it is 1.
You can refer to https://github.com/xdit-project/xDiT/blob/main/docs/methods/cfg_parallel.md and https://github.com/xdit-project/xDiT/blob/main/docs/methods/usp.md. ring_degree represents the ring attention degree.
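For example, on 8 GPUs the following combination would satisfy the constraint (2 cfg x 2 pipefusion x 2 ulysses x 1 ring = 8; this is an illustrative command following the same pattern as above, not one from the thread):

torchrun --nproc_per_node=8 examples/pixartalpha_example.py --model /ai/PixArt-XL-2-1024-MS --pipefusion_parallel_degree 2 --ulysses_degree 2 --ring_degree 1 --num_inference_steps 20 --warmup_steps 0 --prompt "A small dog" --use_cfg_parallel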
Is the GPU memory not enough?
File "/home/anaconda3/envs/llm/lib/python3.10/site-packages/transformers/modeling_utils.py", line 549, in load_state_dict [rank6]: with safe_open(checkpoint_file, framework="pt") as f: [rank6]: RuntimeError: unable to mmap 9989150328 bytes from file </ai/PixArt-XL-2-1024-MS/text_encoder/model-00001-of-00002.safetensors>: Cannot allocate memory (12)