Closed: ct1976 closed this issue 1 year ago
Bot detected that the issue body's language is not English and translated it automatically.
Title: OOM with 4 × 22919 MiB of GPU memory when running torchrun --standalone --nproc_per_node 4 benchmark_gpt_dummy.py --model m --strategy ddp --experience_batch_size 1 --train_batch_size 1
Thanks for your feedback. You are using the DDP strategy, which is naive and costs much more GPU memory. You can try
torchrun --standalone --nproc_per_node 4 benchmark_gpt_dummy.py --model m --strategy colossalai_zero2 --experience_batch_size 1 --train_batch_size 1
or
torchrun --standalone --nproc_per_node 4 benchmark_gpt_dummy.py --model m --strategy colossalai_gemini --experience_batch_size 1 --train_batch_size 1
You should see a significant improvement in GPU memory usage.
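For context, here is a minimal sketch of how a benchmark script like benchmark_gpt_dummy.py typically maps the --strategy flag onto a strategy object. The class names and constructor arguments (NaiveStrategy, DDPStrategy, ColossalAIStrategy, stage, placement_policy) are assumptions based on the chatgpt package of that time, so verify them against your installed version:

```python
# Illustrative sketch only, not the actual benchmark source.
# Class names and kwargs are assumptions about the chatgpt (ColossalAI ChatGPT application) package.
from chatgpt.trainer.strategies import ColossalAIStrategy, DDPStrategy, NaiveStrategy

def build_strategy(name: str):
    if name == 'naive':
        return NaiveStrategy()              # single full replica, no sharding
    if name == 'ddp':
        return DDPStrategy()                # one full replica per GPU: largest memory footprint
    if name == 'colossalai_zero2':
        return ColossalAIStrategy(stage=2)  # ZeRO-2: shards optimizer states and gradients
    if name == 'colossalai_gemini':
        # Gemini: additionally shards parameters and manages their placement dynamically
        return ColossalAIStrategy(stage=3, placement_policy='cuda')
    raise ValueError(f'unsupported strategy: {name}')
```

The saving comes from sharding: with DDP every GPU holds full copies of the actor, critic, initial and reward models plus their optimizer states, whereas ZeRO-2 shards optimizer states and gradients, and Gemini additionally shards parameters, across the 4 cards.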
🐛 Describe the bug
Relevant logs:
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Actor: 338.39 M Critic: 338.39 M Initial model: 338.39 M Reward model: 338.39 M
Train epoch [1/3]: 50%|████████████████████████▌ | 1/2 [00:01<00:01, 1.68s/it, actor_loss=0.0324, critic_loss=0.00114]
Episode [1/3]: 88%|█████████████████████████████████████████████████████████████████████████████████▍ | 7/8 [01:18<00:11, 11.27s/it]
Traceback (most recent call last):
File "/dev/ml/ColossalAI/applications/ChatGPT/benchmarks/benchmark_gpt_dummy.py", line 180, in <module>
main(args)
File "/dev/ml/ColossalAI/applications/ChatGPT/benchmarks/benchmark_gpt_dummy.py", line 156, in main
trainer.fit(random_prompts,
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/trainer/base.py", line 118, in fit
self._learn()
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/trainer/base.py", line 94, in _learn
metrics = self.training_step(experience)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/trainer/ppo.py", line 86, in training_step
action_log_probs = self.actor(experience.sequences, num_actions, attention_mask=experience.attention_mask)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/nn/actor.py", line 59, in forward
output = self.model(sequences, attention_mask=attention_mask)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1065, in forward
lm_logits = self.lm_head(hidden_states)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/functional.py", line 1848, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 22.38 GiB total capacity; 21.17 GiB already allocated; 23.94 MiB free; 21.43 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
File "/dev/ml/ColossalAI/applications/ChatGPT/benchmarks/benchmark_gpt_dummy.py", line 180, in
main(args)
File "/dev/ml/ColossalAI/applications/ChatGPT/benchmarks/benchmark_gpt_dummy.py", line 156, in main
trainer.fit(random_prompts,
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/trainer/base.py", line 118, in fit
self._learn()
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/trainer/base.py", line 94, in _learn
metrics = self.training_step(experience)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/trainer/ppo.py", line 86, in training_step
action_log_probs = self.actor(experience.sequences, num_actions, attention_mask=experience.attention_mask)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/nn/actor.py", line 59, in forward
output = self.model(sequences, attention_mask=attention_mask)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1065, in forward
lm_logits = self.lm_head(hidden_states)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/functional.py", line 1848, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 3; 22.38 GiB total capacity; 21.17 GiB already allocated; 39.94 MiB free; 21.43 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
File "/dev/ml/ColossalAI/applications/ChatGPT/benchmarks/benchmark_gpt_dummy.py", line 180, in
main(args)
File "/dev/ml/ColossalAI/applications/ChatGPT/benchmarks/benchmark_gpt_dummy.py", line 156, in main
trainer.fit(random_prompts,
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/trainer/base.py", line 118, in fit
self._learn()
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/trainer/base.py", line 94, in _learn
metrics = self.training_step(experience)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/trainer/ppo.py", line 86, in training_step
action_log_probs = self.actor(experience.sequences, num_actions, attention_mask=experience.attention_mask)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/nn/actor.py", line 59, in forward
output = self.model(sequences, attention_mask=attention_mask)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1065, in forward
lm_logits = self.lm_head(hidden_states)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/functional.py", line 1848, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 2; 22.38 GiB total capacity; 21.17 GiB already allocated; 27.94 MiB free; 21.42 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
File "/dev/ml/ColossalAI/applications/ChatGPT/benchmarks/benchmark_gpt_dummy.py", line 180, in
main(args)
File "/dev/ml/ColossalAI/applications/ChatGPT/benchmarks/benchmark_gpt_dummy.py", line 156, in main
trainer.fit(random_prompts,
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/trainer/base.py", line 118, in fit
self._learn()
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/trainer/base.py", line 94, in _learn
metrics = self.training_step(experience)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/trainer/ppo.py", line 86, in training_step
action_log_probs = self.actor(experience.sequences, num_actions, attention_mask=experience.attention_mask)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/chatgpt/nn/actor.py", line 59, in forward
output = self.model(sequences, attention_mask=attention_mask)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1065, in forward
lm_logits = self.lm_head(hidden_states)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/functional.py", line 1848, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 1; 22.38 GiB total capacity; 21.17 GiB already allocated; 35.94 MiB free; 21.41 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 12129) of binary: /dev/ml/anaconda3/envs/py39/bin/python3.9
Traceback (most recent call last):
File "/dev/ml/anaconda3/envs/py39/bin/torchrun", line 8, in
sys.exit(main())
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 345, in wrapper
return f( args, kwargs)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/distributed/run.py", line 719, in main
run(args)
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/distributed/run.py", line 710, in run
elastic_launch(
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/dev/ml/anaconda3/envs/py39/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
benchmark_gpt_dummy.py FAILED
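A note on the log above: the allocator message recommends max_split_size_mb only when reserved memory is much larger than allocated memory, which indicates fragmentation. If you want to experiment with it anyway, it can be passed through the environment at launch time (128 is just an illustrative value to tune, not a verified fix):
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 torchrun --standalone --nproc_per_node 4 benchmark_gpt_dummy.py --model m --strategy ddp --experience_batch_size 1 --train_batch_size 1
Here, however, about 21.17 GiB of the 22.38 GiB per card is genuinely allocated, so fragmentation tuning alone is unlikely to help; switching to the sharded colossalai_zero2 or colossalai_gemini strategies suggested above is the more reliable fix.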
Environment
Thu Feb 16 15:52:55 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P40           Off  | 00000000:5A:00.0 Off |                    0 |
| N/A   30C    P8     9W / 250W |      2MiB / 22919MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P40           Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   25C    P8     9W / 250W |      2MiB / 22919MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P40           Off  | 00000000:62:00.0 Off |                    0 |
| N/A   27C    P8    10W / 250W |      2MiB / 22919MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P40           Off  | 00000000:66:00.0 Off |                    0 |
| N/A   27C    P8    10W / 250W |      2MiB / 22919MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+