[2023-11-23 17:00:35,408] [INFO] [utils.py:785:see_memory_usage] After Building Model
[2023-11-23 17:00:35,409] [INFO] [utils.py:786:see_memory_usage] MA 0.0 GB Max_MA 0.46 GB CA 0.76 GB Max_CA 1 GB
[2023-11-23 17:00:35,409] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 140.73 GB, percent = 14.0%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 6972248064
> learning rate decay style: cosine
DeepSpeed is enabled.
[2023-11-23 17:00:35,412] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.10.0, git-hash=unknown, git-branch=unknown
[2023-11-23 17:00:35,421] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: True
[2023-11-23 17:00:35,422] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-11-23 17:00:35,422] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
Traceback (most recent call last):
File "/Megatron-DeepSpeed-master-A100/pretrain_gpt.py", line 338, in <module>
pretrain(train_valid_test_datasets_provider,
File "/Megatron-DeepSpeed-master-A100/megatron/training.py", line 135, in pretrain
model, optimizer, opt_param_scheduler = setup_model_and_optimizer(
File "/Megatron-DeepSpeed-master-A100/megatron/training.py", line 579, in setup_model_and_optimizer
model, optimizer, _, opt_param_scheduler = deepspeed.initialize(
File "/opt/conda/lib/python3.10/site-packages/deepspeed/__init__.py", line 171, in initialize
engine = DeepSpeedEngine(args=args,
File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 310, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1196, in _configure_optimizer
raise ZeRORuntimeException(msg)
deepspeed.runtime.zero.utils.ZeRORuntimeException: You are using ZeRO-Offload with a client provided optimizer (<class 'apex.optimizers.fused_adam.FusedAdam'>) which in most cases will yield poor performance. Please either use deepspeed.ops.adam.DeepSpeedCPUAdam or set an optimizer in your ds-config (https://www.deepspeed.ai/docs/config-json/#optimizer-parameters). If you really want to use a custom optimizer w. ZeRO-Offload and understand the performance impacts you can also set <"zero_force_ds_cpu_optimizer": false> in your configuration file.
[2023-11-23 17:00:39,667] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 255) of binary: /opt/conda/bin/python
Traceback (most recent call last):
File "/opt/conda/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==2.2.0.dev20230912+cu118', 'console_scripts', 'torchrun')())
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
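The failure itself is the ZeRORuntimeException above: ZeRO-Offload is enabled while the training script hands DeepSpeed an apex FusedAdam instance. The exception message names the accepted workarounds: build the optimizer with deepspeed.ops.adam.DeepSpeedCPUAdam, declare an optimizer in the DeepSpeed config instead of passing one from the script, or set "zero_force_ds_cpu_optimizer": false and accept the performance impact. Below is a minimal sketch of a ZeRO stage 3, bf16, CPU-offload config dict illustrating the last two options; the batch size and Adam hyperparameters are assumptions, not values from this run.

```python
import json

# Sketch of a ZeRO stage-3 + CPU-offload config addressing the
# ZeRORuntimeException. Hyperparameter values below are illustrative
# assumptions, not taken from the failing run.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,          # assumed
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
    # Workaround A: keep the client-provided FusedAdam and silence the check,
    # accepting the performance impact the exception warns about.
    "zero_force_ds_cpu_optimizer": False,
    # Workaround B (alternative to A): remove the key above, stop passing a
    # client optimizer from the script, and let DeepSpeed construct one; with
    # CPU offload it uses DeepSpeedCPUAdam internally.
    # "optimizer": {
    #     "type": "Adam",
    #     "params": {"lr": 1e-4, "betas": [0.9, 0.95], "weight_decay": 0.1},
    # },
}

# Write the config to the JSON file passed via --deepspeed_config
# ("ds_config_zero3_offload.json" is a hypothetical name).
with open("ds_config_zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

Workaround A is the smaller change when the training script keeps building FusedAdam itself; either way, the point of the exception is that CPU offload pairs poorly with a GPU-side fused Adam.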
Training with bf16 and ZeRO stage 3 causes this error. The script:
And the log info: