microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[RESOLVED] An error occurs when running DeepSpeed CPU inference #4180

Closed park12sj closed 12 months ago

park12sj commented 1 year ago

Describe the bug
I am using version 0.9.4 with the two PRs below applied, but CPU inference is not working:

CUDA optional deepspeed ops: https://github.com/microsoft/DeepSpeed/pull/2507
Enable page-locked tensors without CUDA: https://github.com/microsoft/DeepSpeed/pull/2775

To Reproduce

Expected behavior
If there is a CPU inference guide, for example an option that does not require changing the code, please point me to it. I may have missed it, but it is hard to find information about CPU inference in the documentation.

System info (please complete the following information):

delock commented 1 year ago

From the error posted, DeepSpeed seems to be going through the CUDA path instead of the CPU path. Can you check the log to see if there is a line like the following?

Setting ds_accelerator to cpu (auto detect)

There is a section in DeepSpeed tutorial showing how to run DeepSpeed model on CPU. https://www.deepspeed.ai/tutorials/accelerator-abstraction-interface/#run-deepspeed-model-on-cpu
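
As a quick sanity check (a minimal sketch; get_accelerator() is the accelerator-abstraction entry point described in that tutorial), you can confirm which accelerator DeepSpeed auto-detected:

# On a CPU-only setup this should print "cpu" rather than "cuda",
# matching the "Setting ds_accelerator to cpu (auto detect)" log line.
from deepspeed.accelerator import get_accelerator

print(get_accelerator().device_name())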

park12sj commented 1 year ago

@delock Hello, I get a 404 error when I run the command below to install oneccl_bindings_for_pytorch. Do you happen to know how to deal with it?

python -m pip install oneccl_bind_pt==2.0 -f https://developer.intel.com/ipex-whl-stable-cpu

WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /Redirector/404Redirector.aspx?404;https://developer.intel.com/ipex-whl-stable-cpu

park12sj commented 1 year ago

I downloaded the whl file directly from https://developer.intel.com/ipex-whl-stable-cpu and solved it.

park12sj commented 1 year ago

@delock Hello, thanks to the guide you gave me, I installed the necessary dependencies and verified execution.

[2023-08-22 13:32:35,047] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cpu (auto detect)
[2023-08-22 13:32:36,171] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl
[2023-08-22 13:32:36,171] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.13.4-1
[2023-08-22 13:32:36,171] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-devel-2.13.4-1+cuda11.7
[2023-08-22 13:32:36,171] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.13.4-1
[2023-08-22 13:32:36,171] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.13.4
[2023-08-22 13:32:36,171] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE=libnccl-2.13.4-1+cuda11.7
[2023-08-22 13:32:36,171] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_VERSION=2.13.4
[2023-08-22 13:32:36,171] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-devel
[2023-08-22 13:32:36,171] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2023-08-22 13:32:36,171] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2023-08-22 13:32:36,171] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2023-08-22 13:32:36,171] [INFO] [launch.py:163:main] dist_world_size=2
[2023-08-22 13:32:36,171] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
My guessed rank = 1
My guessed rank = 0
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
[2023-08-22 13:32:40,099] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cpu (auto detect)
[2023-08-22 13:32:40,102] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cpu (auto detect)

But the subprocess is killed before all the checkpoints are loaded. Debugging is difficult because no error log is left. Are there any debugging tips?

Loading checkpoint shards:  93%|█████████████████████████████████████████████████████████████▎    | 26/28 [01:23<00:12,  6.14s/it][2023-08-22 13:26:36,865] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 113230
[2023-08-22 13:26:38,413] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 113233
[2023-08-22 13:26:38,413] [ERROR] [launch.py:321:sigkill_handler] ['numactl', '-m', '1', '-C', '20-39', '/opt/conda/envs/py310/bin/python', '-u', 'save_ds_sharded.py', '--local_rank=1', '--base_model_dir', '/workspace/storage/cephfs-personal/hf/polyglot-ko-12.8b', '--custom_model_dir', '/workspace/storage/cephfs-personal/custom/341ae74c440c4bb4b28e3da5e207f635/merged', '--save_mp_checkpoint_path', '/workspace/storage/cephfs-personal/git/pai/ml/ml/model/application/nlp/place_lm/inference/acceleration/01_deepspeed/ds_sharded_tp1_bfloat16', '--torch_dtype', 'bfloat16'] exits with return code = -9
make[1]: *** [save_ds_sharded] Error 247
make[1]: Leaving directory `/workspace/storage/cephfs-personal/git/pai/ml/ml/model/application/nlp/place_lm/inference'
make: *** [save_ds_sharded_cpu] Error 2  
delock commented 1 year ago

@park12sj I have no clue from the log file. Did you check the remaining system memory while the model was loading? The exit code -9 means the process was killed with SIGKILL, which often points to the OOM killer.
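
One way to check (a minimal sketch; assumes the third-party psutil package is installed) is to log available memory in the background while the checkpoints load:

import threading
import time

import psutil  # third-party: pip install psutil

def log_memory(interval_s=5.0):
    # Print available system memory periodically so a memory squeeze
    # shows up in the log right before the process is killed.
    while True:
        avail_gb = psutil.virtual_memory().available / 1e9
        print(f"available memory: {avail_gb:.1f} GB", flush=True)
        time.sleep(interval_s)

# Start before loading; daemon=True so the thread dies with the process.
threading.Thread(target=log_memory, daemon=True).start()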

park12sj commented 1 year ago

@delock

Hello, I solved the original problem, but now there is the error below:

RuntimeError: Error building extension 'deepspeed_ccl_comm'
[2023-09-06 11:06:44,373] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 141785
[2023-09-06 11:06:44,373] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 141786

According to the guide at https://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops, even when I install with the command below, deepspeed_ccl_comm does not get installed. Is there another way?

DS_BUILD_CCL_COMM=1 pip install deepspeed==0.9.4

ds_report

--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
deepspeed_not_implemented  [NO] ....... [OKAY]
deepspeed_ccl_comm ..... [NO] ....... [OKAY]
--------------------------------------------------
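
For reference, the op can also be probed from Python (a sketch; create_op_builder is the accelerator interface method, and the CCLCommBuilder name is an assumption about this DeepSpeed version):

from deepspeed.accelerator import get_accelerator

# Ask the current accelerator for its CCL comm op builder and check
# whether it could be built on this system.
builder = get_accelerator().create_op_builder("CCLCommBuilder")
print(builder.is_compatible())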
park12sj commented 1 year ago

Here is a second question.

The CPU accelerator is set, but when we run deepspeed.init_inference, we get an error that the op is not implemented on the CPU backend: ValueError: This op had not been implemented on CPU backend.

Below is the full text of the error. I don't think init_inference itself is unsupported; do I need a different setting?

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:24<00:00,  1.16it/s]
[2023-09-06 12:40:33,166] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.10.2, git-hash=unknown, git-branch=unknown
[2023-09-06 12:40:33,167] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /workspace/storage/cephfs-personal/git/pai/ml/ml/model/application/nlp/place_lm/inference/accele │
│ ration/01_deepspeed/save_ds_sharded.py:62 in <module>                                            │
│                                                                                                  │
│   59 remove_hook_from_module(model, recurse=True)                                                │
│   60 model.eval()                                                                                │
│   61                                                                                             │
│ ❱ 62 ds_engine = deepspeed.init_inference(                                                       │
│   63 │   model,                                                                                  │
│   64 │   tensor_parallel={                                                                       │
│   65 │   │   "tp_size": world_size,                                                              │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/__init__.py:342 in init_inference   │
│                                                                                                  │
│   339 │                                                                                          │
│   340 │   ds_inference_config = DeepSpeedInferenceConfig(**config_dict)                          │
│   341 │                                                                                          │
│ ❱ 342 │   engine = InferenceEngine(model, config=ds_inference_config)                            │
│   343 │                                                                                          │
│   344 │   return engine                                                                          │
│   345                                                                                            │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/inference/engine.py:141 in __init__ │
│                                                                                                  │
│   138 │   │   else:                                                                              │
│   139 │   │   │   if config.replace_with_kernel_inject:                                          │
│   140 │   │   │   │   # 2. DeepSpeed Kernel Injection                                            │
│ ❱ 141 │   │   │   │   self._apply_injection_policy(config)                                       │
│   142 │   │   │   elif config.tensor_parallel.tp_size > 1:                                       │
│   143 │   │   │   │   # 3. Automatic Tensor Parallelism                                          │
│   144 │   │   │   │   parser_dict = AutoTP.tp_parser(model)                                      │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/inference/engine.py:381 in          │
│ _apply_injection_policy                                                                          │
│                                                                                                  │
│   378 │   │                                                                                      │
│   379 │   │   if isinstance(self.module, torch.nn.Module):                                       │
│   380 │   │   │   # config is our DeepSpeedInferenceConfig and self.config is the HF model con   │
│ ❱ 381 │   │   │   replace_transformer_layer(client_module, self.module, checkpoint, config, se   │
│   382 │                                                                                          │
│   383 │   def _get_all_ckpt_names(self, checkpoints_path, tag):                                  │
│   384 │   │   ckpt_file_pattern = self._get_ckpt_name(checkpoints_path, tag, mp_placeholder="*   │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:312 │
│ in replace_transformer_layer                                                                     │
│                                                                                                  │
│   309 │   │   │   pbar.update(1)                                                                 │
│   310 │   │   │   gc.collect()                                                                   │
│   311 │   else:                                                                                  │
│ ❱ 312 │   │   replaced_module = replace_module(model=model,                                      │
│   313 │   │   │   │   │   │   │   │   │   │    orig_class=orig_layer_impl,                       │
│   314 │   │   │   │   │   │   │   │   │   │    replace_fn=replace_fn,                            │
│   315 │   │   │   │   │   │   │   │   │   │    _replace_policy=config.injection_policy_tuple)    │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:555 │
│ in replace_module                                                                                │
│                                                                                                  │
│   552 │   │   "No default policy found! Please specify your policy injection_policy (like {Ber   │
│   553 │   │   "You can find some samples here: https://github.com/microsoft/DeepSpeed/blob/mas   │
│   554 │                                                                                          │
│ ❱ 555 │   replaced_module, _ = _replace_module(model, policy, state_dict=sd)                     │
│   556 │   if checkpoint is not None:                                                             │
│   557 │   │   embedding_weight = None                                                            │
│   558 │   │   for n, p in replaced_module.named_parameters():                                    │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:623 │
│ in _replace_module                                                                               │
│                                                                                                  │
│   620 │   │   │   │   │   continue                                                               │
│   621 │   │   │   if len(child._buffers) != 0 and state_dict is not None:                        │
│   622 │   │   │   │   Loading.load_buffer(child, state_dict, checking_key)                       │
│ ❱ 623 │   │   │   _, layer_id = _replace_module(child,                                           │
│   624 │   │   │   │   │   │   │   │   │   │     policies,                                        │
│   625 │   │   │   │   │   │   │   │   │   │     prefix if level_id == 0 and skip_level_0_prefi   │
│   626 │   │   │   │   │   │   │   │   │   │     prefix + name + '.',                             │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:623 │
│ in _replace_module                                                                               │
│                                                                                                  │
│   620 │   │   │   │   │   continue                                                               │
│   621 │   │   │   if len(child._buffers) != 0 and state_dict is not None:                        │
│   622 │   │   │   │   Loading.load_buffer(child, state_dict, checking_key)                       │
│ ❱ 623 │   │   │   _, layer_id = _replace_module(child,                                           │
│   624 │   │   │   │   │   │   │   │   │   │     policies,                                        │
│   625 │   │   │   │   │   │   │   │   │   │     prefix if level_id == 0 and skip_level_0_prefi   │
│   626 │   │   │   │   │   │   │   │   │   │     prefix + name + '.',                             │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:599 │
│ in _replace_module                                                                               │
│                                                                                                  │
│   596 │   """                                                                                    │
│   597 │   for name, child in model.named_children():                                             │
│   598 │   │   if child.__class__ in policies:                                                    │
│ ❱ 599 │   │   │   replaced_module = policies[child.__class__][0](child,                          │
│   600 │   │   │   │   │   │   │   │   │   │   │   │   │   │      policies[child.__class__][-1]   │
│   601 │   │   │   │   │   │   │   │   │   │   │   │   │   │      layer_id,                       │
│   602 │   │   │   │   │   │   │   │   │   │   │   │   │   │      prefix=prefix + name,           │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:289 │
│ in replace_fn                                                                                    │
│                                                                                                  │
│   286 │   │   else:                                                                              │
│   287 │   │   │   # copy relevant state from child -> new module                                 │
│   288 │   │   │   if config.replace_with_kernel_inject:                                          │
│ ❱ 289 │   │   │   │   new_module = replace_with_policy(child,                                    │
│   290 │   │   │   │   │   │   │   │   │   │   │   │    _policy,                                  │
│   291 │   │   │   │   │   │   │   │   │   │   │   │    config.triangular_masking,                │
│   292 │   │   │   │   │   │   │   │   │   │   │   │    inference=True,                           │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:246 │
│ in replace_with_policy                                                                           │
│                                                                                                  │
│   243 │   │   _container.create_ds_model_config()                                                │
│   244 │   │                                                                                      │
│   245 │   │   # 7. use the config and create the module                                          │
│ ❱ 246 │   │   _container.create_module()                                                         │
│   247 │   │                                                                                      │
│   248 │   │   # 8. transpose the weights and bias if needed                                      │
│   249 │   │   _container.transpose()                                                             │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/containers/gptneox.py │
│ :28 in create_module                                                                             │
│                                                                                                  │
│    25 │                                                                                          │
│    26 │   def create_module(self, config=None):                                                  │
│    27 │   │   _config = config if config is not None else self.ds_model_config                   │
│ ❱  28 │   │   self.module = DeepSpeedGPTInference(_config, mp_group=self.mp_group)               │
│    29 │   │   self.module.config.scale_attention = self.scale_attention                          │
│    30 │   │                                                                                      │
│    31 │   │   if self.megatron_v2:                                                               │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/model_implementations/transformers/ │
│ ds_gpt.py:20 in __init__                                                                         │
│                                                                                                  │
│   17 │   │   │   │    quantize_groups=1,                                                         │
│   18 │   │   │   │    merge_count=1,                                                             │
│   19 │   │   │   │    mlp_extra_grouping=False):                                                 │
│ ❱ 20 │   │   super().__init__(config, mp_group, quantize_scales, quantize_groups, merge_count    │
│   21                                                                                             │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/model_implementations/transformers/ │
│ ds_transformer.py:58 in __init__                                                                 │
│                                                                                                  │
│    55 │   │   global inference_module                                                            │
│    56 │   │   if inference_module is None:                                                       │
│    57 │   │   │   builder = InferenceBuilder()                                                   │
│ ❱  58 │   │   │   inference_module = builder.load()                                              │
│    59 │   │                                                                                      │
│    60 │   │   if DeepSpeedTransformerInference.layer_id == 1:                                    │
│    61 │   │   │   log_dist(f"DeepSpeed-Inference config: {self.config.__dict__}", [0])           │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/ops/op_builder/cpu/no_impl.py:21 in │
│ load                                                                                             │
│                                                                                                  │
│   18 │   │   return f'deepspeed.ops.comm.{self.NAME}_op'                                         │
│   19 │                                                                                           │
│   20 │   def load(self, verbose=True):                                                           │
│ ❱ 21 │   │   raise ValueError("This op had not been implemented on CPU backend.")                │
│   22 │                                                                                           │
│   23 │   def sources(self):                                                                      │
│   24 │   │   return []                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: This op had not been implemented on CPU backend.
delock commented 1 year ago

Hi @park12sj, do you have more log output from before the deepspeed_ccl_comm kernel build, especially the line that shows the command used to build the kernel?

Besides, do you experience the build error in pre-install mode only, or in JIT mode as well?

delock commented 1 year ago

Hi @park12sj, from the trace it looks like you are trying to run inference on the model with kernel injection. On the CPU accelerator we support AutoTP but not kernel injection. If you call init_inference with replace_with_kernel_inject set to False, you will be using AutoTP instead of kernel injection.
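
For example (a minimal sketch based on the init_inference call in the traceback; model, world_size, and the bfloat16 dtype come from the user's script and are assumptions here):

import deepspeed
import torch

def init_cpu_autotp(model, world_size):
    # AutoTP path: tensor parallelism without kernel injection, which is
    # the supported inference mode on the CPU accelerator.
    engine = deepspeed.init_inference(
        model,
        tensor_parallel={"tp_size": world_size},
        dtype=torch.bfloat16,
        replace_with_kernel_inject=False,
    )
    return engine.module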

park12sj commented 1 year ago

@delock

Hello, as you said, I applied the setting below and the injection attempt disappeared. Thank you.

replace_with_kernel_inject = False

However, this error still occurs: RuntimeError: Error building extension 'deepspeed_ccl_comm'. The difference is that it now tries to build deepspeed_ccl_comm while running deepspeed (JIT mode), as shown below.

Building extension module deepspeed_ccl_comm...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)

But it fails in the end. (The error log is too long, so I attached only part of it.) Is a particular module version mismatched, or do I need additional settings?

Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF ccl.o.d -DTORCH_EXTENSION_NAME=deepspeed_ccl_comm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/ops/csrc/cpu/includes -isystem /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include -isystem /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/THC -isystem /opt/conda/envs/py310/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O2 -fopenmp -c /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/ops/csrc/cpu/comm/ccl.cpp -o ccl.o 
FAILED: ccl.o 

...

/opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/ops/csrc/cpu/comm/ccl.cpp:328:46: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
     if (port_string == NULL) { port_string = ""; }
                                              ^~
In file included from /opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7/include/immintrin.h:41:0,
                 from /opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7/include/x86intrin.h:48,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/profiler/util.h:34,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/autograd/profiler_kineto.h:9,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/autograd/profiler.h:3,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:7,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/all.h:15,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/extension.h:4,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/ops/csrc/cpu/comm/ccl.cpp:6:
/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7/include/avxintrin.h: In function ‘_Z19reduce_fp32_buffersiiP19allreduce_workspace._omp_fn.2’:
/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7/include/avxintrin.h:895:1: error: inlining failed in call to always_inline ‘__m256 _mm256_loadu_ps(const float*)’: target specific option mismatch
 _mm256_loadu_ps (float const *__P)
 ^~~~~~~~~~~~~~~
/opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/ops/csrc/cpu/comm/ccl.cpp:249:75: note: called from here
         auto inout_val = _mm256_loadu_ps((float*)(workspace[0].buffer + i));
                                                                           ^
In file included from /opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7/include/immintrin.h:41:0,
                 from /opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7/include/x86intrin.h:48,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/profiler/util.h:34,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/autograd/profiler_kineto.h:9,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/autograd/profiler.h:3,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:7,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/all.h:15,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/extension.h:4,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/ops/csrc/cpu/comm/ccl.cpp:6:
/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7/include/avxintrin.h:901:1: error: inlining failed in call to always_inline ‘void _mm256_storeu_ps(float*, __m256)’: target specific option mismatch
 _mm256_storeu_ps (float *__P, __m256 __A)
 ^~~~~~~~~~~~~~~~
/opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/ops/csrc/cpu/comm/ccl.cpp:259:25: note: called from here
         _mm256_storeu_ps((float*)(workspace[0].buffer + i), inout_val);
         ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7/include/immintrin.h:41:0,
                 from /opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7/include/x86intrin.h:48,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/profiler/util.h:34,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/autograd/profiler_kineto.h:9,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/autograd/profiler.h:3,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:7,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/all.h:15,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/torch/include/torch/extension.h:4,
                 from /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/ops/csrc/cpu/comm/ccl.cpp:6:
/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7/include/avxintrin.h:146:1: error: inlining failed in call to always_inline ‘__m256 _mm256_add_ps(__m256, __m256)’: target specific option mismatch
 _mm256_add_ps (__m256 __A, __m256 __B)
 ^~~~~~~~~~~~~
/opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/ops/csrc/cpu/comm/ccl.cpp:241:19: note: called from here
         inout_val = _mm256_add_ps(inout_val, in##x##_val);                     \
         ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/ops/csrc/cpu/comm/ccl.cpp:178:5: note: in expansion of macro ‘CVT_ADD_F32’
     x(2)
     ^
/opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/ops/csrc/cpu/comm/ccl.cpp:174:22: note: in expansion of macro ‘REPEAT_2’
 #define REPEAT(N, x) REPEAT_##N(x)
                      ^~~~~~~
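
The errors above are gcc refusing to inline the AVX _mm256_* intrinsics used by ccl.cpp ("target specific option mismatch"). A quick way to see what the local toolchain enables for this CPU (a sketch; assumes gcc is on PATH):

import subprocess

# gcc -march=native -dM -E prints the macros predefined for this CPU;
# __AVX__ / __AVX2__ should appear if the compiler can target AVX here.
out = subprocess.run(
    ["gcc", "-march=native", "-dM", "-E", "-"],
    input="", capture_output=True, text=True,
).stdout
print("AVX:", "__AVX__" in out, "AVX2:", "__AVX2__" in out)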
park12sj commented 1 year ago

I solved it by updating the gcc version.

park12sj commented 1 year ago

@delock

Hello, everything I asked you above has been resolved. However, an error now occurs in replace_transformer_layer. Are there any additional settings I need to apply besides replace_with_kernel_inject=False? The same code works on GPU.

AutoTP:  [(<class 'transformers.models.gpt_neox.modeling_gpt_neox.GPTNeoXLayer'>, ['attention.dense', 'mlp.dense_4h_to_h'])]
AutoTP:  [(<class 'transformers.models.gpt_neox.modeling_gpt_neox.GPTNeoXLayer'>, ['attention.dense', 'mlp.dense_4h_to_h'])]
Loading 2 checkpoint shards:   0%|                                                                                                                                                                                                                              | 0/2 [00:00<?, ?it/s]╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /workspace/storage/cephfs-personal/git/pai/ml/ml/model/application/nlp/place_lm/inference/accele │
│ ration/01_deepspeed/evaluate_bfloat16.py:90 in <module>                                          │
│                                                                                                  │
│    89                                                                                            │
│ ❱  90 ds_engine = deepspeed.init_inference(                                                      │
│    91 │   model,                                                                                 │
│    92 │   dtype=torch_dtype,                                                                     │
│    93 │   tensor_parallel={                                                                      │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/__init__.py:342 in init_inference   │
│                                                                                                  │
│   339 │                                                                                          │
│   340 │   ds_inference_config = DeepSpeedInferenceConfig(**config_dict)                          │
│   341 │                                                                                          │
│ ❱ 342 │   engine = InferenceEngine(model, config=ds_inference_config)                            │
│   343 │                                                                                          │
│   344 │   return engine                                                                          │
│   345                                                                                            │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/inference/engine.py:151 in __init__ │
│                                                                                                  │
│   148 │   │   │   │   │   │   config.injection_policy_tuple = (injection_policy, )               │
│   149 │   │   │   │   │   else:                                                                  │
│   150 │   │   │   │   │   │   config.injection_policy_tuple = injection_policy                   │
│ ❱ 151 │   │   │   │   │   self._apply_injection_policy(config, client_module)                    │
│   152 │   │                                                                                      │
│   153 │   │   device = get_accelerator().current_device_name()                                   │
│   154 │   │   self.module.to(device)                                                             │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/inference/engine.py:381 in          │
│ _apply_injection_policy                                                                          │
│                                                                                                  │
│   378 │   │                                                                                      │
│   379 │   │   if isinstance(self.module, torch.nn.Module):                                       │
│   380 │   │   │   # config is our DeepSpeedInferenceConfig and self.config is the HF model con   │
│ ❱ 381 │   │   │   replace_transformer_layer(client_module, self.module, checkpoint, config, se   │
│   382 │                                                                                          │
│   383 │   def _get_all_ckpt_names(self, checkpoints_path, tag):                                  │
│   384 │   │   ckpt_file_pattern = self._get_ckpt_name(checkpoints_path, tag, mp_placeholder="*   │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:308 │
│ in replace_transformer_layer                                                                     │
│                                                                                                  │
│   305 │   │   │   │   │   │   │   │   │   │   │    orig_class=orig_layer_impl,                   │
│   306 │   │   │   │   │   │   │   │   │   │   │    replace_fn=replace_fn,                        │
│   307 │   │   │   │   │   │   │   │   │   │   │    _replace_policy=config.injection_policy_tup   │
│ ❱ 308 │   │   │   │   │   │   │   │   │   │   │    checkpoint=checkpoint[i])                     │
│   309 │   │   │   pbar.update(1)                                                                 │
│   310 │   │   │   gc.collect()                                                                   │
│   311 │   else:                                                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 0
(an identical traceback ending in KeyError: 0 is printed a second time, presumably by the second rank)
delock commented 1 year ago

Hi @park12sj, if you can run your workload on CPU, can you change the title to add a [RESOLVED] mark? This will help users who are looking for answers about CPU inference. Thanks!

park12sj commented 1 year ago

@delock

I want to ask you one more question.

After saving a sharded checkpoint with save_mp_checkpoint_path as shown below,

ds_engine = deepspeed.init_inference(
    model,
    tensor_parallel={
        "tp_size": world_size,
    },
    dtype=torch_dtype,
    replace_with_kernel_inject=False if device == torch.device("cpu") else True,
    save_mp_checkpoint_path=args.save_mp_checkpoint_path,
    # injection_policy=injection_policy,
)
model = ds_engine.module

I then tried to load the config saved above, as shown below.

ds_engine = deepspeed.init_inference(
        model,
        tensor_parallel={
            "tp_size": world_size,
        },
        config=os.path.join(
            args.save_mp_checkpoint_path, "ds_inference_config.json"
        ),
    )

A config validation error occurs, as shown below.

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /workspace/storage/cephfs-personal/git/pai/ml/ml/model/application/nlp/place_lm/inference/accele │
│ ration/01_deepspeed/evaluate_bfloat16.py:91 in <module>                                          │
│                                                                                                  │
│    88 if device == torch.device("cpu"):                                                          │
│    89 │   model = cpu_tuning(model, tokenizer)                                                   │
│    90 │                                                                                          │
│ ❱  91 │   ds_engine = deepspeed.init_inference(                                                  │
│    92 │   │   model,                                                                             │
│    93 │   │   tensor_parallel={                                                                  │
│    94 │   │   │   "tp_size": world_size,                                                         │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/__init__.py:340 in init_inference   │
│                                                                                                  │
│   337 │   │   │   raise ValueError(f"Conflicting argument '{key}' in 'config':{config_dict[key   │
│   338 │   config_dict.update(kwargs)                                                             │
│   339 │                                                                                          │
│ ❱ 340 │   ds_inference_config = DeepSpeedInferenceConfig(**config_dict)                          │
│   341 │                                                                                          │
│   342 │   engine = InferenceEngine(model, config=ds_inference_config)                            │
│   343                                                                                            │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/runtime/config_utils.py:57 in       │
│ __init__                                                                                         │
│                                                                                                  │
│    54 │   │   if (not strict):  # This is temporary until we refactor all DS configs, allows H   │
│    55 │   │   │   data = {k: v for k, v in data.items() if (v != "auto" or k == "replace_metho   │
│    56 │   │   print("config_tuil", data)                                                         │
│ ❱  57 │   │   super().__init__(**data)                                                           │
│    58 │   │   self._deprecated_fields_check(self)                                                │
│    59 │                                                                                          │
│    60 │   def _process_deprecated_field(self, pydantic_config, field):                           │
│                                                                                                  │
│ in pydantic.main.BaseModel.__init__:341                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValidationError: 5 validation errors for DeepSpeedInferenceConfig
checkpoints
  extra fields not permitted (type=value_error.extra)
parallelization
  extra fields not permitted (type=value_error.extra)
tp_size
  extra fields not permitted (type=value_error.extra)
type
  extra fields not permitted (type=value_error.extra)
version
  extra fields not permitted (type=value_error.extra)

Is the usage wrong?

delock commented 12 months ago

I didn't dig much into passing a config to the inference engine. From the error message, it looks like some fields in the config are not allowed. Have you tried running the same script on a CUDA device?


park12sj commented 12 months ago

@delock

Hello, it works fine on CUDA.

The difference is that the config file path was passed as the checkpoint argument instead of the config argument, and replace_with_kernel_inject was set to True.

ds_engine = deepspeed.init_inference(
        model,
        tensor_parallel={
            "tp_size": world_size,
        },
        dtype=torch_dtype,
        checkpoint=os.path.join(
            args.save_mp_checkpoint_path, "ds_inference_config.json"
        ),
        replace_with_kernel_inject=True
    )

The problem is that, depending on replace_with_kernel_inject, the logic takes a completely different branch. https://github.com/microsoft/DeepSpeed/blob/581e44dd1ab3c409a5905335867c761d5cb4db5b/deepspeed/module_inject/replace_module.py#L301-L308

At this point, checkpoint[i] raises a KeyError, because the "checkpoints" entry in the config saved by deepspeed.init_inference is a dict consisting of two lists:

"checkpoints": {"non_tp": ["non-tp.pt"], "tp": ["tp_00_00.pt", "tp_01_00.pt", "tp_00_01.pt", "tp_01_01.pt", "tp_00_02.pt", "tp_01_02.pt", "tp_00_03.pt", "tp_01_03.pt", "tp_00_04.pt", "tp_01_04.pt", "tp_00_05.pt", "tp_01_05.pt", "tp_00_06.pt", "tp_01_06.pt", "tp_00_07.pt", "tp_01_07.pt"]}

This is why I passed the config file path as the config argument: it is a config created by DeepSpeed, so I assumed DeepSpeed could read it back. But neither method works, and an error still occurs even if I remove it.

Just in case, I also tried transforming it into a list of only the 'tp' files, as below, but that did not work either:

checkpoint['checkpoints'] = [os.path.join(args.save_mp_checkpoint_path, tp) for tp in checkpoint['checkpoints']['tp']]

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /workspace/storage/cephfs-personal/git/pai/ml/ml/model/application/nlp/place_lm/inference/accele │
│ ration/01_deepspeed/evaluate_bfloat16.py:99 in <module>                                          │
│                                                                                                  │
│    96 │   checkpoint['checkpoints'] = [os.path.join(args.save_mp_checkpoint_path, tp) for tp i   │
│    97 │   print(checkpoint)                                                                      │
│    98 │                                                                                          │
│ ❱  99 │   ds_engine = deepspeed.init_inference(                                                  │
│   100 │   │   model,                                                                             │
│   101 │   │   tensor_parallel={                                                                  │
│   102 │   │   │   "tp_size": world_size,                                                         │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/__init__.py:342 in init_inference   │
│                                                                                                  │
│   339 │                                                                                          │
│   340 │   ds_inference_config = DeepSpeedInferenceConfig(**config_dict)                          │
│   341 │                                                                                          │
│ ❱ 342 │   engine = InferenceEngine(model, config=ds_inference_config)                            │
│   343 │                                                                                          │
│   344 │   return engine                                                                          │
│   345                                                                                            │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/inference/engine.py:151 in __init__ │
│                                                                                                  │
│   148 │   │   │   │   │   │   config.injection_policy_tuple = (injection_policy, )               │
│   149 │   │   │   │   │   else:                                                                  │
│   150 │   │   │   │   │   │   config.injection_policy_tuple = injection_policy                   │
│ ❱ 151 │   │   │   │   │   self._apply_injection_policy(config, client_module)                    │
│   152 │   │                                                                                      │
│   153 │   │   device = get_accelerator().current_device_name()                                   │
│   154 │   │   self.module.to(device)                                                             │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/inference/engine.py:382 in          │
│ _apply_injection_policy                                                                          │
│                                                                                                  │
│   379 │   │                                                                                      │
│   380 │   │   if isinstance(self.module, torch.nn.Module):                                       │
│   381 │   │   │   # config is our DeepSpeedInferenceConfig and self.config is the HF model con   │
│ ❱ 382 │   │   │   replace_transformer_layer(client_module, self.module, checkpoint, config, se   │
│   383 │                                                                                          │
│   384 │   def _get_all_ckpt_names(self, checkpoints_path, tag):                                  │
│   385 │   │   ckpt_file_pattern = self._get_ckpt_name(checkpoints_path, tag, mp_placeholder="*   │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:304 │
│ in replace_transformer_layer                                                                     │
│                                                                                                  │
│   301 │   │   checkpoint = checkpoint_dict["checkpoints"]                                        │
│   302 │   │   pbar = tqdm.tqdm(total=len(checkpoint), desc=f"Loading {len(checkpoint)} checkpo   │
│   303 │   │   for i in range(len(checkpoint)):                                                   │
│ ❱ 304 │   │   │   replaced_module = replace_module(model=model,                                  │
│   305 │   │   │   │   │   │   │   │   │   │   │    orig_class=orig_layer_impl,                   │
│   306 │   │   │   │   │   │   │   │   │   │   │    replace_fn=replace_fn,                        │
│   307 │   │   │   │   │   │   │   │   │   │   │    _replace_policy=config.injection_policy_tup   │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:555 │
│ in replace_module                                                                                │
│                                                                                                  │
│   552 │   │   "No default policy found! Please specify your policy injection_policy (like {Ber   │
│   553 │   │   "You can find some samples here: https://github.com/microsoft/DeepSpeed/blob/mas   │
│   554 │                                                                                          │
│ ❱ 555 │   replaced_module, _ = _replace_module(model, policy, state_dict=sd)                     │
│   556 │   if checkpoint is not None:                                                             │
│   557 │   │   embedding_weight = None                                                            │
│   558 │   │   for n, p in replaced_module.named_parameters():                                    │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:623 │
│ in _replace_module                                                                               │
│                                                                                                  │
│   620 │   │   │   │   │   continue                                                               │
│   621 │   │   │   if len(child._buffers) != 0 and state_dict is not None:                        │
│   622 │   │   │   │   Loading.load_buffer(child, state_dict, checking_key)                       │
│ ❱ 623 │   │   │   _, layer_id = _replace_module(child,                                           │
│   624 │   │   │   │   │   │   │   │   │   │     policies,                                        │
│   625 │   │   │   │   │   │   │   │   │   │     prefix if level_id == 0 and skip_level_0_prefi   │
│   626 │   │   │   │   │   │   │   │   │   │     prefix + name + '.',                             │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:623 │
│ in _replace_module                                                                               │
│                                                                                                  │
│   620 │   │   │   │   │   continue                                                               │
│   621 │   │   │   if len(child._buffers) != 0 and state_dict is not None:                        │
│   622 │   │   │   │   Loading.load_buffer(child, state_dict, checking_key)                       │
│ ❱ 623 │   │   │   _, layer_id = _replace_module(child,                                           │
│   624 │   │   │   │   │   │   │   │   │   │     policies,                                        │
│   625 │   │   │   │   │   │   │   │   │   │     prefix if level_id == 0 and skip_level_0_prefi   │
│   626 │   │   │   │   │   │   │   │   │   │     prefix + name + '.',                             │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:599 │
│ in _replace_module                                                                               │
│                                                                                                  │
│   596 │   """                                                                                    │
│   597 │   for name, child in model.named_children():                                             │
│   598 │   │   if child.__class__ in policies:                                                    │
│ ❱ 599 │   │   │   replaced_module = policies[child.__class__][0](child,                          │
│   600 │   │   │   │   │   │   │   │   │   │   │   │   │   │      policies[child.__class__][-1]   │
│   601 │   │   │   │   │   │   │   │   │   │   │   │   │   │      layer_id,                       │
│   602 │   │   │   │   │   │   │   │   │   │   │   │   │   │      prefix=prefix + name,           │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:295 │
│ in replace_fn                                                                                    │
│                                                                                                  │
│   292 │   │   │   │   │   │   │   │   │   │   │   │    inference=True,                           │
│   293 │   │   │   │   │   │   │   │   │   │   │   │    layer_id=layer_id)                        │
│   294 │   │   │   else:                                                                          │
│ ❱ 295 │   │   │   │   new_module = replace_wo_policy(child, _policy, prefix=prefix, state_dict   │
│   296 │   │                                                                                      │
│   297 │   │   return new_module                                                                  │
│   298                                                                                            │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py:278 │
│ in replace_wo_policy                                                                             │
│                                                                                                  │
│   275 │   │   _autotp.update_linear_policies()                                                   │
│   276 │   │                                                                                      │
│   277 │   │   # 4. Replace modules                                                               │
│ ❱ 278 │   │   return _autotp._replace_module(module)                                             │
│   279 │                                                                                          │
│   280 │   def replace_fn(child, _policy, layer_id=0, prefix="", state_dict=None):                │
│   281 │   │   training = False  # todo: refactor this part to go in the config                   │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/auto_tp.py:430 in     │
│ _replace_module                                                                                  │
│                                                                                                  │
│   427 │   │   │   │   │   │   │   │   │   │   │   │   │   │   │   │     self.conv_linear_layer   │
│   428 │   │   │   else:                                                                          │
│   429 │   │   │   │   self.update_mp_params(child)                                               │
│ ❱ 430 │   │   │   │   self._replace_module(child, name, class_name)                              │
│   431 │   │   return r_module                                                                    │
│   432                                                                                            │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/auto_tp.py:409 in     │
│ _replace_module                                                                                  │
│                                                                                                  │
│   406 │   │   │   checking_key = self.prefix + '.' + class_name + '.' + name + '.' if class_na   │
│   407 │   │   │   if Loading.is_load_module(child) and self.state_dict is not None:              │
│   408 │   │   │   │   if any(checking_key in item for item in self.state_dict):                  │
│ ❱ 409 │   │   │   │   │   Loading.load(child, self.state_dict, checking_key, self.mp_group)      │
│   410 │   │   │   │   else:                                                                      │
│   411 │   │   │   │   │   continue                                                               │
│   412 │   │   │   if len(child._buffers) != 0 and self.state_dict is not None:                   │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/auto_tp.py:139 in     │
│ load                                                                                             │
│                                                                                                  │
│   136 │   │   │   │   module.weight = torch.nn.parameter.Parameter(data=torch.empty_like(modul   │
│   137 │   │   │   │   │   │   │   │   │   │   │   │   │   │   │    requires_grad=module.weight   │
│   138 │   │   │   │   if 'query_key_value' in prefix:                                            │
│ ❱ 139 │   │   │   │   │   module.weight = mp_replace.strided_copy(module.weight.data,            │
│   140 │   │   │   │   │   │   │   │   │   │   │   │   │   │   │   state_dict[prefix + 'weight'   │
│   141 │   │   │   │   │   │   │   │   │   │   │   │   │   │   │   num_splits=3)                  │
│   142 │   │   │   │   else:                                                                      │
│                                                                                                  │
│ /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/module_inject/auto_tp.py:55 in      │
│ strided_copy                                                                                     │
│                                                                                                  │
│    52 │   │   src_split = torch.split(src.data, src.shape[outer_dim] // num_splits, dim=outer_   │
│    53 │   │   if (len(src_shape) == 2 and len(dst_shape) == 2):                                  │
│    54 │   │   │   if src_shape[outer_dim] == dst_shape[self.out_dim]:                            │
│ ❱  55 │   │   │   │   dst = dst.reshape(-1).data.copy_(src.data.reshape(-1)).reshape(src.shape   │
│    56 │   │   │   │   dst = torch.nn.parameter.Parameter(dst, requires_grad=False)               │
│    57 │   │   │   │   if hasattr(src, 'scale'):                                                  │
│    58 │   │   │   │   │   dst.scale = src.scale                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: The size of tensor a (78643200) must match the size of tensor b (39321600) at non-singleton dimension 0
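
A quick arithmetic check on the two sizes (a sketch; tp_size=2 is taken from the runs above) shows the mismatch is exactly the tensor-parallel factor, which suggests a full, unsharded weight is being copied into a tp_size=2 shard:

a, b = 78643200, 39321600  # tensor sizes from the RuntimeError above
assert a == 2 * b          # the factor of two matches tp_size=2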
park12sj commented 12 months ago

In the end, after saving the config, I did not load it again; instead I loaded the weights directly into the model and ran it that way.
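
A minimal sketch of that workaround, assuming a Hugging Face causal LM; model_name and world_size are placeholders, not values from this thread:

import deepspeed
import torch
from transformers import AutoModelForCausalLM

torch_dtype = torch.bfloat16  # assumed dtype

# Load the full weights directly, then let DeepSpeed shard them at
# init time instead of reading back ds_inference_config.json.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch_dtype)
ds_engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": world_size},
    dtype=torch_dtype,
    replace_with_kernel_inject=False,  # CPU path: AutoTP, no kernel injection
)
model = ds_engine.module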

thejumpman2323 commented 9 months ago

I solved it by updating the gcc version

How did you resolve it? I updated my gcc to 13, but I still have the same issue.

park12sj commented 8 months ago

@thejumpman2323

I don't know the exact difference, because in my case it was solved by updating to gcc 11.

I think your env is picking up the wrong gcc path. Can you put CC=$(which gcc) GXX=$(which g++) in front of the command?