microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[BUG] deepspeed inference opt/66b model OOM on 8 GPUs #2933

Open lambda7xx opened 1 year ago

lambda7xx commented 1 year ago

My DeepSpeed version is 0.8.1, transformers is 4.21.2, and I have 8× V100 32GB GPUs on my machine.

Every rank hits the same error while loading the checkpoint shards. One copy of the traceback (it is printed identically by all 8 ranks):

```
Loading 14 checkpoint shards:   0%|          | 0/14 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "bloom-inference-scripts/bloom-ds-inference.py", line 185, in <module>
    model = deepspeed.init_inference(
  File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/__init__.py", line 311, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 134, in __init__
    self._apply_injection_policy(config)
  File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 358, in _apply_injection_policy
    replace_transformer_layer(client_module,
  File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 561, in replace_transformer_layer
    load_model_with_checkpoint(replaced_module,
  File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 279, in load_model_with_checkpoint
    load_module_recursive(r_module)
  File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 273, in load_module_recursive
    load_module_recursive(
  File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 273, in load_module_recursive
    load_module_recursive(
  File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 273, in load_module_recursive
    load_module_recursive(
  File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 271, in load_module_recursive
    layer_policies[child.__class__](child, prefix + name + '.')
  File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 200, in load_transformer_layer
    replace_policy.load_params(module,
AttributeError: 'HFOPTLayerPolicy' object has no attribute 'load_params'
```

Each rank then aborts its NCCL communicator and the launcher kills the subprocesses:

```
PHLRR4036:4558:5025 [4] NCCL INFO [Service thread] Connection closed by localRank 4
PHLRR4036:4558:4558 [4] NCCL INFO comm 0x4a158970 rank 4 nranks 8 cudaDev 4 busId 83000 - Abort COMPLETE
(equivalent "Connection closed" / "Abort COMPLETE" pairs for ranks 0-3 and 5-7)
[2023-03-03 03:41:22,399] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 4553
[2023-03-03 03:41:22,433] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 4554
[2023-03-03 03:41:23,542] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 4555
[2023-03-03 03:41:23,839] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 4556
[2023-03-03 03:41:23,842] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 4558
[2023-03-03 03:41:23,843] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 4560
[2023-03-03 03:41:23,846] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 4562
[2023-03-03 03:41:23,848] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 4564
[2023-03-03 03:41:23,850] [ERROR] [launch.py:324:sigkill_handler] ['/usr/bin/python3', '-u', 'bloom-inference-scripts/bloom-ds-inference.py', '--local_rank=7', '--name', 'facebook/opt-66b', '--batch_size', '4', '--tp_size', '4', '--benchmark'] exits with return code = 1
```

lambda7xx commented 1 year ago

Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /home/lambda7xx/.cache/torch_extensions/py38_cu117/transformer_inference/build.ninja... Building extension module transformer_inference... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module transformer_inference... Loading extension module transformer_inference... Loading extension module transformer_inference... Loading extension module transformer_inference... 
Time to load transformer_inference op: 0.5659897327423096 seconds Time to load transformer_inference op: 0.5149312019348145 seconds Time to load transformer_inference op: 0.5152051448822021 seconds [2023-03-03 03:40:39,841] [INFO] [logging.py:75:log_dist] [Rank 0] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 9216, 'intermediate_size': 36864, 'heads': 72, 'num_hidden_layers': -1, 'fp16': True, 'pre_layer_norm': True, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-12, 'mp_size': 2, 'q_int8': False, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.ReLU: 2>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False} Time to load transformer_inference op: 0.5122606754302979 seconds Loading extension module transformer_inference... Loading extension module transformer_inference... Time to load transformer_inference op: 0.5149903297424316 seconds Time to load transformer_inference op: 0.5106439590454102 seconds Loading extension module transformer_inference... Time to load transformer_inference op: 0.6110687255859375 seconds Loading extension module transformer_inference... 
Time to load transformer_inference op: 0.6097698211669922 seconds Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.09874415397644043 seconds Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.10198044776916504 seconds Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... 
No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.10840225219726562 seconds Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.11503076553344727 seconds Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.10785055160522461 seconds Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.10958147048950195 seconds Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.10695767402648926 seconds Using /home/lambda7xx/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... 
Time to load transformer_inference op: 0.11722230911254883 seconds Traceback (most recent call last): File "bloom-inference-scripts/bloom-ds-inference.py", line 185, in model = deepspeed.init_inference( File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/init.py", line 311, in init_inference engine = InferenceEngine(model, config=ds_inference_config) File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 134, in init self._apply_injection_policy(config) File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 358, in _apply_injection_policy replace_transformer_layer(client_module, File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 532, in replace_transformer_layer replaced_module = replace_module(model=model, File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 797, in replace_module replacedmodule, = _replace_module(model, policy) File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 824, in _replacemodule , layer_id = _replace_module(child, policies, layer_id=layer_id) File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 824, in _replacemodule , layer_id = _replace_module(child, policies, layer_id=layer_id) File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 824, in _replacemodule , layer_id = _replace_module(child, policies, layer_id=layer_id) File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 814, in _replace_module replaced_module = policies[child.class][0](child, File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 522, in replace_fn new_module = replace_with_policy(child, File 
"/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 383, in replace_with_policy _container.create_module() File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/containers/opt.py", line 21, in create_module self.module = DeepSpeedOPTInference(_config, mp_group=self.mp_group) File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/model_implementations/transformers/ds_opt.py", line 18, in init super().init(config, File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 70, in init self.mlp = DeepSpeedMLP(self.config, File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/ds_mlp.py", line 45, in init self.output_w = nn.Parameter(torch.empty(intm_size_per_partition, torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 324.00 MiB (GPU 0; 31.75 GiB total capacity; 31.08 GiB already allocated; 216.50 MiB free; 31.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. 
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Traceback (most recent call last): File "bloom-inference-scripts/bloom-ds-inference.py", line 185, in model = deepspeed.init_inference( File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/init.py", line 311, in init_inference engine = InferenceEngine(model, config=ds_inference_config) File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 134, in init self._apply_injection_policy(config) File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 358, in _apply_injection_policy replace_transformer_layer(client_module, File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 532, in replace_transformer_layer replaced_module = replace_module(model=model, File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 797, in replace_module replacedmodule, = _replace_module(model, policy) File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 824, in _replacemodule , layer_id = _replace_module(child, policies, layer_id=layer_id) File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 824, in _replacemodule , layer_id = _replace_module(child, policies, layer_id=layer_id) File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 824, in _replacemodule , layer_id = _replace_module(child, policies, layer_id=layer_id) File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 814, in _replace_module replaced_module = policies[child.class][0](child, File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 522, in replace_fn new_module = replace_with_policy(child, File 
"/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 383, in replace_with_policy _container.create_module() File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/module_inject/containers/opt.py", line 21, in create_module self.module = DeepSpeedOPTInference(_config, mp_group=self.mp_group) File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/model_implementations/transformers/ds_opt.py", line 18, in init super().init(config, File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 70, in init self.mlp = DeepSpeedMLP(self.config, File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/ds_mlp.py", line 45, in init self.output_w = nn.Parameter(torch.empty(intm_size_per_partition, torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 324.00 MiB (GPU 4; 31.75 GiB total capacity; 31.08 GiB already allocated; 216.50 MiB free; 31.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Traceback (most recent call last): File "bloom-inference-scripts/bloom-ds-inference.py", line 185, in Traceback (most recent call last): File "bloom-inference-scripts/bloom-ds-inference.py", line 185, in model = deepspeed.init_inference( File "/home/lambda7xx/.local/lib/python3.8/site-packages/deepspeed/init.py", line 311, in init_inference model = deepspeed.init_inference(
(The same traceback and `torch.cuda.OutOfMemoryError` repeat for the remaining ranks — GPUs 1, 2, 3, 5, 6, and 7 — each reporting 31.75 GiB total capacity with 31.08 GiB already allocated. The run then ends with:)

```
PHLRR4036:3531:3986 [0] NCCL INFO [Service thread] Connection closed by localRank 0
```
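The error message itself suggests trying `max_split_size_mb` to reduce fragmentation. For completeness, a minimal sketch of how that hint can be applied (the value 128 is an arbitrary starting point, not a tuned number; and since every rank is already at ~31 GiB allocated, this alone may not be enough):

```python
import os

# PyTorch reads PYTORCH_CUDA_ALLOC_CONF when the CUDA caching allocator is
# initialized, so this must be set before the first CUDA allocation --
# most simply, before `import torch` in the launch script.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Alternatively, the variable can be exported in the environment before launching the script with `deepspeed`.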


- In fp16, the opt-66b model should take about 132 GB (66B parameters × 2 bytes), and my machine has 256 GB of total GPU memory (8 × 32 GB), so I don't think it should OOM.
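To make the arithmetic explicit, a back-of-envelope check (fp16 weights only; activations, the KV cache, and any temporary buffers created while loading checkpoints are ignored):

```python
# opt-66b has roughly 66e9 parameters; fp16 stores each in 2 bytes.
params = 66e9
bytes_per_param = 2  # fp16

total_gb = params * bytes_per_param / 1e9   # whole model
per_gpu_gb = total_gb / 8                   # sharded across 8 GPUs

print(total_gb)    # 132.0
print(per_gpu_gb)  # 16.5
```

A ~16.5 GB shard should fit comfortably in 32 GB per GPU, yet the traceback shows ~31 GiB already allocated on every rank before the failing 324 MiB allocation, which suggests each rank is materializing far more than its shard during module replacement.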