Open ramyaprabhu-alt opened 1 year ago
@ramyaprabhu-alt can you please copy and paste the code you are running to reproduce the bug so I don't have to re-write it? Thanks
from deepspeed.ops.transformer.inference.config import DeepSpeedInferenceConfig
from deepspeed.model_implementations.transformers.ds_transformer import DeepSpeedTransformerInference
import torch
import deepspeed
config = DeepSpeedInferenceConfig(
    hidden_size=5,
    intermediate_size=20,
    heads=1,
    dtype=torch.float32,
    pre_layer_norm=False,
)
model = DeepSpeedTransformerInference(config=config)
from numpy import random
x = random.randint(100, size=(1,1,5))
print(x)
model(torch.Tensor(x))
print(deepspeed.__version__)
I just ran the reproducer you shared and I'm unable to replicate this error (using the latest DeepSpeed, CUDA 11.8, Torch 2.0, A6000 GPU). Could you share the output of ds_report? Thank you
I have the same issue on an A100 80GB (driver version 535.104.12), CUDA 11.7, Torch 1.13.1, DeepSpeed built from master.
I ran the same script like this: CUDA_LAUNCH_BLOCKING=1 python test.py
My ds_report:
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ilya_vologin/llama_deepspeed/venv/lib/python3.8/site-packages/torch']
torch version .................... 1.13.1+cu117
deepspeed install path ........... ['/home/ilya_vologin/llama_deepspeed/venv/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.10.4+f8d3ec7f, f8d3ec7f, master
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7
shared memory (/dev/shm) size .... 83.53 GB
Output:
[2023-09-25 17:46:41,742] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 5, 'intermediate_size': 20, 'heads': 1, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': False, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-12, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 256, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False, 'num_kv': -1}
[[[59 99 69 82 61]]]
------------------------------------------------------
Free memory : 78.584106 (GigaBytes)
Total memory: 79.151001 (GigaBytes)
Requested memory: 0.005371 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0x7f3631c00000
------------------------------------------------------
!!!! kernel execution error. (m: 15, n: 1, k: 5, error: 13)
!!!! kernel execution error. (batch: 1, m: 1, n: 1, k: 5, error: 13)
!!!! kernel execution error. (batch: 1, m: 5, n: 1, k: 1, error: 13)
!!!! kernel execution error. (m: 5, n: 1, k: 5, error: 13)
!!!! kernel execution error. (m: 20, n: 1, k: 5, error: 13)
!!!! kernel execution error. (m: 5, n: 1, k: 20, error: 13)
0.10.4+f8d3ec7f
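For anyone reading the log above: the sketch below parses the "kernel execution error" lines and attaches a status name to the numeric code. It assumes the printed `error` value is a cuBLAS `cublasStatus_t` (the thread does not confirm this), in which case 13 corresponds to `CUBLAS_STATUS_EXECUTION_FAILED`. The helper name `parse_kernel_error` is hypothetical and not part of DeepSpeed.

```python
import re

# cublasStatus_t values as defined by the cuBLAS API.
# Assumption: the "error" field in the log is one of these codes.
CUBLAS_STATUS_NAMES = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}

# Matches the parenthesised "key: value" list in a log line such as
# "!!!! kernel execution error. (m: 15, n: 1, k: 5, error: 13)".
ERROR_LINE = re.compile(r"kernel execution error\. \((?P<fields>[^)]*)\)")


def parse_kernel_error(line):
    """Return the integer fields of an error line plus a status name, or None."""
    match = ERROR_LINE.search(line)
    if match is None:
        return None
    fields = {}
    for item in match.group("fields").split(","):
        key, value = item.split(":")
        fields[key.strip()] = int(value)
    # Hypothetical convenience field: the symbolic name for the numeric code.
    fields["status"] = CUBLAS_STATUS_NAMES.get(fields.get("error"), "UNKNOWN")
    return fields
```

Calling `parse_kernel_error` on each line of the captured output gives the GEMM dimensions (`m`, `n`, `k`, and `batch` where present) next to the status name, which makes it easier to see which matrix shapes fail.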
I am able to replicate the error now. I needed to add CUDA_LAUNCH_BLOCKING=1; otherwise I did not see the kernel execution error. It seems this error is happening in ds_linear_layer:
https://github.com/microsoft/DeepSpeed/blob/0636c74c5e27757d48f64f33f330d7bb975fc5a8/csrc/transformer/inference/csrc/pt_binding.cpp#L1097
@RezaYazdaniAminabadi any ideas?
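Since the error only showed up with CUDA_LAUNCH_BLOCKING=1, here is a minimal sketch of launching the reproducer with that variable set from Python rather than from the shell. CUDA kernel launches are normally asynchronous, and this variable forces them to run synchronously, so a failure is reported at the call that caused it. The helper names `blocking_env` and `run_blocking` are hypothetical, and `test.py` in the usage note stands for wherever the reproducer is saved.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 set."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(script_path):
    """Run a Python script in a child process with synchronous CUDA launches.

    A child process is used so the variable is in place before the CUDA
    runtime is initialised in that process.
    """
    return subprocess.run(
        [sys.executable, script_path],
        env=blocking_env(),
        capture_output=True,
        text=True,
    )
```

For example, `result = run_blocking("test.py")` followed by printing `result.stdout` and `result.stderr` should be equivalent to running `CUDA_LAUNCH_BLOCKING=1 python test.py` in a shell.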
I encountered the same issue on a V100. What should I do to solve the problem? Thank you for your help.
Describe the bug
I tried to run the script given above and ran into this error:
I don't know how to even start debugging to understand where the problem is.
To Reproduce
Steps to reproduce the behavior: just run the code in the first screenshot. No changes were made to DS.
System info (please complete the following information):