microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

JIT build fails for ROCM 6.0 #5474

Open Xzensi opened 2 months ago

Xzensi commented 2 months ago

Is it safe to assume that DeepSpeed does not yet support ROCm 6.0? I get a large number of errors during the JIT build of transformer_inference.

$ pip show torch
Name: torch
Version: 2.3.0+rocm6.0
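
For anyone comparing environments, the ROCm build tag can also be read from the version string programmatically. This is only a minimal sketch: the helper name `rocm_version_from_torch` is made up for illustration, and it assumes the `+rocmX.Y` local-version suffix shown in the `pip show` output above.

```python
import re

def rocm_version_from_torch(version: str):
    """Return (major, minor) parsed from a string like '2.3.0+rocm6.0', else None."""
    # Hypothetical helper: relies on the '+rocmX.Y' suffix seen in this report.
    match = re.search(r"\+rocm(\d+)\.(\d+)", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print(rocm_version_from_torch("2.3.0+rocm6.0"))
```

With PyTorch installed, `torch.__version__` can be passed in directly; on ROCm builds `torch.version.hip` should also report the HIP version.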

HIPCC call arguments:

[1/5] /opt/rocm/bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -isystem /home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/torch/include -isystem /home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/torch/include/TH -isystem /home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/torch/include/THC -isystem /home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /home/nexus/.conda/envs/xtts/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=6 -DROCM_VERSION_MINOR=0 --offload-arch=gfx1100 -fno-gpu-rdc -c /home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip -o apply_rotary_pos_emb.cuda.o
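
The command above both defines and later undefines the `__HIP_NO_HALF_*` macros, so it can help to work out which macros are actually in effect. The sketch below applies `-D`/`-U` flags left to right, which is how clang-style drivers are generally understood to process them. The function name `effective_macros` is hypothetical, the flag string is a shortened excerpt of the call above, and only the joined `-DNAME[=VALUE]` / `-UNAME` forms are handled. It does not diagnose the build failure itself.

```python
import shlex

def effective_macros(command: str) -> dict:
    """Apply -D/-U flags in order and return the macros still defined at the end."""
    macros = {}
    for token in shlex.split(command):
        if token.startswith("-D"):
            name, sep, value = token[2:].partition("=")
            # A bare -DNAME defines NAME as 1; -DNAME= defines it as empty.
            macros[name] = value if sep else "1"
        elif token.startswith("-U"):
            # A later -U removes an earlier definition of the same name.
            macros.pop(token[2:], None)
    return macros

# Shortened excerpt of the flags from the hipcc call in this report.
flags = (
    "-D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 "
    "-D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 "
    "-U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ "
    "-DROCM_VERSION_MAJOR=6 -DROCM_VERSION_MINOR=0 --offload-arch=gfx1100"
)
macros = effective_macros(flags)
for name in sorted(macros):
    print(name, "=", macros[name])
```

On this excerpt the `__HIP_NO_HALF_OPERATORS__` and `__HIP_NO_HALF_CONVERSIONS__` definitions are cancelled by the later `-U` flags, while `ROCM_VERSION_MAJOR=6` and `ROCM_VERSION_MINOR=0` remain defined.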

FAILED: apply_rotary_pos_emb.cuda.o

fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated when compiling for gfx1100

FAILED: rms_norm.cuda.o

1 warning and 16 errors generated when compiling for gfx1100.

FAILED: layer_norm.cuda.o

1 warning and 16 errors generated when compiling for gfx1100.

FAILED: pt_binding_hip.o

...

CoquiEngine: Error initializing main coqui engine model: Error building extension 'transformer_inference'

loadams commented 2 months ago

FYI @rraminen and @jithunnair-amd