Xzensi opened this issue 2 months ago
Am I safe to assume that DeepSpeed does not yet support ROCm 6.0? I hit a whole lot of errors during the JIT build of `transformer_inference`.
```
$ pip show torch
Name: torch
Version: 2.3.0+rocm6.0
```
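Before digging into the build errors, it can help to confirm which ROCm build of PyTorch is actually active. The helper below is a hypothetical sketch (not part of DeepSpeed or torch) that parses the ROCm version out of a torch version string like the one above:

```python
import re

def rocm_major_minor(version_string):
    """Hypothetical helper: parse the ROCm 'major.minor' out of a torch
    local-version tag such as '2.3.0+rocm6.0'. Returns None for non-ROCm
    builds (e.g. '2.3.0+cu121')."""
    match = re.search(r"rocm(\d+)\.(\d+)", version_string)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# On a live install, torch.version.hip is set only on ROCm wheels
# (it is None on CUDA builds), so you can also check:
#   import torch
#   print(torch.__version__, torch.version.hip)
print(rocm_major_minor("2.3.0+rocm6.0"))  # prints (6, 0)
```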
HIPCC call arguments:

```
[1/5] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -isystem /home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/torch/include -isystem /home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/torch/include/TH -isystem /home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/torch/include/THC -isystem /home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /home/nexus/.conda/envs/xtts/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=6 -DROCM_VERSION_MINOR=0 --offload-arch=gfx1100 -fno-gpu-rdc -c /home/nexus/.conda/envs/xtts/lib/python3.11/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip -o apply_rotary_pos_emb.cuda.o
```
```
FAILED: apply_rotary_pos_emb.cuda.o
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated when compiling for gfx1100.
FAILED: rms_norm.cuda.o
1 warning and 16 errors generated when compiling for gfx1100.
FAILED: layer_norm.cuda.o
FAILED: pt_binding_hip.o
...
CoquiEngine: Error initializing main coqui engine model: Error building extension 'transformer_inference'
```
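Since the failure only surfaces when CoquiEngine initializes the model, one way a caller can stay usable while this is unresolved is to catch the extension-build error and fall back to the plain PyTorch path. This is a hypothetical sketch (the function and names are mine, not CoquiEngine's or DeepSpeed's actual API):

```python
def init_engine(build_op, use_deepspeed=True):
    """Hypothetical sketch: try the DeepSpeed-accelerated path first, and
    fall back to plain PyTorch if the JIT extension build raises, instead
    of failing engine initialization outright."""
    if use_deepspeed:
        try:
            return build_op()  # may raise on a failed hipcc/nvcc build
        except RuntimeError as err:
            print(f"DeepSpeed unavailable, falling back: {err}")
    return "plain-pytorch-engine"

def failing_build():
    # Mimics the failure above; a real build_op would return the loaded op.
    raise RuntimeError("Error building extension 'transformer_inference'")

print(init_engine(failing_build))  # prints the fallback: plain-pytorch-engine
```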
FYI @rraminen and @jithunnair-amd