Add support in KALDI for AMD GPUs by hipifying the existing CUDA implementation.
The implementation comprises the following:
Add the options --use-rocm and --rocm-targets= to the configuration script.
Add the hip_64bit.mk build file, which supplies the relevant HIP/ROCm build flags. The IS_GPU_BUILD, CUDA and ROCM variables guard sections meant for any GPU build, CUDA-only builds and ROCm-only builds, respectively.
Include the hipify.h header to control the "hipification" of the code. It maps the CUDA APIs and variables to their HIP counterparts, which keeps the existing CUDA implementation largely untouched.
The source code uses __IS_HIP_COMPILE__ to guard HIP-specific sections.
The AMD GPU architectures are feature-mapped to CUDA compute capability 8.0, which largely holds and is sufficient for this code.
Replace hardcoded warp/wavefront sizes and thread block limits so they can be configured properly for the target GPU.
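The guard and the configurable warp/wavefront size described above can be pictured with a small host-only sketch. It is not taken from the PR: only __IS_HIP_COMPILE__ comes from the description, while kGpuWarpSize, kMaxThreadsPerBlock, WarpsPerBlock and ClampBlockSize are hypothetical names, and the values 64 (wavefront width on MI250X-class GPUs) and 1024 (per-block thread limit) are assumptions made for illustration.

```cpp
#include <cassert>

// __IS_HIP_COMPILE__ is the guard named in this PR. Every other name in this
// sketch is hypothetical and exists only for illustration.
#ifdef __IS_HIP_COMPILE__
// Assumed wavefront width for MI250X-class AMD GPUs.
constexpr int kGpuWarpSize = 64;
#else
// Warp width on NVIDIA GPUs.
constexpr int kGpuWarpSize = 32;
#endif

// Assumed per-block thread limit; a real build would derive it from the target.
constexpr int kMaxThreadsPerBlock = 1024;

// Number of warps (or wavefronts) needed to cover num_threads threads.
inline int WarpsPerBlock(int num_threads) {
  assert(num_threads > 0);
  return (num_threads + kGpuWarpSize - 1) / kGpuWarpSize;
}

// Largest block size that is a whole number of warps and exceeds neither the
// requested size nor the per-block limit. Requests smaller than one warp are
// rounded up to a single warp.
inline int ClampBlockSize(int requested) {
  assert(requested > 0);
  int capped = requested < kMaxThreadsPerBlock ? requested : kMaxThreadsPerBlock;
  int warps = capped / kGpuWarpSize;
  if (warps < 1) warps = 1;
  return warps * kGpuWarpSize;
}
```

With helpers of this kind, a launch configuration written as ClampBlockSize(256) or WarpsPerBlock(256) would adapt to the target instead of assuming a fixed width of 32 threads.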
ROCm builds can be completed as follows (assuming an existing Python environment with the requirements installed), e.g.:
This PR was tested successfully on AMD MI250X and NVIDIA A100 hardware.