corey-derochie-amd closed this 1 month ago
Hi @corey-derochie-amd, the team has investigated this before, and it is very tricky to tackle from mscclpp's side. We would rather use this ROCm patch to include/hip/amd_detail/amd_hip_bf16.h to avoid this issue on ROCm 6.0:
```
97c97
< #define __HOST_DEVICE__ __device__
---
> #define __HOST_DEVICE__ __device__ static
100c100
< #define __HOST_DEVICE__ __host__ __device__
---
> #define __HOST_DEVICE__ __host__ __device__ static inline
```
This is already adopted in ROCm 6.1.
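As an illustration of why adding `static` (and `inline`) to the macro can help, assuming the ROCm 6.0 failure comes from functions defined in the header with external linkage, which would yield duplicate-symbol errors when the header is included in several translation units: the sketch below mirrors the patched macro under a hypothetical name (`BF16_SKETCH_HOST_DEVICE`) and applies it to a hypothetical helper (`bf16_bits_to_float`, not part of amd_hip_bf16.h). The HIP qualifiers are stubbed out so it builds with a plain C++ compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Outside hipcc, stub out the HIP qualifiers so this sketch builds as plain C++.
#if !defined(__HIPCC__)
#define __host__
#define __device__
#endif

// Hypothetical macro mirroring the patched ROCm 6.1 __HOST_DEVICE__ definition.
// `static` gives each function internal linkage, so every translation unit that
// includes the header gets its own private copy rather than a shared external symbol.
#define BF16_SKETCH_HOST_DEVICE __host__ __device__ static inline

// Hypothetical helper: widen raw bfloat16 bits to float.
// A bfloat16 value is the upper 16 bits of an IEEE-754 binary32 value.
BF16_SKETCH_HOST_DEVICE float bf16_bits_to_float(std::uint16_t bits) {
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &wide, sizeof(out));  // bit-cast without aliasing violations
  return out;
}
```

For example, `bf16_bits_to_float(0x3F80)` returns `1.0f`. Because each translation unit that includes such a header compiles its own internal copy, the linker never sees two external definitions of the same name; the trade-off is possible duplicated code across object files.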
Thanks, @chhwang.
While commit 72b99a42291fcd6c5dcde694fcb3c5d72bc0c9c7 allows libmscclpp to compile using ROCm 6.0, there are still linker errors in libmscclpp_nccl:
This does not appear to be an issue with later versions of ROCm.