This code (seen in PyTorch) compiles with libstdc++, but fails with libc++:
It fails with `error: reference to __host__ function 'memcpy' in __device__ function`; see https://godbolt.org/z/h5nEnbb68.

The issue lies between these lines:
https://github.com/llvm/llvm-project/blob/c80c09f3e380a0a2b00b36bebf72f43271a564c1/clang/lib/Headers/__clang_hip_runtime_wrapper.h#L142-L145
For math functions, `__clang_cuda_math_forward_declares.h` exists and adds `__device__` declarations for all math functions. For stdlib functions there is no such file.
What happens in libstdc++:
In libc++, part 2 and part 3 are swapped, because `#include <algorithm>` in libc++ includes `<.../c++/v1/cstring>`, which results in the error above.

Adding
`extern __attribute__((device)) void* memcpy(void* dst, const void* src, size_t size);`
before `<algorithm>` in `__clang_hip_runtime_wrapper.h` solves the issue.

In general, it looks like a new file such as `__clang_cuda_stdlib_forward_declares.h` with `__device__` declarations of `memcpy` and `memset` could solve the issue without breaking anything.
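
To make the proposal concrete, here is a rough sketch of what such a header might contain. It is only an illustration: the file does not exist in clang, the `memset` declaration is written by analogy with the `memcpy` one quoted above, and `SKETCH_HOST_DEVICE` and `copy_int` are hypothetical names used just to exercise the declarations. Outside HIP/CUDA compilation the declarations are skipped, so the snippet also builds as plain C++.

```cpp
// Hypothetical sketch of __clang_cuda_stdlib_forward_declares.h.
// No such file exists in clang; SKETCH_* names are assumptions.
#include <stddef.h>  // size_t

#if defined(__HIP__) || defined(__CUDA__)
// HIP/CUDA compilation with clang: declare device-side overloads before any
// standard library header declares the host-only versions.
extern __attribute__((device)) void *memcpy(void *dst, const void *src, size_t size);
extern __attribute__((device)) void *memset(void *ptr, int value, size_t size);
#define SKETCH_HOST_DEVICE __attribute__((host)) __attribute__((device))
#else
// Plain host compilation: nothing extra to declare.
#define SKETCH_HOST_DEVICE
#endif

// Standard headers come after the forward declarations, mirroring the
// suggested placement before <algorithm> in the wrapper header.
#include <string.h>

// Hypothetical helper: calls memset and memcpy from code that may also be
// compiled for the device.
SKETCH_HOST_DEVICE inline int copy_int(int value) {
  int result;
  memset(&result, 0, sizeof(result));
  memcpy(&result, &value, sizeof(result));
  return result;
}
```

Keeping the declarations in a separate header, as is already done for the math functions, would let both the CUDA and HIP wrappers include it ahead of any standard library header, independent of which C++ standard library is in use.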