llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.92k stars 11.52k forks source link

[libc++] std::memcpy does not work in HIP __device__ code #100802

Open AngryLoki opened 1 month ago

AngryLoki commented 1 month ago

This code (seen in pytorch) compiles with stdlibc++, but fails with libc++:

// hipcc -c test.cpp -o test.o
#include <cstring>

__device__ void test1( void* dest, const void* src, std::size_t count ) {
    // this fails
    std::memcpy( dest, src, count );

    // but this works
    memcpy( dest, src, count );
}

Fails with error: reference to __host__ function 'memcpy' in __device__ function, see https://godbolt.org/z/h5nEnbb68

The issue lies between these lines:

https://github.com/llvm/llvm-project/blob/c80c09f3e380a0a2b00b36bebf72f43271a564c1/clang/lib/Headers/__clang_hip_runtime_wrapper.h#L142-L145

For math functions __clang_cuda_math_forward_declares.h exists and adds definitions for all math functions. But for stdlib functions there is no such file.

What happens in stdlibc++:

// Part1: /usr/include/string.h
extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
       size_t __n) noexcept (true) __attribute__ ((__nonnull__ (1, 2)));

// Part 2: "/opt/compiler-explorer/libs/rocm/6.1.2/include/hip/amd_detail/amd_device_functions.h"
static inline __attribute__((device)) void* memcpy(void* dst, const void* src, size_t size) {
    return __hip_hc_memcpy(dst, src, size);
}

// Part 3: /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/cstring
namespace std {
  using ::memchr;
  using ::memcmp;
  using ::memcpy;
...

In libc++ part 2 and part 3 are swapped, because #include <algorithm> in libc++ includes <.../c++/v1/cstring>, which results in

// /opt/compiler-explorer/clang-rocm-6.1.2/bin/../include/c++/v1/cstring
namespace std { inline namespace __1 {
using ::size_t __attribute__((__using_if_exists__));
using ::memcpy __attribute__((__using_if_exists__));
using ::memmove __attribute__((__using_if_exists__));
...

// /opt/compiler-explorer/libs/rocm/6.1.2/include/hip/amd_detail/amd_device_functions.h
static inline __attribute__((device)) void* memcpy(void* dst, const void* src, size_t size) {
    return __hip_hc_memcpy(dst, src, size);
}

Adding extern __attribute__((device)) void* memcpy(void* dst, const void* src, size_t size); before <algorithm> in __clang_hip_runtime_wrapper.h solves the issue.

In general it looks like a new file like __clang_cuda_stdlib_forward_declares.h with memcpy and memset __device__ declarations could solve the issue without breaking anything.

AngryLoki commented 1 month ago

@yxsamliu, could you check this, please?