llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.53k stars 11.79k forks source link

[CUDA, NVPTX] LLVM crash with "Cannot cast between two non-generic address spaces" #112760

Open Artem-B opened 3 hours ago

Artem-B commented 3 hours ago

Valid CUDA code results in LLVM crash due to an attempt to generate an impossible addrspacecast in a known-false conditional branch.

Reproducer: https://godbolt.org/z/Yjjsdvj1r

Source:

#include <stdint.h>

// No crash if the function is not inlined, and thus does not know 
// at compile time which pointer it will be handling
//__device__ uintptr_t f(void *p) __noinline__;

__device__ uintptr_t f(void *p) {
   if (__isGlobal(p))
    return __cvta_generic_to_global(p);
   if (__isShared(p))
     return __cvta_generic_to_shared(p);
   return (uintptr_t)p;
}

__shared__ int shared_data;
__device__ int global_data;
__constant__ int const_data = 3;

__global__ void square(uintptr_t* out, int n) {
  out[0] = f(&shared_data);
  out[1] = f(&global_data);
}

IR:

define dso_local void @square(unsigned long*, int)(ptr nocapture noundef writeonly %out, i32 noundef %n) local_unnamed_addr #1 {
entry:
  %0 = tail call i1 @llvm.nvvm.isspacep.global(ptr addrspacecast (ptr addrspace(3) @shared_data to ptr))
  %1 = tail call i1 @llvm.nvvm.isspacep.shared(ptr addrspacecast (ptr addrspace(3) @shared_data to ptr))
  %. = select i1 %1, i64 ptrtoint (ptr addrspace(3) @shared_data to i64), i64 ptrtoint (ptr addrspacecast (ptr addrspace(3) @shared_data to ptr) to i64)
  %retval.0.i = select i1 %0, i64 ptrtoint (ptr addrspace(1) addrspacecast (ptr addrspace(3) @shared_data to ptr addrspace(1)) to i64), i64 %.
  store i64 %retval.0.i, ptr %out, align 8, !tbaa !8
  ret void
}

The culprit is addrspacecast (ptr addrspace(3) @shared_data to ptr addrspace(1)) here:

  %retval.0.i = select i1 %0, i64 ptrtoint (ptr addrspace(1) addrspacecast (ptr addrspace(3) @shared_data to ptr addrspace(1)) to i64), 

To add insult to injury, the impossible cast is still going to be executed, and will likely result in a runtime error trying to convert the pointer in the wrong address space.

We need to make sure that __cvta_generic_to_global() is never executed if __isGlobal() is false. Same for the conversions from shared and constant AS.

Artem-B commented 2 hours ago

@darkbuck I want to replace impossible ASCs with undef, and I see that in https://github.com/llvm/llvm-project/commit/72fc08a5412ec7ee7f0b904926db16cd86c1f876 you've added isValidAddrSpaceCast which could be used for this purpose. Except, that, for some reason it defaults to false which would flag all ASCs on all the back-ends that do not implement this hook. Is there a particular reason false was used? Can it be changed to true ? I will implement the correct hook for NVPTX, but I'm concerned about incorrectly eliminating ASCs on the targets that didn't implement the hook.