llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.8k stars 11.9k forks source link

[NVPTX] Invalid PTX when initializing struct from shared-memory pointer #49660

Open llvmbot opened 3 years ago

llvmbot commented 3 years ago
Bugzilla Link 50316
Version 11.0
OS Linux
Reporter LLVM Bugzilla Contributor
CC @happybaoliang,@Artem-B

Extended Description

Aggregate-initializing a struct from an immediate pointer-to-shared-memory produces invalid PTX:

struct wrap { int *mem; };

__global__ void kernel() {
    __shared__ int mem;
    wrap ptr{&mem};
}

Compiled with

clang++ -c bug.cu --cuda-gpu-arch=sm_75 -std=gnu++17 -U__FLOAT128__ -U__SIZEOF_FLOAT128__

results in

    // .globl   _Z6kernelv
// _ZZ6kernelvE3mem has been demoted
.global .align 8 .u64 __const_$__Z6kernelv_$_ptr[1] = {generic(_ZZ6kernelvE3mem)};

.visible .entry _Z6kernelv()
{
    .local .align 8 .b8     __local_depot0[8];
    .reg .b64   %SP;
    .reg .b64   %SPL;
    .reg .b64   %rd<4>;
    // demoted variable
    .shared .align 4 .u32 _ZZ6kernelvE3mem;
    mov.u64     %SPL, __local_depot0;
    cvta.local.u64  %SP, %SPL;
    mov.u64     %rd1, __const_$__Z6kernelv_$_ptr;
    cvta.global.u64     %rd2, %rd1;
    ld.u64  %rd3, [%rd2];
    st.u64  [%SP+0], %rd3;
    ret;
}

which ptxas rejects:

ptxas /tmp/bug-26fed7.s, line 11; fatal   : Invalid initial value expression
ptxas fatal   : Ptx assembly aborted due to errors

This issue does not appear with -O1/-O2/-O3. In the unoptimized case it can be worked around by storing the pointer in a temporary first or by adding an explicit constructor to the struct.

* Clang versions tested: 11.1.0, 10.0.1
* CUDA versions tested: 11.3, 10.2
* Systems tested: x86_64 host and sm_61 / sm_75 GPUs
f7443009-18af-4d32-9ce7-e23ffe07ba3e commented 3 years ago

NVPTX backend does not support dynamic initialization.

Artem-B commented 3 years ago

.global .align 8 .u64 _const$_Z6kernelv$_ptr[1] = {generic(_ZZ6kernelvE3mem)};

I think PTXAS is unhappy about the generic(_ZZ6kernelvE3mem) here because _ZZ6kernelvE3mem no longer exist after demotion. In other words demotion shoudld not have happened if there are still references to the symbol.

llvmbot commented 3 years ago

assigned to @Artem-B