llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.34k stars 11.7k forks source link

Missing template instantiations from HIP device code #43761

Open JonChesterfield opened 4 years ago

JonChesterfield commented 4 years ago
Bugzilla Link 44416
Version trunk
OS Linux
CC @DougGregor,@zygoloid

Extended Description

A template instantiation in hip code compiled with cuda-device-only is missing from the generated bitcode.

Component is HIP, but there is no HIP on the drop down list so marking as C++.

https://godbolt.org/z/GB9MY9 clang atomic.hip --cuda-device-only -nogpulib --cuda-gpu-arch=gfx906 -emit-llvm

#define DEV __attribute__((device))

// Function emitted as expected
DEV int atomic_inc_int(int *x) {
  return __atomic_fetch_add(x, 1, __ATOMIC_SEQ_CST);
}

// Template definition
template <typename T> DEV T atomic_inc(T *x) {
  return __atomic_fetch_add(x, 1, __ATOMIC_SEQ_CST);
}

// Explict instantiation, not emitted
template DEV int atomic_inc<int>(int *);

// Use of implicit instantiation, not emitted
auto *leak_addr = &atomic_inc<int>;

Results in IR target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-ni:7" target triple = "amdgcn-amd-amdhsa" define hidden i32 @​_Z14atomic_inc_intPi(i32 nocapture %0) local_unnamed_addr #​0 { %2 = atomicrmw add i32 %0, i32 1 seq_cst ret i32 %2 }

attributes #​0 = { ... }

!llvm.module.flags = !{#0, !​1} !llvm.ident = !{#2}

!​0 = !{i32 1, !"wchar_size", i32 4} !​1 = !{i32 7, !"PIC Level", i32 1} !​2 = !{!"clang version 10.0.0 (https://github.com/llvm/llvm-project.git 6185dc0eb3ad35e1f85f2ab1038ca978563099f4)"}

Also raised on a downstream tracker before realising it reproduces on trunk (https://github.com/ROCm-Developer-Tools/aomp/issues/67)

llvmbot commented 3 months ago

@llvm/issue-subscribers-backend-amdgpu

Author: Jon Chesterfield (JonChesterfield)

| | | | --- | --- | | Bugzilla Link | [44416](https://llvm.org/bz44416) | | Version | trunk | | OS | Linux | | CC | @DougGregor,@zygoloid | ## Extended Description A template instantiation in hip code compiled with cuda-device-only is missing from the generated bitcode. Component is HIP, but there is no HIP on the drop down list so marking as C++. https://godbolt.org/z/GB9MY9 clang atomic.hip --cuda-device-only -nogpulib --cuda-gpu-arch=gfx906 -emit-llvm #define DEV __attribute__((device)) // Function emitted as expected DEV int atomic_inc_int(int *x) { return __atomic_fetch_add(x, 1, __ATOMIC_SEQ_CST); } // Template definition template <typename T> DEV T atomic_inc(T *x) { return __atomic_fetch_add(x, 1, __ATOMIC_SEQ_CST); } // Explict instantiation, not emitted template DEV int atomic_inc<int>(int *); // Use of implicit instantiation, not emitted auto *leak_addr = &atomic_inc<int>; Results in IR target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-ni:7" target triple = "amdgcn-amd-amdhsa" define hidden i32 @&#8203;_Z14atomic_inc_intPi(i32* nocapture %0) local_unnamed_addr #&#8203;0 { %2 = atomicrmw add i32* %0, i32 1 seq_cst ret i32 %2 } attributes #&#8203;0 = { ... } !llvm.module.flags = !{#0, !&#8203;1} !llvm.ident = !{#2} !&#8203;0 = !{i32 1, !"wchar_size", i32 4} !&#8203;1 = !{i32 7, !"PIC Level", i32 1} !&#8203;2 = !{!"clang version 10.0.0 (https://github.com/llvm/llvm-project.git 6185dc0eb3ad35e1f85f2ab1038ca978563099f4)"} Also raised on a downstream tracker before realising it reproduces on trunk (https://github.com/ROCm-Developer-Tools/aomp/issues/67)
yxsamliu commented 2 months ago

leak_addr was not emitted because by default it is a host variable. Add __device__ to it will cause it emitted

https://godbolt.org/z/T3nMPKEMG

The explicit instantiation has internal linkage in the default -fno-gpu-rdc mode since there is only one TU. If it is not used in the same TU, internalization will remove it. If you want to keep it so that it may be used by other TU, add -fgpu-rdc, then it has linkonce_odr linkage and will be kept.