llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.85k stars 11.91k forks source link

[OpenCL] Clang emits alloca missing `private` address space qualifier #86831

Open dnsampaio opened 7 months ago

dnsampaio commented 7 months ago

In targets that hold a different address space for each of the 7 possible opencl address spaces, when running lifetime.cl test, it breaks by performing a bitcast instead of a addresSpaceCast between two pointers of different address spaces. Although "fixing" it in CGCall.cpp to handle addresSpaceCasts is rather easy, it seems rather fragile as solution. Shouldn't there be a way to tell clang that an address space of opencl subsumes another one (such as opencl_generic subsumes .... all other opencl ?) and such casts wouldn't required, that is, even if the ptr is not at the same address space as the function argument, if the latter includes the first, it shouldn't require any sort of cast.

Generated IR after fixing it to perform address space cast:

define dso_local void @helper_no_markers() local_unnamed_addr #0 {
entry:
  %a = alloca i8, align 1
  call void @llvm.lifetime.start.p0(i64 1, ptr nonnull %a) #4
  %0 = addrspacecast ptr %a to ptr addrspace(4)         <<=== here it creates a bitcast and it breaks
  call void @use(ptr addrspace(4) noundef %0) #5
  call void @llvm.lifetime.end.p0(i64 1, ptr nonnull %a) #4
  ret void
}
llvmbot commented 7 months ago

@llvm/issue-subscribers-opencl

Author: None (dnsampaio)

In targets that hold a different address space for each of the 7 possible opencl address spaces, when running [lifetime.cl](https://github.com/llvm/llvm-project/blob/f92fa7e2cf38341211af262b21c568bef4d76b10/clang/test/CodeGenOpenCL/lifetime.cl#L4) test, it breaks by performing a bitcast instead of a addresSpaceCast between two pointers of different address spaces. Although "fixing" it in [CGCall.cpp](https://github.com/llvm/llvm-project/blob/4d177435bae03551245ffdc4dfcee5345323121d/clang/lib/CodeGen/CGCall.cpp#L5304) to handle addresSpaceCasts is rather easy, it seems rather fragile as solution. Shouldn't there be a way to tell clang that an address space of opencl subsumes another one (such as opencl_generic subsumes .... all other opencl ?) and such casts wouldn't required, that is, even if the ptr is not at the same address space as the function argument, if the latter includes the first, it shouldn't require any sort of cast. Generated IR after fixing it to perform address space cast: ``` define dso_local void @helper_no_markers() local_unnamed_addr #0 { entry: %a = alloca i8, align 1 call void @llvm.lifetime.start.p0(i64 1, ptr nonnull %a) #4 %0 = addrspacecast ptr %a to ptr addrspace(4) <<=== here it creates a bitcast and it breaks call void @use(ptr addrspace(4) noundef %0) #5 call void @llvm.lifetime.end.p0(i64 1, ptr nonnull %a) #4 ret void } ```
efriedma-quic commented 7 months ago

Address spaces are explicitly represented in the AST; if address-space of an argument is different from the address-space of the parameter, there should be an ImplicitCastExpr. If you reach that code in CGCall and the address-spaces don't match, something went wrong earlier.

dnsampaio commented 7 months ago

Indeed it seems that the bug comes from the decisions for explicit address space casts in opencl. It asserts false here for this other input opencl code:

#define PRIV_TYPE char2
#define PRIV_SIZE 128
__kernel void test_fn( __global char2 *src, __global uint *offsets, __global uint *alignmentOffsets, __global char2 *results )
{
    __private PRIV_TYPE sPrivateStorage[ PRIV_SIZE ];
    int tid = get_global_id( 0 );

    for( int i = 0; i < 128; i++ )
      sPrivateStorage[ i ] = src[ i ];

    char2 tmp = vload2( offsets[ tid ], ( (__private char *) sPrivateStorage ) + alignmentOffsets[ tid ] );
   results[ tid ] = tmp;
}

It bitcasts a pointer of address space 0 to one of address space 4.

The target address space map is:

  const LangASMap KVX_AS_Map = {
      (unsigned)LangAS::Default,              // Default
      (unsigned)LangAS::opencl_global,        // opencl_global
      (unsigned)LangAS::opencl_local,         // opencl_local
      (unsigned)LangAS::opencl_constant,      // opencl_constant
      (unsigned)LangAS::opencl_private,       // opencl_private
      (unsigned)LangAS::opencl_generic,       // opencl_generic
      (unsigned)LangAS::opencl_global_device, // opencl_global_device
      (unsigned)LangAS::opencl_global_host,   // opencl_global_host
      (unsigned)LangAS::opencl_global_device, // cuda_device
      (unsigned)LangAS::opencl_constant,      // cuda_constant
      (unsigned)LangAS::opencl_local,         // cuda_shared
      (unsigned)LangAS::opencl_global,        // sycl_global
      (unsigned)LangAS::opencl_global_device, // sycl_global_device
      (unsigned)LangAS::opencl_global_host,   // sycl_global_host
      (unsigned)LangAS::opencl_local,         // sycl_local
      (unsigned)LangAS::opencl_private,       // sycl_private
      (unsigned)LangAS::Default,              // ptr32_sptr
      (unsigned)LangAS::Default,              // ptr32_uptr
      (unsigned)LangAS::Default,              // ptr64
      (unsigned)LangAS::Default,              // hlsl_groupshared
      256                                     // Or target starts from 256
  };

The call trace is:

2.      .cache/pocl/kcache/tempfile_qDWENf.cl:3:15: LLVM IR generation of declaration 'test_fn'
3.      .cache/pocl/kcache/tempfile_qDWENf.cl:3:15: Generating code for declaration 'test_fn'
llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) llvm-project/llvm/lib/Support/Unix/Signals.inc:723:22
PrintStackTraceSignalHandler(void*) llvm-project/llvm/lib/Support/Unix/Signals.inc:798:1
llvm::sys::RunSignalHandlers() llvm-project/llvm/lib/Support/Signals.cpp:105:20
llvm::sys::CleanupOnSignal(unsigned long) llvm-project/llvm/lib/Support/Unix/Signals.inc:367:31
(anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:73:5
CrashRecoverySignalHandler(int) llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:391:1
(/lib/x86_64-linux-gnu/libc.so.6+0x42520)
__pthread_kill_implementation ./nptl/pthread_kill.c:44:76
__pthread_kill_internal ./nptl/pthread_kill.c:78:10
pthread_kill ./nptl/pthread_kill.c:89:10
gsignal ./signal/../sysdeps/posix/raise.c:27:6
abort ./stdlib/abort.c:81:7
_nl_load_domain ./intl/loadmsgcat.c:1177:9
(/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
(anonymous namespace)::ScalarExprEmitter::VisitCastExpr(clang::CastExpr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:2107:9
(anonymous namespace)::ScalarExprEmitter::VisitExplicitCastExpr(clang::ExplicitCastExpr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:578:3
clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::VisitCStyleCastExpr(clang::CStyleCastExpr*) build_llvm_Debug/tools/clang/include/clang/AST/StmtNodes.inc:582:1
clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit(clang::Stmt*) build_llvm_Debug/tools/clang/include/clang/AST/StmtNodes.inc:582:1
(anonymous namespace)::ScalarExprEmitter::Visit(clang::Expr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:412:57
(anonymous namespace)::ScalarExprEmitter::VisitParenExpr(clang::ParenExpr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:438:3
clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit(clang::Stmt*) build_llvm_Debug/tools/clang/include/clang/AST/StmtNodes.inc:152:1
(anonymous namespace)::ScalarExprEmitter::Visit(clang::Expr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:412:57
clang::CodeGen::CodeGenFunction::EmitPromotedScalarExpr(clang::Expr const*, clang::QualType) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:5295:64
(anonymous namespace)::ScalarExprEmitter::EmitBinOps(clang::BinaryOperator const*, clang::QualType) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:3297:14
(anonymous namespace)::ScalarExprEmitter::VisitBinAdd(clang::BinaryOperator const*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:857:3
clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit(clang::Stmt*) llvm-project/clang/include/clang/AST/StmtVisitor.h:55:26
(anonymous namespace)::ScalarExprEmitter::Visit(clang::Expr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:412:57
(anonymous namespace)::ScalarExprEmitter::VisitCastExpr(clang::CastExpr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:2242:54
clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::VisitImplicitCastExpr(clang::ImplicitCastExpr*) build_llvm_Debug/tools/clang/include/clang/AST/StmtNodes.inc:522:1
clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit(clang::Stmt*) build_llvm_Debug/tools/clang/include/clang/AST/StmtNodes.inc:522:1
(anonymous namespace)::ScalarExprEmitter::Visit(clang::Expr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:412:57
clang::CodeGen::CodeGenFunction::EmitScalarExpr(clang::Expr const*, bool) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:5264:1
clang::CodeGen::CodeGenFunction::EmitAnyExpr(clang::Expr const*, clang::CodeGen::AggValueSlot, bool) llvm-project/clang/lib/CodeGen/CGExpr.cpp:216:23
clang::CodeGen::CodeGenFunction::EmitAnyExprToTemp(clang::Expr const*) llvm-project/clang/lib/CodeGen/CGExpr.cpp:235:21
clang::CodeGen::CodeGenFunction::EmitCallArg(clang::CodeGen::CallArgList&, clang::Expr const*, clang::QualType) llvm-project/clang/lib/CodeGen/CGCall.cpp:4712:11
clang::CodeGen::CodeGenFunction::EmitCallArgs(clang::CodeGen::CallArgList&, clang::CodeGen::CodeGenFunction::PrototypeWrapper, llvm::iterator_range<clang::Stmt::CastIterator<clang::Expr, clang::Expr const* const, clang::Stmt const* const>>, clang::CodeGen::CodeGenFunction::AbstractCallee, unsigned int, clang::CodeGen::CodeGenFunction::EvaluationOrder) llvm-project/clang/lib/CodeGen/CGCall.cpp:4562:5
clang::CodeGen::CodeGenFunction::EmitCall(clang::QualType, clang::CodeGen::CGCallee const&, clang::CallExpr const*, clang::CodeGen::ReturnValueSlot, llvm::Value*) llvm-project/clang/lib/CodeGen/CGExpr.cpp:5865:15
clang::CodeGen::CodeGenFunction::EmitCallExpr(clang::CallExpr const*, clang::CodeGen::ReturnValueSlot) llvm-project/clang/lib/CodeGen/CGExpr.cpp:5398:18
(anonymous namespace)::ScalarExprEmitter::VisitCallExpr(clang::CallExpr const*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:585:32
clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit(clang::Stmt*) build_llvm_Debug/tools/clang/include/clang/AST/StmtNodes.inc:602:1
(anonymous namespace)::ScalarExprEmitter::Visit(clang::Expr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:412:57
clang::CodeGen::CodeGenFunction::EmitScalarExpr(clang::Expr const*, bool) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:5264:1
clang::CodeGen::CodeGenFunction::EmitScalarInit(clang::Expr const*, clang::ValueDecl const*, clang::CodeGen::LValue, bool) llvm-project/clang/lib/CodeGen/CGDecl.cpp:796:40
clang::CodeGen::CodeGenFunction::EmitExprAsInit(clang::Expr const*, clang::ValueDecl const*, clang::CodeGen::LValue, bool) llvm-project/clang/lib/CodeGen/CGDecl.cpp:2012:19
clang::CodeGen::CodeGenFunction::EmitAutoVarInit(clang::CodeGen::CodeGenFunction::AutoVarEmission const&) llvm-project/clang/lib/CodeGen/CGDecl.cpp:1974:26
clang::CodeGen::CodeGenFunction::EmitAutoVarDecl(clang::VarDecl const&) llvm-project/clang/lib/CodeGen/CGDecl.cpp:1350:22
clang::CodeGen::CodeGenFunction::EmitVarDecl(clang::VarDecl const&) llvm-project/clang/lib/CodeGen/CGDecl.cpp:217:27
clang::CodeGen::CodeGenFunction::EmitDecl(clang::Decl const&) llvm-project/clang/lib/CodeGen/CGDecl.cpp:163:47
clang::CodeGen::CodeGenFunction::EmitDeclStmt(clang::DeclStmt const&) llvm-project/clang/lib/CodeGen/CGStmt.cpp:1482:3
clang::CodeGen::CodeGenFunction::EmitSimpleStmt(clang::Stmt const*, llvm::ArrayRef<clang::Attr const*>) llvm-project/clang/lib/CodeGen/CGStmt.cpp:460:5
clang::CodeGen::CodeGenFunction::EmitStmt(clang::Stmt const*, llvm::ArrayRef<clang::Attr const*>) llvm-project/clang/lib/CodeGen/CGStmt.cpp:65:3
clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(clang::CompoundStmt const&, bool, clang::CodeGen::AggValueSlot) llvm-project/clang/lib/CodeGen/CGStmt.cpp:514:3
clang::CodeGen::CodeGenFunction::EmitFunctionBody(clang::Stmt const*) llvm-project/clang/lib/CodeGen/CodeGenFunction.cpp:1266:33
clang::CodeGen::CodeGenFunction::GenerateCode(clang::GlobalDecl, llvm::Function*, clang::CodeGen::CGFunctionInfo const&) llvm-project/clang/lib/CodeGen/CodeGenFunction.cpp:1490:21
clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue*) llvm-project/clang/lib/CodeGen/CodeGenModule.cpp:5841:3
clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue*) llvm-project/clang/lib/CodeGen/CodeGenModule.cpp:4056:47
clang::CodeGen::CodeGenModule::EmitGlobal(clang::GlobalDecl) llvm-project/clang/lib/CodeGen/CodeGenModule.cpp:3781:27
clang::CodeGen::CodeGenModule::EmitTopLevelDecl(clang::Decl*) llvm-project/clang/lib/CodeGen/CodeGenModule.cpp:6736:37
(anonymous namespace)::CodeGeneratorImpl::HandleTopLevelDecl(clang::DeclGroupRef) llvm-project/clang/lib/CodeGen/ModuleBuilder.cpp:189:7
clang::BackendConsumer::HandleTopLevelDecl(clang::DeclGroupRef) llvm-project/clang/lib/CodeGen/CodeGenAction.cpp:204:7
clang::ParseAST(clang::Sema&, bool, bool) llvm-project/clang/lib/Parse/ParseAST.cpp:167:20
clang::ASTFrontendAction::ExecuteAction() llvm-project/clang/lib/Frontend/FrontendAction.cpp:1183:11
clang::CodeGenAction::ExecuteAction() llvm-project/clang/lib/CodeGen/CodeGenAction.cpp:1164:5
clang::FrontendAction::Execute() llvm-project/clang/lib/Frontend/FrontendAction.cpp:1073:38
clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) llvm-project/clang/lib/Frontend/CompilerInstance.cpp:1062:42
clang::ExecuteCompilerInvocation(clang::CompilerInstance*) llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:274:38
cc1_main(llvm::ArrayRef<char const*>, char const*, void*) llvm-project/clang/tools/driver/cc1_main.cpp:232:40
ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) llvm-project/clang/tools/driver/driver.cpp:215:20
clang_main(int, char**, llvm::ToolContext const&)::'lambda'(llvm::SmallVectorImpl<char const*>&)::operator()(llvm::SmallVectorImpl<char const*>&) const llvm-project/clang/tools/driver/driver.cpp:356:5
int llvm::function_ref<int (llvm::SmallVectorImpl<char const*>&)>::callback_fn<clang_main(int, char**, llvm::ToolContext const&)::'lambda'(llvm::SmallVectorImpl<char const*>&)>(long, llvm::SmallVectorImpl<char const*>&) llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:47:3
llvm::function_ref<int (llvm::SmallVectorImpl<char const*>&)>::operator()(llvm::SmallVectorImpl<char const*>&) const llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69:3
clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()::operator()() const llvm-project/clang/lib/Driver/Job.cpp:440:32
void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()>(long) llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46:40
llvm::function_ref<void ()>::operator()() const llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:62
llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:427:10
clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const llvm-project/clang/lib/Driver/Job.cpp:440:7
clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const llvm-project/clang/lib/Driver/Compilation.cpp:199:22
clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const llvm-project/clang/lib/Driver/Compilation.cpp:253:62
clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) llvm-project/clang/lib/Driver/Driver.cpp:1951:28
clang_main(int, char**, llvm::ToolContext const&) llvm-project/clang/tools/driver/driver.cpp:391:39
main build_llvm_Debug/tools/clang/tools/driver/clang-driver.cpp:17:20
__libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
call_init ./csu/../csu/libc-start.c:128:20
__libc_start_main ./csu/../csu/libc-start.c:379:5
_start (bin/clang+0xe5c5)
dnsampaio commented 7 months ago

The address space is not preserved when visiting the ArrayToPointerDecay, the source expression dumps as:

ImplicitCastExpr 0x55de70204238 '__private char2 *' <ArrayToPointerDecay> part_of_explicit_cast
`-DeclRefExpr 0x55de702041c8 '__private char2[128]' lvalue Var 0x55de701f9d10 'sPrivateStorage' '__private char2[128]'

where the sorce type dumps as:

ptr

instead of

ptr addrspace(4)
dnsampaio commented 7 months ago

Further investigating, the address space qualifier is not present in the array allocation it self:

ImplicitCastExpr 0x55e796b3c238 '__private char2 *' <ArrayToPointerDecay> part_of_explicit_cast
`-DeclRefExpr 0x55e796b3c1c8 '__private char2[128]' lvalue Var 0x55e796b31d10 'sPrivateStorage' '__private char2[128]'
--->
  %arraydecay = getelementptr inbounds [128 x <2 x i8>], ptr %sPrivateStorage, i64 0, i64 0
--->
ptr
---
  %sPrivateStorage = alloca [128 x <2 x i8>], align 2   <<<======= should have addrspace(4) here
dnsampaio commented 7 months ago

It seems both reproducers stem from an alloca of a private value missing the private addresspace qualifier. In the first reproducer:

define dso_local void @helper_no_markers() local_unnamed_addr #0 {
entry:
  %a = alloca i8, align 1           <<<< missing address space(4).

From the opencl manual:

private - private Address Space Qualifiers
Description

The private address space is a memory segment that can only be accessed by one work item. Variables that are
not shareable among work items are allocated in the private address space, and it is the default address space
for most variables, in particular variables with automatic storage duration.

Which means that in the first reproducer, the alloca of char a should be private, and in address space 4.

dnsampaio commented 7 months ago

Seems related to #42641

efriedma-quic commented 7 months ago

Is your target's datalayout is correct? There's a field specifying the addressspace of allocas. (As an example, on AMDGPU targets, allocas are in addrspace 5.)

dnsampaio commented 7 months ago

Is your target's datalayout is correct? There's a field specifying the addressspace of allocas. (As an example, on AMDGPU targets, allocas are in addrspace 5.)

Ok, I believe the target layout doesn't have it I'll check it.

dnsampaio commented 6 months ago

Hum, indeed I can set alloca AS in the target data layout string, but doing so breaks much more cases than it fixes it. I'll have to play with it further on, I imagine that, just as AMD, I'll need two tables, one for when it is opencl and one that is not.