Open dnsampaio opened 7 months ago
@llvm/issue-subscribers-opencl
Author: None (dnsampaio)
Address spaces are explicitly represented in the AST; if address-space of an argument is different from the address-space of the parameter, there should be an ImplicitCastExpr. If you reach that code in CGCall and the address-spaces don't match, something went wrong earlier.
Indeed it seems that the bug comes from the decisions for explicit address space casts in opencl. It asserts false here for this other input opencl code:
#define PRIV_TYPE char2
#define PRIV_SIZE 128
__kernel void test_fn( __global char2 *src, __global uint *offsets, __global uint *alignmentOffsets, __global char2 *results )
{
__private PRIV_TYPE sPrivateStorage[ PRIV_SIZE ];
int tid = get_global_id( 0 );
for( int i = 0; i < 128; i++ )
sPrivateStorage[ i ] = src[ i ];
char2 tmp = vload2( offsets[ tid ], ( (__private char *) sPrivateStorage ) + alignmentOffsets[ tid ] );
results[ tid ] = tmp;
}
It bitcasts a pointer of address space 0 to one of address space 4.
The target address space map is:
const LangASMap KVX_AS_Map = {
(unsigned)LangAS::Default, // Default
(unsigned)LangAS::opencl_global, // opencl_global
(unsigned)LangAS::opencl_local, // opencl_local
(unsigned)LangAS::opencl_constant, // opencl_constant
(unsigned)LangAS::opencl_private, // opencl_private
(unsigned)LangAS::opencl_generic, // opencl_generic
(unsigned)LangAS::opencl_global_device, // opencl_global_device
(unsigned)LangAS::opencl_global_host, // opencl_global_host
(unsigned)LangAS::opencl_global_device, // cuda_device
(unsigned)LangAS::opencl_constant, // cuda_constant
(unsigned)LangAS::opencl_local, // cuda_shared
(unsigned)LangAS::opencl_global, // sycl_global
(unsigned)LangAS::opencl_global_device, // sycl_global_device
(unsigned)LangAS::opencl_global_host, // sycl_global_host
(unsigned)LangAS::opencl_local, // sycl_local
(unsigned)LangAS::opencl_private, // sycl_private
(unsigned)LangAS::Default, // ptr32_sptr
(unsigned)LangAS::Default, // ptr32_uptr
(unsigned)LangAS::Default, // ptr64
(unsigned)LangAS::Default, // hlsl_groupshared
256 // Or target starts from 256
};
The call trace is:
2. .cache/pocl/kcache/tempfile_qDWENf.cl:3:15: LLVM IR generation of declaration 'test_fn'
3. .cache/pocl/kcache/tempfile_qDWENf.cl:3:15: Generating code for declaration 'test_fn'
llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) llvm-project/llvm/lib/Support/Unix/Signals.inc:723:22
PrintStackTraceSignalHandler(void*) llvm-project/llvm/lib/Support/Unix/Signals.inc:798:1
llvm::sys::RunSignalHandlers() llvm-project/llvm/lib/Support/Signals.cpp:105:20
llvm::sys::CleanupOnSignal(unsigned long) llvm-project/llvm/lib/Support/Unix/Signals.inc:367:31
(anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:73:5
CrashRecoverySignalHandler(int) llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:391:1
(/lib/x86_64-linux-gnu/libc.so.6+0x42520)
__pthread_kill_implementation ./nptl/pthread_kill.c:44:76
__pthread_kill_internal ./nptl/pthread_kill.c:78:10
pthread_kill ./nptl/pthread_kill.c:89:10
gsignal ./signal/../sysdeps/posix/raise.c:27:6
abort ./stdlib/abort.c:81:7
_nl_load_domain ./intl/loadmsgcat.c:1177:9
(/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
(anonymous namespace)::ScalarExprEmitter::VisitCastExpr(clang::CastExpr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:2107:9
(anonymous namespace)::ScalarExprEmitter::VisitExplicitCastExpr(clang::ExplicitCastExpr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:578:3
clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::VisitCStyleCastExpr(clang::CStyleCastExpr*) build_llvm_Debug/tools/clang/include/clang/AST/StmtNodes.inc:582:1
clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit(clang::Stmt*) build_llvm_Debug/tools/clang/include/clang/AST/StmtNodes.inc:582:1
(anonymous namespace)::ScalarExprEmitter::Visit(clang::Expr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:412:57
(anonymous namespace)::ScalarExprEmitter::VisitParenExpr(clang::ParenExpr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:438:3
clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit(clang::Stmt*) build_llvm_Debug/tools/clang/include/clang/AST/StmtNodes.inc:152:1
(anonymous namespace)::ScalarExprEmitter::Visit(clang::Expr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:412:57
clang::CodeGen::CodeGenFunction::EmitPromotedScalarExpr(clang::Expr const*, clang::QualType) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:5295:64
(anonymous namespace)::ScalarExprEmitter::EmitBinOps(clang::BinaryOperator const*, clang::QualType) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:3297:14
(anonymous namespace)::ScalarExprEmitter::VisitBinAdd(clang::BinaryOperator const*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:857:3
clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit(clang::Stmt*) llvm-project/clang/include/clang/AST/StmtVisitor.h:55:26
(anonymous namespace)::ScalarExprEmitter::Visit(clang::Expr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:412:57
(anonymous namespace)::ScalarExprEmitter::VisitCastExpr(clang::CastExpr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:2242:54
clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::VisitImplicitCastExpr(clang::ImplicitCastExpr*) build_llvm_Debug/tools/clang/include/clang/AST/StmtNodes.inc:522:1
clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit(clang::Stmt*) build_llvm_Debug/tools/clang/include/clang/AST/StmtNodes.inc:522:1
(anonymous namespace)::ScalarExprEmitter::Visit(clang::Expr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:412:57
clang::CodeGen::CodeGenFunction::EmitScalarExpr(clang::Expr const*, bool) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:5264:1
clang::CodeGen::CodeGenFunction::EmitAnyExpr(clang::Expr const*, clang::CodeGen::AggValueSlot, bool) llvm-project/clang/lib/CodeGen/CGExpr.cpp:216:23
clang::CodeGen::CodeGenFunction::EmitAnyExprToTemp(clang::Expr const*) llvm-project/clang/lib/CodeGen/CGExpr.cpp:235:21
clang::CodeGen::CodeGenFunction::EmitCallArg(clang::CodeGen::CallArgList&, clang::Expr const*, clang::QualType) llvm-project/clang/lib/CodeGen/CGCall.cpp:4712:11
clang::CodeGen::CodeGenFunction::EmitCallArgs(clang::CodeGen::CallArgList&, clang::CodeGen::CodeGenFunction::PrototypeWrapper, llvm::iterator_range<clang::Stmt::CastIterator<clang::Expr, clang::Expr const* const, clang::Stmt const* const>>, clang::CodeGen::CodeGenFunction::AbstractCallee, unsigned int, clang::CodeGen::CodeGenFunction::EvaluationOrder) llvm-project/clang/lib/CodeGen/CGCall.cpp:4562:5
clang::CodeGen::CodeGenFunction::EmitCall(clang::QualType, clang::CodeGen::CGCallee const&, clang::CallExpr const*, clang::CodeGen::ReturnValueSlot, llvm::Value*) llvm-project/clang/lib/CodeGen/CGExpr.cpp:5865:15
clang::CodeGen::CodeGenFunction::EmitCallExpr(clang::CallExpr const*, clang::CodeGen::ReturnValueSlot) llvm-project/clang/lib/CodeGen/CGExpr.cpp:5398:18
(anonymous namespace)::ScalarExprEmitter::VisitCallExpr(clang::CallExpr const*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:585:32
clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit(clang::Stmt*) build_llvm_Debug/tools/clang/include/clang/AST/StmtNodes.inc:602:1
(anonymous namespace)::ScalarExprEmitter::Visit(clang::Expr*) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:412:57
clang::CodeGen::CodeGenFunction::EmitScalarExpr(clang::Expr const*, bool) llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:5264:1
clang::CodeGen::CodeGenFunction::EmitScalarInit(clang::Expr const*, clang::ValueDecl const*, clang::CodeGen::LValue, bool) llvm-project/clang/lib/CodeGen/CGDecl.cpp:796:40
clang::CodeGen::CodeGenFunction::EmitExprAsInit(clang::Expr const*, clang::ValueDecl const*, clang::CodeGen::LValue, bool) llvm-project/clang/lib/CodeGen/CGDecl.cpp:2012:19
clang::CodeGen::CodeGenFunction::EmitAutoVarInit(clang::CodeGen::CodeGenFunction::AutoVarEmission const&) llvm-project/clang/lib/CodeGen/CGDecl.cpp:1974:26
clang::CodeGen::CodeGenFunction::EmitAutoVarDecl(clang::VarDecl const&) llvm-project/clang/lib/CodeGen/CGDecl.cpp:1350:22
clang::CodeGen::CodeGenFunction::EmitVarDecl(clang::VarDecl const&) llvm-project/clang/lib/CodeGen/CGDecl.cpp:217:27
clang::CodeGen::CodeGenFunction::EmitDecl(clang::Decl const&) llvm-project/clang/lib/CodeGen/CGDecl.cpp:163:47
clang::CodeGen::CodeGenFunction::EmitDeclStmt(clang::DeclStmt const&) llvm-project/clang/lib/CodeGen/CGStmt.cpp:1482:3
clang::CodeGen::CodeGenFunction::EmitSimpleStmt(clang::Stmt const*, llvm::ArrayRef<clang::Attr const*>) llvm-project/clang/lib/CodeGen/CGStmt.cpp:460:5
clang::CodeGen::CodeGenFunction::EmitStmt(clang::Stmt const*, llvm::ArrayRef<clang::Attr const*>) llvm-project/clang/lib/CodeGen/CGStmt.cpp:65:3
clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(clang::CompoundStmt const&, bool, clang::CodeGen::AggValueSlot) llvm-project/clang/lib/CodeGen/CGStmt.cpp:514:3
clang::CodeGen::CodeGenFunction::EmitFunctionBody(clang::Stmt const*) llvm-project/clang/lib/CodeGen/CodeGenFunction.cpp:1266:33
clang::CodeGen::CodeGenFunction::GenerateCode(clang::GlobalDecl, llvm::Function*, clang::CodeGen::CGFunctionInfo const&) llvm-project/clang/lib/CodeGen/CodeGenFunction.cpp:1490:21
clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue*) llvm-project/clang/lib/CodeGen/CodeGenModule.cpp:5841:3
clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue*) llvm-project/clang/lib/CodeGen/CodeGenModule.cpp:4056:47
clang::CodeGen::CodeGenModule::EmitGlobal(clang::GlobalDecl) llvm-project/clang/lib/CodeGen/CodeGenModule.cpp:3781:27
clang::CodeGen::CodeGenModule::EmitTopLevelDecl(clang::Decl*) llvm-project/clang/lib/CodeGen/CodeGenModule.cpp:6736:37
(anonymous namespace)::CodeGeneratorImpl::HandleTopLevelDecl(clang::DeclGroupRef) llvm-project/clang/lib/CodeGen/ModuleBuilder.cpp:189:7
clang::BackendConsumer::HandleTopLevelDecl(clang::DeclGroupRef) llvm-project/clang/lib/CodeGen/CodeGenAction.cpp:204:7
clang::ParseAST(clang::Sema&, bool, bool) llvm-project/clang/lib/Parse/ParseAST.cpp:167:20
clang::ASTFrontendAction::ExecuteAction() llvm-project/clang/lib/Frontend/FrontendAction.cpp:1183:11
clang::CodeGenAction::ExecuteAction() llvm-project/clang/lib/CodeGen/CodeGenAction.cpp:1164:5
clang::FrontendAction::Execute() llvm-project/clang/lib/Frontend/FrontendAction.cpp:1073:38
clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) llvm-project/clang/lib/Frontend/CompilerInstance.cpp:1062:42
clang::ExecuteCompilerInvocation(clang::CompilerInstance*) llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:274:38
cc1_main(llvm::ArrayRef<char const*>, char const*, void*) llvm-project/clang/tools/driver/cc1_main.cpp:232:40
ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) llvm-project/clang/tools/driver/driver.cpp:215:20
clang_main(int, char**, llvm::ToolContext const&)::'lambda'(llvm::SmallVectorImpl<char const*>&)::operator()(llvm::SmallVectorImpl<char const*>&) const llvm-project/clang/tools/driver/driver.cpp:356:5
int llvm::function_ref<int (llvm::SmallVectorImpl<char const*>&)>::callback_fn<clang_main(int, char**, llvm::ToolContext const&)::'lambda'(llvm::SmallVectorImpl<char const*>&)>(long, llvm::SmallVectorImpl<char const*>&) llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:47:3
llvm::function_ref<int (llvm::SmallVectorImpl<char const*>&)>::operator()(llvm::SmallVectorImpl<char const*>&) const llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69:3
clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()::operator()() const llvm-project/clang/lib/Driver/Job.cpp:440:32
void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()>(long) llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46:40
llvm::function_ref<void ()>::operator()() const llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:62
llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:427:10
clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const llvm-project/clang/lib/Driver/Job.cpp:440:7
clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const llvm-project/clang/lib/Driver/Compilation.cpp:199:22
clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const llvm-project/clang/lib/Driver/Compilation.cpp:253:62
clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) llvm-project/clang/lib/Driver/Driver.cpp:1951:28
clang_main(int, char**, llvm::ToolContext const&) llvm-project/clang/tools/driver/driver.cpp:391:39
main build_llvm_Debug/tools/clang/tools/driver/clang-driver.cpp:17:20
__libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
call_init ./csu/../csu/libc-start.c:128:20
__libc_start_main ./csu/../csu/libc-start.c:379:5
_start (bin/clang+0xe5c5)
The address space is not preserved when visiting the ArrayToPointerDecay, the source expression dumps as:
ImplicitCastExpr 0x55de70204238 '__private char2 *' <ArrayToPointerDecay> part_of_explicit_cast
`-DeclRefExpr 0x55de702041c8 '__private char2[128]' lvalue Var 0x55de701f9d10 'sPrivateStorage' '__private char2[128]'
where the sorce type dumps as:
ptr
instead of
ptr addrspace(4)
Further investigating, the address space qualifier is not present in the array allocation it self:
ImplicitCastExpr 0x55e796b3c238 '__private char2 *' <ArrayToPointerDecay> part_of_explicit_cast
`-DeclRefExpr 0x55e796b3c1c8 '__private char2[128]' lvalue Var 0x55e796b31d10 'sPrivateStorage' '__private char2[128]'
--->
%arraydecay = getelementptr inbounds [128 x <2 x i8>], ptr %sPrivateStorage, i64 0, i64 0
--->
ptr
---
%sPrivateStorage = alloca [128 x <2 x i8>], align 2 <<<======= should have addrspace(4) here
It seems both reproducers stem from an alloca of a private value missing the private addresspace qualifier. In the first reproducer:
define dso_local void @helper_no_markers() local_unnamed_addr #0 {
entry:
%a = alloca i8, align 1 <<<< missing address space(4).
From the opencl manual:
private - private Address Space Qualifiers
Description
The private address space is a memory segment that can only be accessed by one work item. Variables that are
not shareable among work items are allocated in the private address space, and it is the default address space
for most variables, in particular variables with automatic storage duration.
Which means that in the first reproducer, the alloca of char a
should be private, and in address space 4.
Seems related to #42641
Is your target's datalayout is correct? There's a field specifying the addressspace of allocas. (As an example, on AMDGPU targets, allocas are in addrspace 5.)
Is your target's datalayout is correct? There's a field specifying the addressspace of allocas. (As an example, on AMDGPU targets, allocas are in addrspace 5.)
Ok, I believe the target layout doesn't have it I'll check it.
Hum, indeed I can set alloca AS in the target data layout string, but doing so breaks much more cases than it fixes it. I'll have to play with it further on, I imagine that, just as AMD, I'll need two tables, one for when it is opencl and one that is not.
In targets that hold a different address space for each of the 7 possible opencl address spaces, when running lifetime.cl test, it breaks by performing a bitcast instead of a addresSpaceCast between two pointers of different address spaces. Although "fixing" it in CGCall.cpp to handle addresSpaceCasts is rather easy, it seems rather fragile as solution. Shouldn't there be a way to tell clang that an address space of opencl subsumes another one (such as opencl_generic subsumes .... all other opencl ?) and such casts wouldn't required, that is, even if the ptr is not at the same address space as the function argument, if the latter includes the first, it shouldn't require any sort of cast.
Generated IR after fixing it to perform address space cast: