intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
Other
1.23k stars 736 forks source link

[SYCL-MLIR][RODINIA]: Rodinia performance benchmarks for SYCL #7641

Open etiotto opened 1 year ago

etiotto commented 1 year ago

The Rodinia benchmarks have been ported to SYCL and are available publicly at: https://github.com/zjin-lcf/Rodinia_SYCL. I have forked the repository and modified the Makefiles to be able to compile the benchmarks with the Intel clang++ SYCL compiler and with the SYCL-MLIR compiler. The fork with the required changes (use branch sycl_mlir) is at: https://github.com/etiotto/Rodinia_SYCL/tree/sycl_mlir.

We can initially focus on 5 bmks: backprop, gaussian, particlefilter, streamcluster and lud. These 5 bmks compile and run cleanly using the clang++ SYCL Intel compiler, and gaussian also compiles and runs when the SYCL-MLIR/cgeist compiler is used. The remaining 4 bmks fail to compile, with the same symptom:

/iusers/etiotto/intel-llvm/build/bin/clang++ -w -fsycl -fsycl-targets=spir64-unknown-unknown-syclmlir  -std=c++17 -Wall -DBLOCK_SIZE=256  -O3 ex_particle_SYCL_single_seq.cpp -c
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /localdisk2/etiotto/intel-llvm/build/bin/cgeist -emit-llvm -w ex_particle_SYCL_single_seq.cpp -o /tmp/ex_particle_SYCL_single_seq-d3a65b.bc --args -cc1 -triple spir64-unknown-unknown-syclmlir -aux-triple x86_64-unknown-linux-gnu -fsycl-is-device -fdeclare-spirv-builtins -mllvm -sycl-opt -fenable-sycl-dae -fsycl-instrument-device-code -Wno-sycl-strict -fsycl-int-header=/tmp/ex_particle_SYCL_single_seq-header-af3a7d.h -fsycl-int-footer=/tmp/ex_particle_SYCL_single_seq-footer-6eb86f.h -sycl-std=2020 -fsycl-unique-prefix=9204d406c7a65583 -Wspir-compat -emit-llvm-bc -emit-llvm-uselists -disable-free -clear-ast-before-backend -main-file-name ex_particle_SYCL_single_seq.cpp -fsycl-use-main-file-name -full-main-file-name ex_particle_SYCL_single_seq.cpp -mrelocation-model static -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -fno-verbose-asm -mconstructor-aliases -aux-target-cpu x86-64 -mllvm -treat-scalable-fixed-error-as-warning -debugger-tuning=gdb -resource-dir /localdisk2/etiotto/intel-llvm/build/lib/clang/16 -internal-isystem /iusers/etiotto/intel-llvm/build/bin/../include/sycl -internal-isystem /iusers/etiotto/intel-llvm/build/bin/../include -D BLOCK_SIZE=256 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /localdisk2/etiotto/intel-llvm/build/lib/clang/16/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /localdisk2/etiotto/intel-llvm/build/lib/clang/16/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -Wall -w -std=c++17 -fdeprecated-macro -fdebug-compilation-dir=/nfs/site/home/etiotto/projects/Rodinia_SYCL/sycl/particlefilter -ferror-limit 19 -fgnuc-version=4.2.1 -no-opaque-pointers -fcxx-exceptions -fexceptions -fcolor-diagnostics -vectorize-loops -vectorize-slp -faddrsig -D __GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/ex_particle_SYCL_single_seq-d3a65b.bc -x c++ ex_particle_SYCL_single_seq.cpp
1.      <eof> parser at end of file
 #0 0x00007fa77cd73f53 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/localdisk2/etiotto/intel-llvm/build/bin/../lib/libLLVMSupport.so.16git+0x1d4f53)
 #1 0x00007fa77cd721c0 llvm::sys::RunSignalHandlers() (/localdisk2/etiotto/intel-llvm/build/bin/../lib/libLLVMSupport.so.16git+0x1d31c0)
 #2 0x00007fa77cd7443f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fa793942b20 __restore_rt sigaction.c:0:0
 #4 0x00007fa785e75233 mlir::MemRefType::getElementType() const (/localdisk2/etiotto/intel-llvm/build/bin/../lib/libMLIRIR.so.16git+0xc9233)
 #5 0x00000000004457d2 isSYCLInheritType(mlir::Type&, mlir::Value&) driver.cc:0:0
 #6 0x00000000004332f1 MLIRScanner::GetAddressOfBaseClass(mlir::Value, clang::CXXRecordDecl const*, llvm::ArrayRef<clang::Type const*>, llvm::ArrayRef<bool>) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x4332f1)
 #7 0x0000000000430465 MLIRScanner::init(mlir::FunctionOpInterface, FunctionToEmit const&) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x430465)
 #8 0x000000000044a1b5 MLIRASTConsumer::run() (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x44a1b5)
 #9 0x00007fa776ae1da0 clang::ParseAST(clang::Sema&, bool, bool) (/localdisk2/etiotto/intel-llvm/build/bin/../lib/../lib/libclangParse.so.16git+0x38da0)
#10 0x00007fa782d53221 clang::FrontendAction::Execute() (/localdisk2/etiotto/intel-llvm/build/bin/../lib/libclangFrontend.so.16git+0x12c221)
#11 0x00000000004549a8 processInputFiles(llvm::cl::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, bool, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>> const&, llvm::cl::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, bool, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>> const&, mlir::MLIRContext&, mlir::OwningOpRef<mlir::ModuleOp>&, llvm::DataLayout&, llvm::Triple&, char const*, bool) driver.cc:0:0
#12 0x000000000044fb2e main (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x44fb2e)
#13 0x00007fa77bece493 __libc_start_main (/lib64/libc.so.6+0x23493)
#14 0x000000000042d32e _start (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x42d32e)
clang-16: error: unable to execute command: Segmentation fault (core dumped)
clang-16: error: cgeist command failed due to signal (use -v to see invocation)
clang version 16.0.0 (https://github.com/etiotto/intel-llvm.git 1bc16875da18bf80d183cb59744e95f9273faa51)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /iusers/etiotto/intel-llvm/build/bin
clang-16: error: unable to execute command: Segmentation fault (core dumped)
clang-16: note: diagnostic msg: Error generating preprocessed source(s).
make: *** [Makefile:58: ex_particle_SYCL_single_seq.o] Error 1
######## Finish particlefilter #########
zjin-lcf commented 1 year ago

Some of these benchmarks (below) may be updated with minor formatting and optimization.

https://github.com/zjin-lcf/HeCBench/tree/master/backprop-sycl https://github.com/zjin-lcf/HeCBench/tree/master/gaussian-sycl https://github.com/zjin-lcf/HeCBench/tree/master/particlefilter-sycl https://github.com/zjin-lcf/HeCBench/tree/master/streamcluster-sycl https://github.com/zjin-lcf/HeCBench/tree/master/lud-sycl

etiotto commented 1 year ago

Reduced test case for the problem illustrated in the description. Indexing array "A" works fine, indexing "shadow" is problematic. The difference is that "A" is an accessor for a global buffer while "shadow" is an accessor to local memory.

#include <iostream>
#include <sycl/sycl.hpp>

using namespace sycl;

#define BLOCK_SIZE 16

int test(std::array<float, BLOCK_SIZE> &A) {
  auto q = queue{};
  auto range = sycl::range<1>{BLOCK_SIZE};
  auto bufA = buffer<float, 1>{A.data(), range};

  q.submit([&](handler &cgh) {
    accessor<float, 1, access::mode::read_write, access::target::local> shadow(
        BLOCK_SIZE, cgh);
    accessor<float, 1, access::mode::read_write, access::target::global_buffer>
        A(bufA, cgh);

    cgh.parallel_for<class diagonal>(range,
                                     [=](id<1> id) { A[0] = shadow[0] = 0; });
  });
}
etiotto commented 1 year ago

PRs https://github.com/intel/llvm/pull/7663 and https://github.com/intel/llvm/pull/7684 resolve the problem illustrated at https://github.com/intel/llvm/issues/7641#issuecomment-1337894883

etiotto commented 1 year ago

Next problem affecting streamcluster:

######## Start streamcluster #########
rm -f streamcluster streamcluster.linkinfo result*
/iusers/etiotto/intel-llvm/build/bin/clang++ -w -fsycl -fsycl-targets=spir64-unknown-unknown-syclmlir  -std=c++17 -Wall   -O3 streamcluster.cpp -o streamcluster -lm 
cgeist: /nfs/site/home/etiotto/projects/intel-llvm/polygeist/tools/cgeist/Lib/ValueCategory.cc:187: void ValueCategory::store(mlir::OpBuilder &, mlir::Value) const: Assertion `toStore.getType() == val.getType().cast<MemRefType>().getElementType() && "expect same type"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /localdisk2/etiotto/intel-llvm/build/bin/cgeist -emit-llvm -w streamcluster.cpp -o /tmp/streamcluster-c49eb6.bc --args -cc1 -triple spir64-unknown-unknown-syclmlir -aux-triple x86_64-unknown-linux-gnu -fsycl-is-device -fdeclare-spirv-builtins -mllvm -sycl-opt -fenable-sycl-dae -fsycl-instrument-device-code -Wno-sycl-strict -fsycl-int-header=/tmp/streamcluster-header-e64163.h -fsycl-int-footer=/tmp/streamcluster-footer-1a9696.h -sycl-std=2020 -fsycl-unique-prefix=27d026f783b3cce6 -Wspir-compat -emit-llvm-bc -emit-llvm-uselists -disable-free -clear-ast-before-backend -main-file-name streamcluster.cpp -fsycl-use-main-file-name -full-main-file-name streamcluster.cpp -mrelocation-model static -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -fno-verbose-asm -mconstructor-aliases -aux-target-cpu x86-64 -mllvm -treat-scalable-fixed-error-as-warning -debugger-tuning=gdb -resource-dir /localdisk2/etiotto/intel-llvm/build/lib/clang/16 -internal-isystem /iusers/etiotto/intel-llvm/build/bin/../include/sycl -internal-isystem /iusers/etiotto/intel-llvm/build/bin/../include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /localdisk2/etiotto/intel-llvm/build/lib/clang/16/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /localdisk2/etiotto/intel-llvm/build/lib/clang/16/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -Wall -w -std=c++17 -fdeprecated-macro -fdebug-compilation-dir=/nfs/site/home/etiotto/projects/Rodinia_SYCL/sycl/streamcluster -ferror-limit 19 -fgnuc-version=4.2.1 -no-opaque-pointers -fcxx-exceptions -fexceptions -fcolor-diagnostics -vectorize-loops -vectorize-slp -faddrsig -D __GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/streamcluster-c49eb6.bc -x c++ streamcluster.cpp
1.      <eof> parser at end of file
 #0 0x00007f91d63fec53 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/localdisk2/etiotto/intel-llvm/build/bin/../lib/libLLVMSupport.so.16git+0x1d4c53)
 #1 0x00007f91d63fcec0 llvm::sys::RunSignalHandlers() (/localdisk2/etiotto/intel-llvm/build/bin/../lib/libLLVMSupport.so.16git+0x1d2ec0)
 #2 0x00007f91d63ff13f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f91ecfe0b20 __restore_rt sigaction.c:0:0
 #4 0x00007f91d556d37f raise (/lib64/libc.so.6+0x3737f)
 #5 0x00007f91d5557db5 abort (/lib64/libc.so.6+0x21db5)
 #6 0x00007f91d5557c89 _nl_load_domain.cold.0 loadmsgcat.c:0:0
 #7 0x00007f91d5565a76 .annobin___GI___assert_fail.end assert.c:0:0
 #8 0x00000000004d4e9c ValueCategory::store(mlir::OpBuilder&, mlir::Value) const (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x4d4e9c)
 #9 0x00000000004a85fd MLIRScanner::VisitBinAssign(clang::BinaryOperator*) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x4a85fd)
#10 0x000000000043524c clang::StmtVisitorBase<std::add_pointer, MLIRScanner, ValueCategory>::Visit(clang::Stmt*) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x43524c)
#11 0x00000000004bd9c9 MLIRScanner::VisitCompoundStmt(clang::CompoundStmt*) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x4bd9c9)
#12 0x0000000000435517 clang::StmtVisitorBase<std::add_pointer, MLIRScanner, ValueCategory>::Visit(clang::Stmt*) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x435517)
#13 0x0000000000431547 MLIRScanner::init(mlir::FunctionOpInterface, FunctionToEmit const&) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x431547)
#14 0x000000000044aa55 MLIRASTConsumer::run() (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x44aa55)
#15 0x00007f91d016adb0 clang::ParseAST(clang::Sema&, bool, bool) (/localdisk2/etiotto/intel-llvm/build/bin/../lib/../lib/libclangParse.so.16git+0x38db0)
#16 0x00007f91dc3d8f61 clang::FrontendAction::Execute() (/localdisk2/etiotto/intel-llvm/build/bin/../lib/libclangFrontend.so.16git+0x12bf61)
#17 0x0000000000455248 processInputFiles(llvm::cl::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, bool, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>> const&, llvm::cl::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, bool, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>> const&, mlir::MLIRContext&, mlir::OwningOpRef<mlir::ModuleOp>&, llvm::DataLayout&, llvm::Triple&, char const*, bool) driver.cc:0:0
#18 0x00000000004503ce main (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x4503ce)
#19 0x00007f91d5559493 __libc_start_main (/lib64/libc.so.6+0x23493)
#20 0x000000000042da8e _start (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x42da8e)
etiotto commented 1 year ago

(Fixed) Next problem affecting backprop:

######## Start backprop #########
rm -f *.o *~ backprop *.linkinfo
/iusers/etiotto/intel-llvm/build/bin/clang++ -w -fsycl -fsycl-targets=spir64-unknown-unknown-syclmlir  -std=c++17 -Wall  -O3 backprop_sycl.cpp -c
cgeist: /nfs/site/home/etiotto/projects/intel-llvm/polygeist/tools/cgeist/Lib/CGCall.cc:89: void castCallerArgs(func::FuncOp, llvm::SmallVectorImpl<Value> &, mlir::OpBuilder &): Assertion `CalleeArgType == Args[I].getType() && "Callsite argument mismatch"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /localdisk2/etiotto/intel-llvm/build/bin/cgeist -emit-llvm -w backprop_sycl.cpp -o /tmp/backprop_sycl-d104c1.bc --args -cc1 -triple spir64-unknown-unknown-syclmlir -aux-triple x86_64-unknown-linux-gnu -fsycl-is-device -fdeclare-spirv-builtins -mllvm -sycl-opt -fenable-sycl-dae -fsycl-instrument-device-code -Wno-sycl-strict -fsycl-int-header=/tmp/backprop_sycl-header-6a398c.h -fsycl-int-footer=/tmp/backprop_sycl-footer-aeaaf3.h -sycl-std=2020 -fsycl-unique-prefix=7b14c76bdff44000 -Wspir-compat -emit-llvm-bc -emit-llvm-uselists -disable-free -clear-ast-before-backend -main-file-name backprop_sycl.cpp -fsycl-use-main-file-name -full-main-file-name backprop_sycl.cpp -mrelocation-model static -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -fno-verbose-asm -mconstructor-aliases -aux-target-cpu x86-64 -mllvm -treat-scalable-fixed-error-as-warning -debugger-tuning=gdb -resource-dir /localdisk2/etiotto/intel-llvm/build/lib/clang/16 -internal-isystem /iusers/etiotto/intel-llvm/build/bin/../include/sycl -internal-isystem /iusers/etiotto/intel-llvm/build/bin/../include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /usr/lib/gcc/x86_64-redhat-
linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /localdisk2/etiotto/intel-llvm/build/lib/clang/16/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /localdisk2/etiotto/intel-llvm/build/lib/clang/16/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -Wall -w -std=c++17 -fdeprecated-macro -fdebug-compilation-dir=/nfs/site/home/etiotto/projects/Rodinia_SYCL/sycl/backprop -ferror-limit 19 -fgnuc-version=4.2.1 -no-opaque-pointers -fcxx-exceptions -fexceptions -fcolor-diagnostics -vectorize-loops -vectorize-slp -faddrsig -D __GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/backprop_sycl-d104c1.bc -x c++ backprop_sycl.cpp
1.      <eof> parser at end of file
 #0 0x00007f4e83bbec53 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/localdisk2/etiotto/intel-llvm/build/bin/../lib/libLLVMSupport.so.16git+0x1d4c53)
 #1 0x00007f4e83bbcec0 llvm::sys::RunSignalHandlers() (/localdisk2/etiotto/intel-llvm/build/bin/../lib/libLLVMSupport.so.16git+0x1d2ec0)
 #2 0x00007f4e83bbf13f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f4e9a7a0b20 __restore_rt sigaction.c:0:0
 #4 0x00007f4e82d2d37f raise (/lib64/libc.so.6+0x3737f)
 #5 0x00007f4e82d17db5 abort (/lib64/libc.so.6+0x21db5)
 #6 0x00007f4e82d17c89 _nl_load_domain.cold.0 loadmsgcat.c:0:0
 #7 0x00007f4e82d25a76 .annobin___GI___assert_fail.end assert.c:0:0
 #8 0x000000000047ce67 MLIRScanner::callHelper(mlir::func::FuncOp, clang::QualType, llvm::ArrayRef<std::pair<ValueCategory, clang::Expr*>>, clang::QualType, bool, clang::Expr*, clang::FunctionDecl const&) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x47ce67)
 #9 0x0000000000483a6a MLIRScanner::VisitCallExpr(clang::CallExpr*) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x483a6a)
#10 0x00000000004352c1 clang::StmtVisitorBase<std::add_pointer, MLIRScanner, ValueCategory>::Visit(clang::Stmt*) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x4352c1)
#11 0x0000000000439507 MLIRScanner::VisitVarDecl(clang::VarDecl*) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x439507)
#12 0x00000000004bd68e MLIRScanner::VisitDeclStmt(clang::DeclStmt*) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x4bd68e)
#13 0x0000000000435531 clang::StmtVisitorBase<std::add_pointer, MLIRScanner, ValueCategory>::Visit(clang::Stmt*) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x435531)
#14 0x00000000004bd9c9 MLIRScanner::VisitCompoundStmt(clang::CompoundStmt*) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x4bd9c9)
#15 0x0000000000435517 clang::StmtVisitorBase<std::add_pointer, MLIRScanner, ValueCategory>::Visit(clang::Stmt*) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x435517)
#16 0x0000000000431547 MLIRScanner::init(mlir::FunctionOpInterface, FunctionToEmit const&) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x431547)
#17 0x000000000044aa55 MLIRASTConsumer::run() (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x44aa55)

Possible duplicate of https://github.com/intel/llvm/issues/7662

etiotto commented 1 year ago

EDIT: fixed by https://github.com/intel/llvm/pull/7752

Next problem affecting particlefilter:

######## Start particlefilter #########
rm -f *.o *~ SYCL_particlefilter_single 
/iusers/etiotto/intel-llvm/build/bin/clang++ -w -fsycl -fsycl-targets=spir64-unknown-unknown-syclmlir  -std=c++17 -Wall -DBLOCK_SIZE=256  -O3 ex_particle_SYCL_single_seq.cpp -c
cgeist: /nfs/site/home/etiotto/projects/intel-llvm/polygeist/tools/cgeist/Lib/ValueCategory.cc:585: ValueCategory FPBinOp(mlir::OpBuilder &, mlir::Location, mlir::Value, mlir::Value) [OpTy = mlir::arith::AddFOp]: Assertion `LHS.getType() == RHS.getType() && "Cannot operate on values of different types"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /localdisk2/etiotto/intel-llvm/build/bin/cgeist -emit-llvm -w ex_particle_SYCL_single_seq.cpp -o /tmp/ex_particle_SYCL_single_seq-e9151e.bc --args -cc1 -triple spir64-unknown-unknown-syclmlir -aux-triple x86_64-unknown-linux-gnu -fsycl-is-device -fdeclare-spirv-builtins -mllvm -sycl-opt -fenable-sycl-dae -fsycl-instrument-device-code -Wno-sycl-strict -fsycl-int-header=/tmp/ex_particle_SYCL_single_seq-header-4a99c8.h -fsycl-int-footer=/tmp/ex_particle_SYCL_single_seq-footer-7c9355.h -sycl-std=2020 -fsycl-unique-prefix=78d00ad7068752e7 -Wspir-compat -emit-llvm-bc -emit-llvm-uselists -disable-free -clear-ast-before-backend -main-file-name ex_particle_SYCL_single_seq.cpp -fsycl-use-main-file-name -full-main-file-name ex_particle_SYCL_single_seq.cpp -mrelocation-model static -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -fno-verbose-asm -mconstructor-aliases -aux-target-cpu x86-64 -mllvm -treat-scalable-fixed-error-as-warning -debugger-tuning=gdb -resource-dir /localdisk2/etiotto/intel-llvm/build/lib/clang/16 -internal-isystem /iusers/etiotto/intel-llvm/build/bin/../include/sycl -internal-isystem /iusers/etiotto/intel-llvm/build/bin/../include -D BLOCK_SIZE=256 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /localdisk2/etiotto/intel-llvm/build/lib/clang/16/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /localdisk2/etiotto/intel-llvm/build/lib/clang/16/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -Wall -w -std=c++17 -fdeprecated-macro -fdebug-compilation-dir=/nfs/site/home/etiotto/projects/Rodinia_SYCL/sycl/particlefilter -ferror-limit 19 -fgnuc-version=4.2.1 -no-opaque-pointers -fcxx-exceptions -fexceptions -fcolor-diagnostics -vectorize-loops -vectorize-slp -faddrsig -D __GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/ex_particle_SYCL_single_seq-e9151e.bc -x c++ ex_particle_SYCL_single_seq.cpp
1.      <eof> parser at end of file
 #0 0x00007fac3bec1c53 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/localdisk2/etiotto/intel-llvm/build/bin/../lib/libLLVMSupport.so.16git+0x1d4c53)
 #1 0x00007fac3bebfec0 llvm::sys::RunSignalHandlers() (/localdisk2/etiotto/intel-llvm/build/bin/../lib/libLLVMSupport.so.16git+0x1d2ec0)
 #2 0x00007fac3bec213f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fac52aa3b20 __restore_rt sigaction.c:0:0
 #4 0x00007fac3b03037f raise (/lib64/libc.so.6+0x3737f)
 #5 0x00007fac3b01adb5 abort (/lib64/libc.so.6+0x21db5)
 #6 0x00007fac3b01ac89 _nl_load_domain.cold.0 loadmsgcat.c:0:0
 #7 0x00007fac3b028a76 .annobin___GI___assert_fail.end assert.c:0:0
 #8 0x00000000004dbc03 ValueCategory::FAdd(mlir::OpBuilder&, mlir::Location, mlir::Value) const (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x4dbc03)
 #9 0x00000000004a8bad MLIRScanner::EmitBinAdd(BinOpInfo const&) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x4a8bad)
#10 0x00000000004abcab MLIRScanner::EmitCompoundAssignLValue(clang::CompoundAssignOperator*, ValueCategory (MLIRScanner::*)(BinOpInfo const&)) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x4abcab)
#11 0x00000000004acfd4 MLIRScanner::VisitBinAddAssign(clang::BinaryOperator*) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x4acfd4)
#12 0x00000000004352a7 clang::StmtVisitorBase<std::add_pointer, MLIRScanner, ValueCategory>::Visit(clang::Stmt*) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x4352a7)
#13 0x00000000004bd9c9 MLIRScanner::VisitCompoundStmt(clang::CompoundStmt*) (/localdisk2/etiotto/intel-llvm/build/bin/cgeist+0x4bd9c9)
etiotto commented 1 year ago

In order to reproduce clone https://github.com/etiotto/Rodinia_SYCL/tree/sycl_mlir ( branch sycl_mlir). Edit the script sycl/run_all.sh to replace the clang compiler you want to use, then use it to run the bmks.

etiotto commented 1 year ago

Using the latest SYCL-MLIR compiler backprop now compiles and run:

 ./run_all.sh                                                                               ✔  15s  etiotto@hds-clx-7 
######## Start backprop #########
rm -f *.o *~ backprop *.linkinfo
/iusers/etiotto/intel-llvm/build/bin/clang++ -w -fsycl -fsycl-targets=spir64-unknown-unknown-syclmlir  -std=c++17 -Wall  -O3 backprop_sycl.cpp -c
/iusers/etiotto/intel-llvm/build/bin/clang++ -w -fsycl -fsycl-targets=spir64-unknown-unknown-syclmlir  -std=c++17 -Wall  -O3 backprop.c -c
/iusers/etiotto/intel-llvm/build/bin/clang++ -w -fsycl -fsycl-targets=spir64-unknown-unknown-syclmlir  -std=c++17 -Wall  -O3 imagenet.c -c
/iusers/etiotto/intel-llvm/build/bin/clang++ -w -fsycl -fsycl-targets=spir64-unknown-unknown-syclmlir  -std=c++17 -Wall  -O3 facetrain.c -c
/iusers/etiotto/intel-llvm/build/bin/clang++ -w -fsycl -fsycl-targets=spir64-unknown-unknown-syclmlir  -std=c++17 -Wall  -O3 backprop_sycl.o backprop.o imagenet.o facetrain.o -o backprop -lm 
./backprop 65536
Random number generator seed: 7
Input layer size : 65536
Starting training kernel
Performing GPU computation
Device offloading time = 0.638426(s)
...
whitneywhtsang commented 1 year ago

I will start to look at the streamcluster issue reported here: https://github.com/intel/llvm/issues/7641#issuecomment-1341172207.

etiotto commented 1 year ago

I'm looking at the other bmk that fails with an assert (particlefilter). Reduced test case:

#include <CL/sycl.hpp>

using namespace cl::sycl;
constexpr access::mode sycl_read_write = access::mode::read_write;

void likelyhood(int Nparticles) {
  cpu_selector dev_sel;
  queue q(dev_sel);
  const property_list props = property::buffer::use_host_ptr();
  float * arrayX = (float *) calloc(Nparticles, sizeof (float));
  buffer<float, 1>arrayX_GPU(arrayX, Nparticles, props);

  /****************** L I K E L I H O O D ************************************/
  q.submit([&](handler& cgh) {
    auto arrayX_acc = arrayX_GPU.get_access<sycl_read_write>(cgh);

    cgh.parallel_for<class likelihood>(
//      nd_range<1>(range<1>(global_work_size), range<1>(local_work_size)), [=] (nd_item<1> item) {
        nd_range<1>(range<1>(10), range<1>(20)), [=] (nd_item<1> item) {
        int i = item.get_global_linear_id();
        #ifdef BAD
                arrayX_acc[i] += 1.0; 
        #else
          arrayX_acc[i] = arrayX_acc[i] + 1.0; 
        #endif
      });
  });
}

Noting that if arrayX_acc[i] += 1.0; is changed to arrayX_acc[i] = arrayX_acc[i] + 1.0; the assertion disappears.

victor-eds commented 1 year ago

I will take on lud

victor-eds commented 1 year ago

I'm looking at the other bmk that fails with an assert (particlefilter). Reduced test case:

#include <CL/sycl.hpp>

using namespace cl::sycl;
constexpr access::mode sycl_read_write = access::mode::read_write;

void likelyhood(int Nparticles) {
  cpu_selector dev_sel;
  queue q(dev_sel);
  const property_list props = property::buffer::use_host_ptr();
  float * arrayX = (float *) calloc(Nparticles, sizeof (float));
  buffer<float, 1>arrayX_GPU(arrayX, Nparticles, props);

  /****************** L I K E L I H O O D ************************************/
  q.submit([&](handler& cgh) {
    auto arrayX_acc = arrayX_GPU.get_access<sycl_read_write>(cgh);

    cgh.parallel_for<class likelihood>(
//      nd_range<1>(range<1>(global_work_size), range<1>(local_work_size)), [=] (nd_item<1> item) {
        nd_range<1>(range<1>(10), range<1>(20)), [=] (nd_item<1> item) {
        int i = item.get_global_linear_id();
        #ifdef BAD
                arrayX_acc[i] += 1.0; 
        #else
          arrayX_acc[i] = arrayX_acc[i] + 1.0; 
        #endif
      });
  });
}

Noting that if arrayX_acc[i] += 1.0; is changed to arrayX_acc[i] = arrayX_acc[i] + 1.0; the assertion disappears.

https://github.com/intel/llvm/pull/7760 fixes this

victor-eds commented 1 year ago

lud fixed by #7760 also -> compiles, but fails.

etiotto commented 1 year ago

lud fixed by #7760 also -> compiles, but fails.

AFAIK lud was compiling without #7760 and was failing to verify at runtime.

etiotto commented 1 year ago

Status:

etiotto commented 1 year ago

New reduced test case for particlefilter:

#include <sycl/sycl.hpp>

using namespace sycl;
constexpr access::mode sycl_write = access::mode::write;
constexpr access::mode sycl_read = access::mode::read;
#ifndef TY
#define TY unsigned char
#endif

void likelyhood(int Nparticles, int IszX, int IszY, int Nfr, int countOnes) {
  queue q;
  buffer<float, 1> A(Nparticles + 1);
  buffer<TY, 1> B(Nparticles + 1);

  q.submit([&](handler &cgh) {
    auto A_acc = A.get_access<sycl_write>(cgh);
    auto B_acc = B.get_access<sycl_read>(cgh);

    cgh.parallel_for<class likelihood>(range<1>(10), [=](item<1> Item) {
      id<1> Id = Item.get_id();
      A_acc[Id] = B_acc[Id];
    });
  });
}

Fails with the following error (noting also that when TY is not char or unsigned char but, for example, is a short the error disappears):

clang++  -fsycl -fsycl-targets=spir64-unknown-unknown-syclmlir  -std=c++17 -Wall -DBLOCK_SIZE=256  -O3 ~/projects/tmp/ex_particle_reduced.cpp -w -c 2>&1 | grep error
error: 'sycl.accessor.subscript' op Expecting memref return type. Got '<<NULL TYPE>>'
victor-eds commented 1 year ago

Fails with the following error (noting also that when TY is not char or unsigned char but, for example, is a short the error disappears):

Apparently, compiling pointers to i8 types yields an llvm pointer instead of a memref. We might be having issues with that. IMO, we should generate memref<i8> instead. Related issue: https://github.com/intel/llvm/pull/7767

etiotto commented 1 year ago

With fix in PR https://github.com/intel/llvm/pull/7784 particlefilter compiles to the end, and the bmk runs.

etiotto commented 1 year ago

Using draft PR https://github.com/intel/llvm/pull/7783 streamcluster no longer asserts in the store operation, the assertion moved to SYCLAccessorSubscriptOp::verify() (because that operation receives a !llvm.ptr<?xstruct....> and it expects a memref). Given that cgeist always represents pointers to struct by using !llvm.ptr, we might want to relax that verification code. @whitneywhtsang will work on that.

I have quickly verified that this is the last compile time issue affecting streamcluster. Commenting out the verification code for SYCLAccessorSubscriptOp yields successful compilation and the bmk runs clean.

whitneywhtsang commented 1 year ago

Created https://github.com/intel/llvm/pull/7802, for the verification issue mentioned in https://github.com/intel/llvm/issues/7641#issuecomment-1352344685.

whitneywhtsang commented 1 year ago

Status:

whitneywhtsang commented 1 year ago

Performance measurements on Intel(R) Iris(R) Xe Graphics: t t

etiotto commented 1 year ago

Great to see that at -O3 we are mostly ahead of the default clang (without MLIR).