pc2 / HPCC_FPGA

An OpenCL-based FPGA benchmark suite for HPC
MIT License

STREAM benchmark: failed to load xclbin: Invalid argument #13

Closed by ndcontini 1 year ago

ndcontini commented 1 year ago

I am trying to get one of these benchmarks working on the Noctua2 system. When I build the emulated kernel and attempt to run, I get the following output:

-------------------------------------------------------------
General setup:
C++ high resolution clock is used.
The clock precision seems to be 1.00000e+01ns
-------------------------------------------------------------
Selected Platform: Xilinx
Multiple devices have been found. Select the device by typing a number:
0) xilinx_u280_xdma_201920_3
1) xilinx_u280_xdma_201920_3
2) xilinx_u280_xdma_201920_3
Enter device id [0-2]:0
-------------------------------------------------------------
Selection summary:
Platform Name: Xilinx
Device Name:   xilinx_u280_xdma_201920_3
-------------------------------------------------------------
-------------------------------------------------------------
FPGA Setup:./bin/stream_kernels_single_emulate.xclbin
XRT build version: 2.12.429
Build hash: 2180e838abe791cb1e90d9011bbc8b3676774172
Build date: 2022-04-08 11:43:35
Git branch: 2021.2_RHEL8.5
PID: 1159967
UID: 92395
[Mon Dec 19 17:29:04 2022 GMT]
HOST: n2fpga14
EXE: /upb/departments/pc2/users/m/mpifpga2/repos/HPCC_FPGA/STREAM/build/bin/STREAM_FPGA_xilinx
[XRT] ERROR: See dmesg log for details. err=-2
[XRT] ERROR: failed to load xclbin: Invalid argument
ERROR in OpenCL library detected! Aborting.
/upb/departments/pc2/users/m/mpifpga2/repos/HPCC_FPGA/shared/setup/fpga_setup.cpp:168: CL_OUT_OF_HOST_MEMORY
An error occured while setting up the benchmark: 
    An OpenCL error occured: CL_OUT_OF_HOST_MEMORY
Benchmark execution started without successfully running the benchmark setup!

These are the steps I'm taking to build and run the benchmark:

cd STREAM
mkdir build
cd build
cmake .. -DVitis_INCLUDE_DIRS=/opt/software/FPGA/Xilinx/Vitis/2021.2/include -DVitis_FLOATING_POINT_LIBRARY=/opt/software/FPGA/Xilinx/Vitis_HLS/2022.1/lnx64/tools/fpo_v7_0/libIp_floating_point_v7_0_bitacc_cmodel.so  -DHPCC_FPGA_CONFIG=$PWD/../configs/Xilinx_U280_DP.cmake -DMPI_C=$HOME/repos/mvapich2/install/lib/libmpi.so -DMPI_CXX=$HOME/repos/mvapich2/install/lib/libmpi.so
make all
CL_CONTEXT_EMULATOR_DEVICE=1 srun -p fpga -N 1 --constraint=xilinx_u280_xrt2.12 -t 00:30:00 ./bin/STREAM_FPGA_test_xilinx -f ./bin/stream_kernels_single_emulate.xclbin

Software versions:

XRT: v2.12
Vitis: v21.2
Device Platform: u280_xdma_201920_3_3246211
HPCC_FPGA: v0.5.1

It seems as if the xclbin is not being generated correctly. Am I missing a build step or is this a bug in the build?

Mellich commented 1 year ago

It seems you are running the benchmark with an emulation bitstream, but the host code is selecting a hardware device for execution.

To run the benchmark in emulation with the Xilinx tools, replace the CL_CONTEXT_EMULATOR_DEVICE=1 environment variable (which is Intel-only) with XCL_EMULATION_MODE=sw_emu. Note that you may need to manually select the emulation platform and device if the benchmark still tries to use the hardware device.
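
As a rough sketch, the run command from the original report might change as shown here. It assumes the same partition, constraint, binary, and xclbin names as in the report; only the environment variable differs. The check for srun and the fallback message are illustrative additions, not part of the benchmark.

```shell
#!/bin/sh
# Drop the Intel-only emulation variable so it cannot cause confusion.
unset CL_CONTEXT_EMULATOR_DEVICE

# Ask the Xilinx runtime (XRT) for software emulation instead of real hardware.
export XCL_EMULATION_MODE=sw_emu

# Same invocation as in the original report; srun forwards the exported
# environment to the job by default. The guard is only here so the sketch
# does not fail on machines without Slurm (an assumption for illustration).
if command -v srun >/dev/null 2>&1; then
    srun -p fpga -N 1 --constraint=xilinx_u280_xrt2.12 -t 00:30:00 \
        ./bin/STREAM_FPGA_test_xilinx -f ./bin/stream_kernels_single_emulate.xclbin
else
    echo "srun not found; XCL_EMULATION_MODE=$XCL_EMULATION_MODE is set for a direct run"
fi
```

If the host code still lists only the hardware devices, the emulation platform and device may need to be selected manually, as noted above.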

To run the benchmark in hardware instead, you additionally need to synthesize the hardware design: make stream_kernels_single_xilinx. Building the hardware design may take several hours.

ndcontini commented 1 year ago

Thank you for pointing out the XCL_EMULATION_MODE variable. The emulation now runs, although it seems to fail some tests. Either way, we at least got the test running and can mark this as resolved.