Open ndcontini opened 1 year ago
The ports do not seem to be created correctly in the compilation stage because the compile settings file (specified with XILINX_COMPILE_SETTINGS_FILE in the config file) is ignored. This is a bug in the benchmark CMake scripts.
As a workaround, adding the following line to your config file should fix this:
set(XILINX_ADDITIONAL_COMPILE_FLAGS "--hls.max_memory_ports=all" CACHE STRING "Additional compile flags for v++" FORCE)
Thank you for the response. That workaround got me past the initial link tasks. FYI, this is also an issue when building PTRANS. I assume the same workaround will work.
After building with the workaround, I ran into a different error:
[19:29:13] Starting logic placement..
[19:31:26] Phase 1 Placer Initialization
[19:31:26] Phase 1.1 Placer Initialization Netlist Sorting
[19:35:17] Phase 1.2 IO Placement/ Clock Placement/ Build Placer Device
[19:37:30] Phase 1.3 Build Placer Netlist Model
[19:43:36] Phase 1.4 Constrain Clocks/Macros
[19:44:08] Phase 2 Global Placement
[19:44:08] Phase 2.1 Floorplanning
[19:45:49] Phase 2.1.1 Partition Driven Placement
[19:45:49] Phase 2.1.1.1 PBP: Partition Driven Placement
[19:53:01] Phase 2.1.1.2 PBP: Clock Region Placement
[19:54:42] Phase 2.1.1.3 PBP: Discrete Incremental
[19:55:15] Phase 2.1.1.4 PBP: Compute Congestion
[19:55:15] Phase 2.1.1.5 PBP: Macro Placement
[19:56:22] Phase 2.1.1.6 PBP: UpdateTiming
[19:57:29] Phase 2.2 Update Timing before SLR Path Opt
[19:57:29] Phase 2.3 Global Placement Core
[20:17:01] Phase 2.3.1 Physical Synthesis In Placer
[20:39:19] Phase 3 Detail Placement
[20:39:19] Phase 3.1 Commit Multi Column Macros
[20:39:52] Phase 3.2 Commit Most Macros & LUTRAMs
[20:42:40] Phase 3.3 Small Shape DP
[20:42:40] Phase 3.3.1 Small Shape Clustering
[20:44:20] Phase 3.3.2 Flow Legalize Slice Clusters
[20:44:54] Phase 3.3.3 Slice Area Swap
[20:47:41] Phase 3.4 Place Remaining
[20:48:15] Phase 3.5 Re-assign LUT pins
[20:48:47] Phase 3.6 Pipeline Register Optimization
[20:49:22] Phase 3.7 Fast Optimization
[20:52:10] Phase 4 Post Placement Optimization and Clean-Up
[20:52:10] Phase 4.1 Post Commit Optimization
[20:58:19] Phase 4.1.1 Post Placement Optimization
[20:58:19] Phase 4.1.1.1 BUFG Insertion
[20:58:19] Phase 1 Physical Synthesis Initialization
[21:01:40] Phase 4.1.1.2 BUFG Replication
[21:05:00] Phase 4.1.1.3 Replication
[21:08:54] Phase 4.2 Post Placement Cleanup
[21:09:28] Phase 4.3 Placer Reporting
[21:09:28] Phase 4.3.1 Print Estimated Congestion
[21:10:01] Phase 4.4 Final Placement Cleanup
[21:24:07] Run vpl: Step impl: Failed
[21:24:11] Run vpl: FINISHED. Run Status: impl ERROR
===>The following messages were generated while processing /upb/departments/pc2/groups/hpc-prf-mpifpga/LINPACK/src/device/_x/link/vivado/vpl/prj/prj.runs/impl_1 :
ERROR: [VPL 17-69] Command failed: Failed to create design checkpoint
ERROR: [VPL 60-773] In '/upb/departments/pc2/groups/hpc-prf-mpifpga/LINPACK/src/device/_x/link/vivado/vpl/runme.log', caught Tcl error: problem implementing dynamic region, impl_1: place_design ERROR, please look at the run log file '/upb/departments/pc2/groups/hpc-prf-mpifpga/LINPACK/src/device/_x/link/vivado/vpl/prj/prj.runs/impl_1/runme.log' for more information
WARNING: [VPL 60-732] Link warning: No monitor points found for BD automation.
ERROR: [VPL 60-704] Integration error, problem implementing dynamic region, impl_1: place_design ERROR, please look at the run log file '/upb/departments/pc2/groups/hpc-prf-mpifpga/LINPACK/src/device/_x/link/vivado/vpl/prj/prj.runs/impl_1/runme.log' for more information
ERROR: [VPL 60-1328] Vpl run 'vpl' failed
ERROR: [VPL 60-806] Failed to finish platform linker
INFO: [v++ 60-1442] [21:24:19] Run run_link: Step vpl: Failed
Time (s): cpu = 00:06:41 ; elapsed = 04:17:11 . Memory (MB): peak = 1756.656 ; gain = 0.000 ; free physical = 156008 ; free virtual = 461854
ERROR: [v++ 60-661] v++ link run 'run_link' failed
ERROR: [v++ 60-626] Kernel link failed to complete
ERROR: [v++ 60-703] Failed to finish linking
INFO: [v++ 60-1653] Closing dispatch client.
make[3]: *** [src/device/CMakeFiles/hpl_torus_PCIE_xilinx.dir/build.make:75: bin/hpl_torus_PCIE.xclbin] Error 1
make[2]: *** [CMakeFiles/Makefile2:501: src/device/CMakeFiles/hpl_torus_PCIE_xilinx.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:508: src/device/CMakeFiles/hpl_torus_PCIE_xilinx.dir/rule] Error 2
make: *** [Makefile:283: hpl_torus_PCIE_xilinx] Error 2
I did some basic web searching to find out why this error arises, but I can't tell whether it is a local issue or an issue with the build process.
From the current output, I can only see that the synthesis failed during the placement phase. Maybe you can find out more by looking into the log files mentioned in the error message. Does the design overutilize the resources on one SLR? The matrix multiplication kernels are quite large in this configuration and nearly fill a whole SLR. Together with the DDR memory interconnect, it may get dense on SLR1. But it should not overutilize the available resources on the U280.
If it is just about getting some working bitstream, we could switch to HBM instead and/or reduce the size of the design. But maybe it's better to track down this issue first.
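To make the "look into the log files" step a bit more concrete, here is a minimal sketch that pulls the ERROR and CRITICAL WARNING messages out of a log such as runme.log, in order of appearance. It assumes the messages follow the `SEVERITY: [TOOL ID] text` shape seen in the console output above; the function names `extract_messages` and `first_error` are made up for this example and are not part of the benchmark or of Vitis.

```python
import re

# Assumed message shape, taken from the console output in this thread:
#   ERROR: [VPL 17-69] Command failed: Failed to create design checkpoint
MESSAGE_RE = re.compile(
    r"^(ERROR|CRITICAL WARNING|WARNING): \[(\S+) (\d+-\d+)\] (.*)$"
)


def extract_messages(log_text, severities=("ERROR", "CRITICAL WARNING")):
    """Return (severity, tool, msg_id, text) tuples in order of appearance."""
    found = []
    for line in log_text.splitlines():
        match = MESSAGE_RE.match(line.strip())
        if match and match.group(1) in severities:
            found.append(match.groups())
    return found


def first_error(log_text):
    """Return the first ERROR message tuple, or None if there is none."""
    for message in extract_messages(log_text, severities=("ERROR",)):
        return message
    return None
```

For a real run, the text could be read with `pathlib.Path(path_to_runme_log).read_text(errors="replace")` and passed to `extract_messages`. The first error in the impl_1 runme.log is usually more informative than the chain of follow-up errors printed by v++.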
I don't think this is an issue with overutilization. It honestly seems like it could be a bug inside Vitis itself. This is the first error in the excerpt above, and also in the runme.log that the console output refers to:
ERROR: [VPL 17-69] Command failed: Failed to create design checkpoint
I retried building the kernels on a newer version of Vitis and did not get the above errors, so I'm going to assume this is an issue with Vitis 20.2. However, both with HBM enabled and with HBM disabled, I get timing errors:
[07:55:26] Starting bitstream generation..
Starting optional post-route physical design optimization.
[08:36:26] Phase 1 Physical Synthesis Initialization
[08:43:23] Phase 2 Critical Path Optimization
Finished optional post-route physical design optimization.
[12:31:46] Run vpl: Step impl: Failed
[12:31:53] Run vpl: FINISHED. Run Status: impl ERROR
===>The following messages were generated while Compiling (bitstream) accelerator binary: hpl_torus_PCIE Log file: /upb/departments/pc2/groups/hpc-prf-mpifpga/LINPACK/src/device/_x/link/vivado/vpl/prj/prj.runs/impl_1/runme.log :
ERROR: [VPL 101-2] design did not meet timing - Design did not meet timing. One or more unscalable system clocks did not meet their required target frequency. For all system clocks, this design is using 0 nanoseconds as the threshold worst negative slack (WNS) value. List of system clocks with timing failure:
system clock: pll_clk[0]_DIV; slack: -1.009 ns
system clock: mmcm_clkout0; slack: -0.751 ns
system clock: mmcm_clkout0_1; slack: -0.166 ns
system clock: pll_clk[1]_DIV; slack: -0.134 ns
ERROR: [VPL 101-3] sourcing script /upb/departments/pc2/groups/hpc-prf-mpifpga/LINPACK/src/device/_x/link/vivado/vpl/scripts/impl_1/_full_write_bitstream_pre.tcl failed
ERROR: [VPL 60-773] In '/upb/departments/pc2/groups/hpc-prf-mpifpga/LINPACK/src/device/_x/link/vivado/vpl/runme.log', caught Tcl error: problem implementing dynamic region, impl_1: write_bitstream ERROR, please look at the run log file '/upb/departments/pc2/groups/hpc-prf-mpifpga/LINPACK/src/device/_x/link/vivado/vpl/prj/prj.runs/impl_1/runme.log' for more information
WARNING: [VPL 60-732] Link warning: No monitor points found for BD automation.
ERROR: [VPL 60-704] Integration error, problem implementing dynamic region, impl_1: write_bitstream ERROR, please look at the run log file '/upb/departments/pc2/groups/hpc-prf-mpifpga/LINPACK/src/device/_x/link/vivado/vpl/prj/prj.runs/impl_1/runme.log' for more information
ERROR: [VPL 60-1328] Vpl run 'vpl' failed
ERROR: [VPL 60-806] Failed to finish platform linker
INFO: [v++ 60-1442] [12:32:06] Run run_link: Step vpl: Failed
Time (s): cpu = 00:58:31 ; elapsed = 14:52:56 . Memory (MB): peak = 2265.207 ; gain = 0.000 ; free physical = 139218 ; free virtual = 467439
ERROR: [v++ 60-661] v++ link run 'run_link' failed
ERROR: [v++ 60-626] Kernel link failed to complete
ERROR: [v++ 60-703] Failed to finish linking
INFO: [v++ 60-1653] Closing dispatch client.
make[3]: *** [src/device/CMakeFiles/hpl_torus_PCIE_xilinx.dir/build.make:75: bin/hpl_torus_PCIE.xclbin] Error 1
make[2]: *** [CMakeFiles/Makefile2:501: src/device/CMakeFiles/hpl_torus_PCIE_xilinx.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:508: src/device/CMakeFiles/hpl_torus_PCIE_xilinx.dir/rule] Error 2
Please let me know what extra information I can give to help diagnose the issue.
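For comparing builds (for example HBM enabled vs. disabled), it can help to extract the failing clocks and their slack from the `[VPL 101-2]` message. The following is a small sketch that assumes the `system clock: NAME; slack: VALUE ns` line format shown in the log above; `failing_clocks` is a hypothetical helper name, not something provided by Vitis or the benchmark.

```python
import re

# Assumed line shape, copied from the timing report in the log above:
#   system clock: pll_clk[0]_DIV; slack: -1.009 ns
SLACK_RE = re.compile(
    r"system clock:\s*([^;\s]+);\s*slack:\s*(-?\d+(?:\.\d+)?)\s*ns"
)


def failing_clocks(log_text):
    """Return [(clock_name, slack_ns)] for negative slacks, worst first."""
    clocks = [(name, float(slack)) for name, slack in SLACK_RE.findall(log_text)]
    failing = [clock for clock in clocks if clock[1] < 0.0]
    return sorted(failing, key=lambda clock: clock[1])
```

The first entry of the returned list is the clock with the worst negative slack, which is the value to watch when checking whether a configuration change moves the design closer to timing closure.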
Interesting. Which Vitis and XRT versions are you using now?
Could you please provide the v++ logs for the compilation (bin/xilinx_reports/logs/v++_hpl_torus_PCIE.log) and linking (bin/xilinx_reports/logs/link/v++.log)?
[mpifpga2@n2login2 LINPACK]$ vitis --version
****** Xilinx Vitis Development Environment
****** Vitis v2021.2 (64-bit)
**** SW Build 3363750 on 2021-10-16-13:10:08
** Copyright 1986-2021 Xilinx, Inc. All Rights Reserved.
[mpifpga2@n2login2 LINPACK]$ xbutil --version
Version : 2.12.429
Branch : 2021.2_RHEL8.5
Hash : 2180e838abe791cb1e90d9011bbc8b3676774172
Hash Date : 2022-04-08 11:43:35
XOCL : unknown, unknown
XCLMGMT : unknown, unknown
It looks like the two logs are the same. Could you please re-upload the compilation logs? It should be this path: bin/xilinx_reports/logs/v++_hpl_torus_PCIE.log
I think the compilation log is actually bin/xilinx_reports/logs/hpl_torus_PCIE/v++.log.
I am attempting to build the torus kernel for the LINPACK benchmark, but the build errors out in the link stage due to an invalid port mapping. I'm not sure I understand why this issue is occurring, but my guess would be that the m_axi_gmemX ports are expected to be specified within the kernel. This error seems to imply that the final kernel code is not being generated correctly. Is there a setting missing from my build? I expected the config file to take care of most of the gotchas, since U280s seem to be supported by the benchmark.