Closed: gabrielrodcanal closed this issue 11 months ago.
Hi @gabrielrodcanal,
Which shell version are you using: xilinx_vck5000_gen3x16_xdma_1_202120_1 or xilinx_vck5000_gen4x8_xdma_2_202210_1? For the former, you'll need to check out branch 2021.2 of this repo (and it only works with Vivado 2021.2). Unfortunately, I do not have support for the latter.
It's also worth mentioning that there's the mlir-air project, which can support running mlir-aie on VCK5000 as well. But as far as I know, they build their own custom shell.
Hi @nqdtan,
It's actually xilinx_vck5000_gen4x8_xdma_2_202210. Any chance this is supported?
Would you recommend going for mlir-air with their custom shell or using yours instead? Correct me if I'm wrong, but my understanding is that mlir-air will only allow programming the AIEs with the AIR dialect, whereas your shell will enable the AIE dialect, which seems to have more support.
xilinx_vck5000_gen4x8_xdma_2_202210 is doable. You will need to run a sample Vitis flow for that shell and then extract the relevant TCL files for the Vivado hardware design. I can provide more detail/help if you want to follow this path. The goal is to create the block design ulp_bd_with_aie_16, which interfaces with the given base platform. Now that I think about it, it might be possible to source this script as a TCL hook in Vitis so that you don't need to create a Vivado project.
I think it depends on whichever option is easier to adopt in your situation. If the end goal is running mlir-aie / the AIE dialect on VCK5000, both should work. If you look around the test files under mlir-air/test, you will see that there are some that use the AIE dialect as well. This project doesn't have more support for the AIE dialect than mlir-air.
One advantage of using the official shell from the AMD Xilinx member site is that the fan noise has been moderated quite a bit in the recent base platform release (qdma 2022.2) -- which matters to me, as the machine the card is installed in sits on a shared office floor.
I think your approach will work best for me, as I cannot really change the platform: this VCK5000 is shared with other users who use it just for kernel acceleration, and they've already been working on the xdma shell. I am assuming that your approach doesn't alter the blp at all and that all the necessary logic for interfacing with the AIEs through PCIe is included in the ulp, so that any bitstream previously generated for xilinx_vck5000_gen4x8_xdma_2_202210 could still be loaded on the VCK5000, since we haven't changed the shell. Am I right?
I guess by running the Vitis flow you're referring to generating a hardware platform out of the xdma shell. If you could give me some basic pointers on how to create that block design, that'd be very helpful. I only got as far as generating a platform project with the xdma shell .xsa. The platform.tcl file generated is almost empty, though.
Yes, loading a bitstream generated by this project won't alter anything in the shell (the base platform -- blp), so you could still load any previous bitstream files.
By Vitis flow, I was referring to generating a bitstream that contains the block design of this repo (i.e., ulp_bd_with_aie.tcl) in Vitis. It is not straightforward to do so, since it is unclear to me how to specify the NoC connectivity in Vitis, especially the connections that concern the AI Engine IP block. That's why I created this project in the first place -- to let someone modify the block design in the Vivado GUI and then just do bitstream generation as in the normal Vivado flow. The other problem is that, when going from one version to another (e.g., 2021.1 --> 2022.2), some IPs, port interfaces, or even the top-level module get renamed, so the scripts have to be updated accordingly -- as you observed when trying to run the 2022.2 script with the 2022.1 xdma shell. I currently don't have access to Vivado/Vitis 2022.1 to recreate this flow in the older version, but I can show you an alternative approach to get this working. I have tested it with 2022.2; you will probably need to adjust certain things for 2022.1.
======
1. Generate the IP kernel_pack_data_mover_mm2mm as shown in the README.
2. Launch a "bare" Vitis link run without any design:
v++ -l --platform xilinx_vck5000_gen4x8_qdma_2_202220_1 --temp_dir tmp
3. Abort the run at the step "Run vpl: Step synth: Started".
4. Open the Vivado project created by Vitis:
cd tmp/link/vivado/vpl/prj
vivado prj.xpr
5. Open the block design named ulp_inst_0 (name might be different).
6. Source the attached script. The script may need to be updated accordingly for 2022.1 and based on your system; what I would do is step through one command at a time to easily track what's going on when there is an error.
source ulp_bd_aie.tcl
7. Run implementation:
launch_runs impl_1 -to_step write_device_image -jobs 16
8. Grab the *.pdi file under prj.runs/impl_1/. You can follow the steps in the README for how to obtain an xclbin file from the *.pdi.
======
Regarding the block design with AIE, I will explain a bit of what goes in there (derived from the original VCK190 block design from the mlir-aie project).
This is the screenshot of the original bare ULP
This is the screenshot of the ulp_bd_aie
The changes are highlighted as follows
Briefly, this design uses the data_mover_mm2mm HLS IP to send or receive (configuration) data between the host and the AIE (through axi_noc_aie_prog to the AI Engine IP) and the DDR (through axi_noc_kernel0 to BLP_M_M00_INI_0). There is also a small BRAM (emb_mem_gen_0) accessed by both the HLS IP and the AI Engine -- basically enabling the AI Engine to read/write PL BRAMs (some tests in mlir-aie require that functionality).
Thank you so much for your detailed instructions @nqdtan, this is very appreciated.
I managed to generate the .xclbin, but I get error -22 from XRT, and when checking dmesg I found that there is a mismatch between the UUID of the shell and that of the bitstream. In fact, the bitstream's UUID is 00000000-0000-0000-0000-000000000000 (xclbinutil --input ulp.xclbin --info). Did something go wrong in the process?
Edit: I stand corrected; it is the shell that is reported to have that UUID. Here are the contents of ulp.xclbin.info.
==============================================================================
XRT Build Version: 2.13.478 (2022.1)
Build Date: 2022-05-16 22:30:16
Hash ID: 458699e9617da693e354d95b637df38daa2ed40a
==============================================================================
The BUILD_METADATA section is not present. Reports will be limited.
==============================================================================
xclbin Information
------------------
Generated by: <unknown>
Version: 2.13.478
Kernels: <unknown>
Signature:
Content:
UUID (xclbin): e253e5f9-490d-6186-1a07-82f8aff27691
UUID (IINTF): eaae3fb8b262b65b21fec0676792ebfc
Sections: BITSTREAM_PARTIAL_PDI, IP_LAYOUT, MEM_TOPOLOGY,
PARTITION_METADATA, CLOCK_FREQ_TOPOLOGY,
EMBEDDED_METADATA, GROUP_TOPOLOGY
==============================================================================
Hardware Platform (Shell) Information
-------------------------------------
Platform VBNV: xilinx_vck5000_gen4x8_xdma_2_202210_1
Static UUID: 00000000-0000-0000-0000-000000000000
Feature ROM TimeStamp: 0
Scalable Clocks
------
Name: KERNEL_CLK
Index: 0
Type: KERNEL
Frequency: 100 MHz
Name: DATA_CLK
Index: 1
Type: DATA
Frequency: 300 MHz
System Clocks
------
No system clock data available.
Memory Configuration
--------------------
Name: MC_NOC0
Index: 0
Type: MEM_DDR4
Base Address: 0xc100000000
Address Size: 0x300000000
Bank Used: Yes
==============================================================================
User Added Key Value Pairs
--------------------------
<empty>
==============================================================================
I also get all 0s for the hardware platform Static UUID, and my xclbin files work fine, so it seems that is not an issue.
Can you send me the full dmesg log? It might also help to cold reboot the machine and try loading the bitstream again.
I notice that the interface UUID (IINTF) is actually different in each shell version:
2021.2: e221a0ff8695d5eb8725fc65147f90c3
2022.2: eaae3fb8b262b65b21fec0676792ebfc
So you need to use the correct INTF UUID for 2022.1. If you have a different xclbin file generated by the normal Vitis flow, you can extract that information from it. For example, this command
xclbinutil --dump-section PARTITION_METADATA:JSON:partition_metadata_2022_1.json --input normal_vitis_flow_2022_1.xclbin
will generate partition_metadata_2022_1.json with the correct interface UUID. Another way is to look at the bitstream content in the *xclbin.info file. Then you can use it for your ULP AIE bitstream generation (in place of xclbin_generator/partition_metadata.json of this repo).
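As a quick sanity check of the dumped metadata, a small script along these lines can pull the interface UUID out of the JSON and compare it against what the shell reports. This is only a sketch: the key layout ("partition_metadata" -> "interfaces" -> "interface_uuid") is an assumption based on one xclbinutil version, so adjust the keys to match your actual dump (in practice you would `json.load` the real partition_metadata_2022_1.json file instead of the inline example).

```python
import json

def extract_interface_uuid(meta: dict) -> str:
    """Pull the first interface UUID out of a dumped PARTITION_METADATA dict.

    NOTE: the key names used here are assumptions; check them against
    your own xclbinutil dump.
    """
    return meta["partition_metadata"]["interfaces"][0]["interface_uuid"]

# Minimal stand-in for a real partition_metadata_2022_1.json dump:
example = json.loads("""
{
  "partition_metadata": {
    "interfaces": [
      { "interface_uuid": "eaae3fb8b262b65b21fec0676792ebfc" }
    ]
  }
}
""")

print(extract_interface_uuid(example))  # -> eaae3fb8b262b65b21fec0676792ebfc
```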
Here's the dmesg log: dmesg.log
I'm going to build a bitstream with this platform and I will let you know if what you say here works. Thanks!
This effectively solved the issue; I'm able to load the bitstream now, thank you. However, the program doesn't run correctly. I get the following output (see attached file). Only the first 100 lines are shown, since it gets stuck in an infinite loop printing similar lines.
Can you try recompiling libxaie with the following commands?
cd embeddedsw/XilinxProcessorIPLib/drivers/aienginev2/src
make clean
make -f Makefile.Linux CFLAGS+=-DPL
# Or run the compile.sh script
This enables the actual call to PL_Write32 to send configuration data to the AIE instead of just printing it out. Refer to embeddedsw/XilinxProcessorIPLib/drivers/aienginev2/src/io_backend/ext/xaie_debug.c as shown in the following screenshot:
I will update the instructions in the README on this. Let me know how it goes.
All those messages disappear after recompiling libxaie in PL mode. However, the host code gets stuck on Configuring AIE...
Ok.. Perhaps the Vivado design is the problem. Could you check to make sure that your block design address editor looks similar to this:
In particular, make sure that aiengine is registered as a slave device of data_mover_mm2mm at the address 0x200_0000_0000, so that data_mover_mm2mm can initiate the write request to aiengine at that address as we specify in the host code.
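As a rough cross-check of that address-editor entry, one can verify that the AIE addresses the host touches actually fall inside the aperture assigned to aiengine. A sketch, with the caveat that the 4 GiB span below is an assumption -- read the real range off your own address editor:

```python
# Sanity-check that host-side AIE accesses fall inside the aperture the
# address editor assigns to the aiengine slave of data_mover_mm2mm.
AIE_BASE = 0x200_0000_0000  # aiengine base from the address editor
AIE_SPAN = 4 << 30          # ASSUMED aperture size; check your design

def in_aie_window(addr: int) -> bool:
    return AIE_BASE <= addr < AIE_BASE + AIE_SPAN

# A tile program-memory address used elsewhere in this thread; it should
# land inside the window.
print(in_aie_window(0x20009060000))  # -> True
```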
It might also help to track where things actually get stuck. I would add some debug printouts in PL_Write32 in src/host_sw_with_aie/common.c to see whether it gets stuck at the first PL_Write32 call, or after several calls.
The address editor looks like the screenshot you have shared. I have found that the host code gets stuck at this line: https://github.com/nqdtan/vck5000_vivado_ulp/blob/a4d4ac6be53d49b5e5dadaff2878eabd169ba6ad/host_sw_with_aie/host.cpp#L172
There are PL_Write32 calls issued up to https://github.com/nqdtan/vck5000_vivado_ulp/blob/a4d4ac6be53d49b5e5dadaff2878eabd169ba6ad/host_sw_with_aie/host.cpp#L168
If you comment out line 172, does it give the correct result?
No.
Looks like there is some issue with the AIE accessing the PL BRAM. Here is the address editor showing their connections.
Can you run this command in Vivado
write_bd_tcl ulp_aie.tcl
and share ulp_aie.tcl with me?
We also need to make sure that the AIE did get some meaningful configuration data (not all 0s). Can you replace line 180 of host.cpp with this:
data_mover_mm_read(dev_mm1, host_mm2, XAIE_BASE_ADDR + 0x0000020000 + _XAie_GetTileAddr(&DevInst, 1, 18), len, print_result);
Here, I added the offset 0x0000020000 to print out the program memory of tile AIE(18, 1) instead of the data memory. This helps to debug whether the AIE receives the config data correctly. You can also print out the DMA and stream switch configs as well.
I generated the tcl file for the block design with write_bd_tcl ulp_aie.tcl.
Okay, that line printed all 0s, actually.
Thanks for checking. I just wanted to check one more thing with you: if you enable line 14 and line 30 in common.c to print out the config info, are you able to see something similar to the attached log file (instead of all 0s)? These are all the configs up to line 173.
XAIE_BASE_ADDR + 0x0000020000 + _XAie_GetTileAddr(&DevInst, 1, 18) equals 0x20009060000. It should return 0x4e65020b when you read that memory location.
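That address arithmetic can be reproduced by hand. In the aienginev2 driver, _XAie_GetTileAddr essentially computes (col << col_shift) | (row << row_shift) relative to the array base; the shift values below (23 for columns, 18 for rows) are assumptions for this AIE generation, but they reproduce the quoted value:

```python
# Recompute the program-memory address of tile AIE(col=18, row=1).
XAIE_BASE_ADDR = 0x200_0000_0000  # aiengine slave base in the ULP design
PROG_MEM_OFF = 0x0000020000       # program-memory offset within a tile

COL_SHIFT = 23  # ASSUMED column shift for this AIE generation
ROW_SHIFT = 18  # ASSUMED row shift

def tile_addr(row: int, col: int) -> int:
    # Mirrors what _XAie_GetTileAddr computes, relative to the array base.
    return (col << COL_SHIFT) | (row << ROW_SHIFT)

addr = XAIE_BASE_ADDR + PROG_MEM_OFF + tile_addr(row=1, col=18)
print(hex(addr))  # -> 0x20009060000
```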
How weird. When I replace line 180 with what you sent me here, I get what I show in the screenshot.
(Program memory register reference: https://docs.xilinx.com/r/en-US/am015-versal-aie-register-reference/Program_Memory-aie_core_module-Register)
When I enable lines 14 and 30 in common.c, I see something like this, very similar to your log (note: I edited this comment; I thought they were different at first):
Okay, that's normal. This is just to make sure we are not sending all 0s to the AIE. The config data are derived from the XAie calls and the elf file for core (18, 1). But that does not mean the AIE receives them correctly.
I'm curious about how you built the Vivado design. In the BD script that you shared with me, I noticed this line
set_property PFM_NAME {xilinx:vck5000:gen4x8_qdma_2:202220.1} [get_files [current_bd_design].bd]
I thought you were using the 2022.1 xdma shell, so this appears strange to me. You won't be able to use the script ulp_bd_with_aie.tcl from this repo to build a working bitstream for the 2022.1 shell, since the ULP interface might differ from 2022.2.
I followed the steps that you gave me:
1. Generate the IP kernel_pack_data_mover_mm2mm as shown in the README.
2. Launch a "bare" Vitis project without any design:
v++ -l --platform xilinx_vck5000_gen4x8_qdma_2_202220_1 --temp_dir tmp
3. Abort the run at the step "Run vpl: Step synth: Started".
4. Open the Vivado project created by Vitis:
cd tmp/link/vivado/vpl/prj
vivado prj.xpr
5. Open the block design named ulp_inst_0 (name might be different).
6. Source the attached script. The script may need to be updated accordingly for 2022.1 and based on your system; what I would do is step through one command at a time to easily track what's going on when there is an error.
source ulp_bd_aie.tcl
Once I got here, I got the error:
"Design ulp_inst_0 already exists in your project, please set the variable <design_name> to another value."
To fix it, I changed the name of the design here https://github.com/nqdtan/vck5000_vivado_ulp/blob/a4d4ac6be53d49b5e5dadaff2878eabd169ba6ad/ulp_bd_with_aie_16.tcl#L57 to ulp_inst_with_aie_0. After that I could follow the rest of the steps you gave me normally. I found it strange that Vivado was happy with that property, however.
I think you might have sourced the wrong script. Make sure that you really source the (new) script attached in that post (ulp_bd_aie.tcl), not the one from this repo (ulp_bd_with_aie_16.tcl). In that attached (new) script, I don't use ulp_inst_0 (*), so it should not cause the error you had seen there.
In addition, you also cannot rename the BD, because a later step (link_design -- just before implementation) looks for the exact ulp_inst_0 module synthesized in the synthesis step to combine with the precompiled shell design checkpoint (DCP). Basically, ulp_inst_0 is a reconfigurable blackbox module in the shell DCP, and it is synthesized separately and later recombined. I wasn't sure why Vivado was able to proceed normally in your case; I think what you did essentially just created a different BD, and Vivado ignored it and used the original BD.
If you implement everything correctly, the implementation netlist should look similar to this. Notice how some of the PL netlist gravitates towards the AIE.
And the NoC placement and routing should look like this.
(*) Edit: I meant to say I don't explicitly refer to "ulp_inst_0" in the script.
Also, I would like to add one more thing:
In step 4, once you open the Vivado project, make sure that you cancel the current synthesis run to effectively stop Vivado from synthesizing the original BD (hit the cancel button in the top right corner).
Hi, I managed to reproduce all the steps with the script you attached in your previous comment. I think the NoC is not properly connected. It looks very different from your picture.
The implemented design however looks similar to yours:
Any idea what might be going on?
Can you send me your top_wrapper_routed.dcp + the BD script (generated by write_bd_tcl)? I'd like to inspect the DCP to see whether it actually contains the data_mover_mm2mm and BRAM cells.
Yes, find the files in the following link: https://uoe-my.sharepoint.com/:u:/g/personal/s2081362_ed_ac_uk/Edo4feDV3oVEjiRBzIsDnxYBPNetg9ds8aJEc7Ly8TrqHw?e=rgAMxZ
There is no data_mover_mm2mm cell in your final implementation.
The NoC solution window indicates that no AIE NMUs are being used.
I guess Vivado still uses the original BD. BTW, I still see this line in your BD Tcl script
set_property PFM_NAME {xilinx:vck5000:gen4x8_qdma_2:202220.1} [get_files [current_bd_design].bd]
Did you source the script ulp_bd_with_aie_16.tcl at some point? Can you show me a screenshot of your original BD before you made any modifications (right after you open the Vivado project prj.xpr), and a screenshot of the modified BD with AIE + data_mover_mm2mm?
I had sourced the wrong .tcl script once again. After sourcing the one you had attached here, the host code runs successfully and passes the test. Thank you very much!
Great! Thanks for letting me know.
Hi, I need your design to work with the AIEs using the mlir-aie flow. It seems it all relies on the qdma platform, but currently I don't have access to it, and we only have the xdma platform available. I've gotten through most of the steps in the PL+AIE README, but I get this error when I run
make rm_project top=data_mover_mm2mm
This seems to be a mismatch in the interconnect resources on the blp due to the different platform. From what I understand, we are inserting some presynthesised IP for the qdma platform into the ulp of the xdma platform, hence the error. I guess I could just resynthesise said IP for the xdma platform instead so we can solve this problem with the pins... any pointers on which file I should resynthesise, or more generally on how to fix this?