nqdtan / vck5000_vivado_ulp

An alternative Vivado custom design example (as opposed to a fully Vitis flow) for the User Logic Partition targeting the VCK5000

Building the design with xdma platform #3

Closed: gabrielrodcanal closed this issue 11 months ago

gabrielrodcanal commented 11 months ago

Hi, I need your design to work with the AIEs using the mlir-aie flow. It seems it all relies on the qdma platform, but I currently don't have access to that and we only have the xdma platform available. I've gotten through most of the steps in the PL+AIE README, but I get this error when I run make rm_project top=data_mover_mm2mm

Screenshot from 2023-10-19 13-33-59

This seems to be a mismatch in the interconnect resources of the BLP due to the different platform. From what I understand, we are inserting some IP presynthesised for the qdma platform into the ULP of the xdma platform, hence the error. I guess I could just resynthesise said IP for the xdma platform instead, so we can solve this problem with the pins... any pointers on which file I should resynthesise, or more generally on how to fix this?

nqdtan commented 11 months ago

Hi @gabrielrodcanal,

Which shell version are you using: xilinx_vck5000_gen3x16_xdma_1_202120_1 or xilinx_vck5000_gen4x8_xdma_2_202210_1? For the former, you'll need to check out branch 2021.2 of this repo (and it only works with Vivado 2021.2). I do not have support for the latter, unfortunately.

It's also worth mentioning that there's an mlir-air project which can support running mlir-aie on the VCK5000 as well. But as far as I know, they build their own custom shell.

gabrielrodcanal commented 11 months ago

Hi @nqdtan, it's actually xilinx_vck5000_gen4x8_xdma_2_202210. Any chance this is supported?

Would you recommend going for mlir-air with their custom shell, or using yours instead? Correct me if I'm wrong, but my understanding is that the mlir-air stuff will only allow programming the AIEs with the AIR dialect, whereas your shell will enable the AIE dialect, which seems to have more support.

nqdtan commented 11 months ago

xilinx_vck5000_gen4x8_xdma_2_202210 is doable. You will need to run a sample Vitis flow for that shell and then extract the relevant TCL files for the Vivado hardware design. I can provide more detail/help if you want to follow this path. The goal is to create the block design ulp_bd_with_aie_16, which interfaces with the given base platform. Now that I think about it, it might be possible to source this script as a TCL hook in Vitis so that you don't need to create a Vivado project.

I think it depends on which option is easier to adopt in your situation. If the end goal is running mlir-aie / the AIE dialect on the VCK5000, both should work. If you look around the test files under mlir-air/test, you will see that some of them use the AIE dialect as well. This project doesn't have any more support for the AIE dialect than mlir-air does.

One advantage of using the official shell from the AMD Xilinx member site is that the fan noise has been moderated quite a bit in the recent base platform release (qdma 2022.2) -- which matters to me because the machine the card is installed in sits on a shared office floor.

gabrielrodcanal commented 11 months ago

I think your approach will work best for me, as I cannot really change the platform: this VCK5000 is shared with other users who use it just for kernel acceleration, and they've already been working with the xdma shell. I am assuming that your approach doesn't alter the BLP at all and that all the logic needed to interface with the AIEs through PCIe is included in the ULP, so any bitstream previously generated for xilinx_vck5000_gen4x8_xdma_2_202210 could still be loaded on the VCK5000, since we haven't changed the shell. Am I right?

I guess by running the Vitis flow you're referring to generating a hardware platform from the xdma shell. If you could give me some basic pointers on how to create that block design, that would be very helpful. I only got as far as generating a platform project from the xdma shell .xsa; the platform.tcl file it generated is almost empty, though.

nqdtan commented 11 months ago

Yes, a bitstream generated by this project won't alter anything in the shell (the base platform -- BLP), so you could still load any previously generated bitstream files.

By Vitis flow, I was referring to generating a bitstream that contains the block design of this repo (i.e., ulp_bd_with_aie.tcl) in Vitis. It is not straightforward to do so, since it is unclear to me how to specify the NoC connectivity in Vitis, especially the connections that involve the AI Engine IP block. That's why I created this project in the first place -- to let someone modify the block design in the Vivado GUI and then just run bitstream generation as in the normal Vivado flow.

The other problem is that, when going from one version to another (e.g., 2021.1 --> 2022.2), some IPs, port interfaces, or even the top-level module get renamed, so the scripts have to be updated accordingly -- as you observed when trying to run the 2022.2 script against the 2022.1 xdma shell. I currently don't have access to Vivado/Vitis 2022.1 to recreate this flow in the older version, but I can show you an alternative approach to get this working. I have tested it with 2022.2; you will probably need to adjust certain things for 2022.1.

======

  1. Generate the IP kernel_pack_data_mover_mm2mm as shown in the README.

  2. Launch a "bare" Vitis project without any design:

    v++ -l --platform xilinx_vck5000_gen4x8_qdma_2_202220_1 --temp_dir tmp

  3. Abort the run at the step Run vpl: Step synth: Started.

  4. Open the Vivado project created by Vitis:

    cd tmp/link/vivado/vpl/prj
    vivado prj.xpr

  5. Open the block design named ulp_inst_0 (the name might be different).

  6. Source the attached script. The script may need to be updated for 2022.1 and for your system; what I would do is step through it one command at a time so it is easy to track what is going on when there is an error.

    source ulp_bd_aie.tcl

  7. Run bitstream generation:

    launch_runs impl_1 -to_step write_device_image -jobs 16

  8. If this runs successfully, it will produce a *.pdi file under prj.runs/impl_1/. You can follow the steps in the README to obtain an xclbin file from the *.pdi.

======

Regarding the block design with AIE, I will explain a bit about what goes into it (it is derived from the original VCK190 block design from the mlir-aie project).

This is the screenshot of the original bare ULP

v2022_2_original_bd

This is the screenshot of the ulp_bd_aie

v2022_2_bd_with_aie

The changes are highlighted as follows

v2022_2_bd_with_aie_highlight

Briefly, this design uses the data_mover_mm2mm HLS IP to move (configuration) data between the host and the AIE (through axi_noc_aie_prog to the AI Engine IP) and between the host and the DDR (through axi_noc_kernel0 to BLP_M_M00_INI_0). There is also a small BRAM (emb_mem_gen_0) accessed by both the HLS IP and the AI Engine -- basically enabling the AI Engine to read/write PL BRAMs (some tests in mlir-aie require that functionality).
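For intuition, here is a minimal sketch of what a memory-to-memory data mover of this kind boils down to, written in Vitis-HLS-style C. This is illustrative only and is not the source of kernel_pack_data_mover_mm2mm; the real IP's ports, widths, and control protocol will differ.

    // Toy mm2mm data mover: an AXI4 master copies `len` words from one
    // address to another, with addresses/length supplied over AXI-Lite.
    #include <stdint.h>

    void mm2mm(const uint32_t *src,   // e.g. DDR side (axi_noc_kernel0)
               uint32_t *dst,         // e.g. AIE config space (axi_noc_aie_prog)
               uint32_t len)          // number of 32-bit words to move
    {
    #pragma HLS INTERFACE m_axi     port=src offset=slave bundle=gmem0
    #pragma HLS INTERFACE m_axi     port=dst offset=slave bundle=gmem1
    #pragma HLS INTERFACE s_axilite port=src
    #pragma HLS INTERFACE s_axilite port=dst
    #pragma HLS INTERFACE s_axilite port=len
    #pragma HLS INTERFACE s_axilite port=return

        for (uint32_t i = 0; i < len; i++) {
    #pragma HLS PIPELINE II=1
            dst[i] = src[i];          // simple copy loop; HLS infers bursts
        }
    }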

ulp_bd_aie.zip

gabrielrodcanal commented 11 months ago

Thank you so much for your detailed instructions @nqdtan, this is much appreciated. I managed to generate the .xclbin, but I get error -22 from XRT, and when checking dmesg I found out that there is a mismatch between the UUID of the shell and that of the bitstream. In fact, the bitstream's UUID is 00000000-0000-0000-0000-000000000000 (xclbinutil --input ulp.xclbin --info). Did something go wrong in the process?

Edit: I stand corrected, it is the shell that is reported to have that UUID. Here are the contents of ulp.xclbin.info:

==============================================================================
XRT Build Version: 2.13.478 (2022.1)
       Build Date: 2022-05-16 22:30:16
          Hash ID: 458699e9617da693e354d95b637df38daa2ed40a
==============================================================================
The BUILD_METADATA section is not present. Reports will be limited.
==============================================================================
xclbin Information
------------------
   Generated by:           <unknown>
   Version:                2.13.478
   Kernels:                <unknown>
   Signature:    
   Content:    
   UUID (xclbin):          e253e5f9-490d-6186-1a07-82f8aff27691
   UUID (IINTF):           eaae3fb8b262b65b21fec0676792ebfc
   Sections:               BITSTREAM_PARTIAL_PDI, IP_LAYOUT, MEM_TOPOLOGY, 
                           PARTITION_METADATA, CLOCK_FREQ_TOPOLOGY, 
                           EMBEDDED_METADATA, GROUP_TOPOLOGY
==============================================================================
Hardware Platform (Shell) Information
-------------------------------------
   Platform VBNV:          xilinx_vck5000_gen4x8_xdma_2_202210_1
   Static UUID:            00000000-0000-0000-0000-000000000000
   Feature ROM TimeStamp:  0

Scalable Clocks
------
   Name:      KERNEL_CLK
   Index:     0   
   Type:      KERNEL
   Frequency: 100 MHz 

   Name:      DATA_CLK
   Index:     1   
   Type:      DATA
   Frequency: 300 MHz 

System Clocks
------
   No system clock data available.

Memory Configuration
--------------------
   Name:         MC_NOC0
   Index:        0
   Type:         MEM_DDR4
   Base Address: 0xc100000000
   Address Size: 0x300000000
   Bank Used:    Yes 
==============================================================================
User Added Key Value Pairs
--------------------------
   <empty>
==============================================================================
nqdtan commented 11 months ago

I also get all 0s for the hardware platform Static UUID, and my xclbin files work fine, so it seems that it is not an issue.

Can you send me the full dmesg log? It might also help to cold reboot the machine and try loading the bitstream again.

nqdtan commented 11 months ago

I notice that the interface UUID (IINTF) is actually different in each shell version:

2021.2: e221a0ff8695d5eb8725fc65147f90c3 2022.2: eaae3fb8b262b65b21fec0676792ebfc

So you need to use the correct interface UUID for your 2022.1 shell. If you have a different xclbin file generated by the normal Vitis flow, you can extract that information from it. For example, this command

xclbinutil --dump-section PARTITION_METADATA:JSON:partition_metadata_2022_1.json --input normal_vitis_flow_2022_1.xclbin

will generate partition_metadata_2022_1.json with the correct interface UUID. Another way is to look it up in the *.xclbin.info file of an existing bitstream. Then you can use it for your ULP AIE bitstream generation (in place of xclbin_generator/partition_metadata.json in this repo).

gabrielrodcanal commented 11 months ago

I also get all 0s for the hardware platform Static UUID, and my xclbin files work fine, so it seems that it is not an issue.

Can you send me the full dmesg log? It might also help to cold reboot the machine and try loading the bitstream again.

Here's the dmesg log: dmesg.log

gabrielrodcanal commented 11 months ago

I notice that the interface UUID (IINTF) is actually different in each shell version:

2021.2: e221a0ff8695d5eb8725fc65147f90c3 2022.2: eaae3fb8b262b65b21fec0676792ebfc

So you need to use the correct INTF UUID in 2022.1. If you have a different xclbin file generated by normal Vitis flow, you can extract that information from it. For example, this command

xclbinutil --dump-section PARTITION_METADATA:JSON:partition_metadata_2022_1.json --input normal_vitis_flow_2022_1.xclbin

will generate partition_metadata_2022_1.json with the correct interface UUID. Another way is to look at the bitstream content in *xclbin.info file. Then you can use it for your ULP AIE bitstream generation (in place of xclbin_generator/partition_metadata.json of this repo).

I'm going to build a bitstream with this platform and I will let you know if what you say here works. Thanks!

gabrielrodcanal commented 11 months ago

I notice that the interface UUID (IINTF) is actually different in each shell version:

2021.2: e221a0ff8695d5eb8725fc65147f90c3 2022.2: eaae3fb8b262b65b21fec0676792ebfc

So you need to use the correct INTF UUID in 2022.1. If you have a different xclbin file generated by normal Vitis flow, you can extract that information from it. For example, this command

xclbinutil --dump-section PARTITION_METADATA:JSON:partition_metadata_2022_1.json --input normal_vitis_flow_2022_1.xclbin

will generate partition_metadata_2022_1.json with the correct interface UUID. Another way is to look at the bitstream content in *xclbin.info file. Then you can use it for your ULP AIE bitstream generation (in place of xclbin_generator/partition_metadata.json of this repo).

This effectively solved the issue; I'm able to load the bitstream now, thank you. However, the program doesn't run correctly. I get the output in the attached file. Only the first 100 lines are shown, since it gets stuck in an infinite loop printing similar lines.

host_with_aie.log

nqdtan commented 11 months ago

Can you try recompiling libxaie with the following commands

cd embeddedsw/XilinxProcessorIPLib/drivers/aienginev2/src
make clean
make -f Makefile.Linux CFLAGS+=-DPL
# Or run the compile.sh script

This would enable the actual call to PL_Write32 to send configuration data to the AIE instead of just printing it out. Refer to embeddedsw/XilinxProcessorIPLib/drivers/aienginev2/src/io_backend/ext/xaie_debug.c, as shown in the following screenshot:

Screenshot from 2023-10-31 10-34-15
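In other words, the -DPL flag acts as a compile-time switch between logging register writes and actually issuing them through the PL. The sketch below shows the idea only; it is not the actual xaie_debug.c source, and the function name and the PL_Write32 signature are assumed for illustration.

    /* Illustrative sketch -- not the real embeddedsw backend code. */
    #include <stdio.h>
    #include <stdint.h>

    /* Assumed to be provided by the host side (host_sw_with_aie/common.c). */
    extern void PL_Write32(uint64_t RegOff, uint32_t Value);

    static void debug_io_write32(uint64_t RegOff, uint32_t Value)
    {
    #ifdef PL
        /* Built with `make -f Makefile.Linux CFLAGS+=-DPL`: forward the
         * configuration word to the AIE through the PL data mover. */
        PL_Write32(RegOff, Value);
    #else
        /* Default debug backend: only print what would have been written. */
        printf("W: 0x%llx, 0x%x\n", (unsigned long long)RegOff, (unsigned)Value);
    #endif
    }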

I will update the instructions in the README accordingly. Let me know how it goes.

gabrielrodcanal commented 11 months ago

All those messages disappear after recompiling libxaie in PL mode. However, the host code gets stuck on Configuring AIE...

nqdtan commented 11 months ago

Ok.. Perhaps the Vivado design is the problem. Could you check to make sure that your block design address editor looks similar to this:

Screenshot from 2023-10-31 10-58-17

In particular, make sure that aiengine is registered as a slave device of data_mover_mm2mm at the address 0x200_0000_0000, so that data_mover_mm2mm can initiate the write request to aiengine at that address as we specify in the host code.

It might also help to track where things actually get stuck. I would add some debug printouts in PL_Write32 in src/host_sw_with_aie/common.c to see whether it gets stuck at the first PL_Write32 call or only after several calls.
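For example (a sketch only; the name and signature of PL_Write32 in common.c are assumed here), a call counter plus a flushed printf at the top of the function is enough to see whether it hangs on the first write or after many:

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical trace added at the top of PL_Write32 for debugging. */
    void PL_Write32(uint64_t addr, uint32_t value)
    {
        static unsigned long n_calls;
        printf("PL_Write32 #%lu: addr=0x%llx val=0x%x\n",
               ++n_calls, (unsigned long long)addr, (unsigned)value);
        fflush(stdout);  /* keep the last line visible if the next access hangs */

        /* ... existing body: issue the write through data_mover_mm2mm ... */
    }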

gabrielrodcanal commented 11 months ago

The address editor looks like the screenshot you shared. I have found that the host code gets stuck on this line: https://github.com/nqdtan/vck5000_vivado_ulp/blob/a4d4ac6be53d49b5e5dadaff2878eabd169ba6ad/host_sw_with_aie/host.cpp#L172

PL_Write32 calls are issued up to https://github.com/nqdtan/vck5000_vivado_ulp/blob/a4d4ac6be53d49b5e5dadaff2878eabd169ba6ad/host_sw_with_aie/host.cpp#L168

nqdtan commented 11 months ago

If you comment out line 172, does it give the correct result?

gabrielrodcanal commented 11 months ago

No.

image

nqdtan commented 11 months ago

Looks like there is some issue with the AIE accessing the PL BRAM. Here is the address editor showing their connections

Screenshot from 2023-10-31 11-32-27

Can you run this command in Vivado

write_bd_tcl ulp_aie.tcl

and share with me ulp_aie.tcl?

nqdtan commented 11 months ago

We also need to make sure that the AIE did get some meaningful configuration data (not all 0s). Can you replace line 180 of host.cpp with this

data_mover_mm_read(dev_mm1, host_mm2, XAIE_BASE_ADDR + 0x0000020000 + _XAie_GetTileAddr(&DevInst, 1, 18), len, print_result);

Here, I added the offset 0x0000020000 to print out the program memory of tile AIE(18, 1) instead of its data memory. This helps to check whether the AIE received the config data correctly. You can also print out the DMA and stream switch configs as well.

https://docs.xilinx.com/r/en-US/am015-versal-aie-register-reference/Program_Memory-aie_core_module-Register

gabrielrodcanal commented 11 months ago

Looks like there is some issue with the AIE accessing the PL BRAM. Here is the address editor showing their connections

Screenshot from 2023-10-31 11-32-27

Can you run this command in Vivado

write_bd_tcl ulp_aie.tcl

and share with me ulp_aie.tcl?

I generated the tcl file for the block design with write_bd_tcl ulp_aie.tcl.

ulp_aie.tcl.tar.gz

gabrielrodcanal commented 11 months ago

We also need to make sure that the AIE did get some meaningful configuration data (not all 0s). Can you replace line 180 of host.cpp with this

data_mover_mm_read(dev_mm1, host_mm2, XAIE_BASE_ADDR + 0x0000020000 + _XAie_GetTileAddr(&DevInst, 1, 18), len, print_result);

Here, I added the offset 0x0000020000 to print out the program memory of tile AIE(18, 1) instead of data memory. This helps to debug whether the AIE receives the config data correctly. You can also print out the DMA, Stream switch configs as well.

https://docs.xilinx.com/r/en-US/am015-versal-aie-register-reference/Program_Memory-aie_core_module-Register

Okay, that line printed all 0s, actually.

nqdtan commented 11 months ago

Thanks for checking. I just wanted to check one other thing with you: if you enable line 14 and line 30 in common.c to print out the config info, are you able to see something similar to the attached log file (instead of all 0s)? These are all the configs up to line 173.

XAIE_BASE_ADDR + 0x0000020000 + _XAie_GetTileAddr(&DevInst, 1, 18) equals 0x20009060000. It should return 0x4e65020b when you read that memory location.
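As a quick sanity check, that address can be reproduced with a few shifts. The helper below is illustrative only (it is not libxaie's _XAie_GetTileAddr) and assumes the AIE1 tile addressing of column << 23 | row << 18, plus the 0x20000 program-memory offset and the 0x200_0000_0000 AIE base address shown in the address editor above.

    #include <stdio.h>
    #include <stdint.h>

    #define XAIE_BASE_ADDR   0x20000000000ULL  /* AIE aperture seen by data_mover_mm2mm */
    #define PROG_MEM_OFFSET  0x0000020000ULL   /* program memory offset inside a tile */

    /* Illustrative stand-in for _XAie_GetTileAddr on AIE1 devices. */
    static uint64_t tile_addr(uint32_t row, uint32_t col)
    {
        return ((uint64_t)col << 23) | ((uint64_t)row << 18);
    }

    int main(void)
    {
        uint64_t addr = XAIE_BASE_ADDR + PROG_MEM_OFFSET + tile_addr(1, 18);
        printf("0x%llx\n", (unsigned long long)addr);  /* prints 0x20009060000 */
        return 0;
    }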

log.txt

gabrielrodcanal commented 11 months ago

Thanks for checking. I just wanted to check with you another thing: if you enable line 14 and line 30 in common.c to print out the config info, are you able to see something similar to this log file attached (instead of all 0s)? These are all the configs up to line 173.

XAIE_BASE_ADDR + 0x0000020000 + _XAie_GetTileAddr(&DevInst, 1, 18) equals to 0x20009060000. It should return 0x4e65020b when you read that memory location.

log.txt

How weird. When I replace line 180 with what you sent me here, I get what I show in the screenshot.

We also need to make sure that the AIE did get some meaningful configuration data (not all 0s). Can you replace line 180 of host.cpp with this

data_mover_mm_read(dev_mm1, host_mm2, XAIE_BASE_ADDR + 0x0000020000 + _XAie_GetTileAddr(&DevInst, 1, 18), len, print_result);

Here, I added the offset 0x0000020000 to print out the program memory of tile AIE(18, 1) instead of data memory. This helps to debug whether the AIE receives the config data correctly. You can also print out the DMA, Stream switch configs as well. https://docs.xilinx.com/r/en-US/am015-versal-aie-register-reference/Program_Memory-aie_core_module-Register

Okay, that line printed all 0s, actually.

image

When I enable lines 14 and 30 in common.c, I see something like this, very similar to your log (note: I edited this comment; I thought they were different at first):

image

nqdtan commented 11 months ago

Okay. That's normal. This is just to make sure we are not sending all 0s to the AIE. The config data is derived from the XAie calls and the ELF file for core (18, 1). But that does not mean the AIE receives it correctly.

I'm curious about how you built the Vivado design. In the BD script that you shared with me, I noticed this line:

set_property PFM_NAME {xilinx:vck5000:gen4x8_qdma_2:202220.1} [get_files [current_bd_design].bd]

I thought you were using the 2022.1 xdma shell, so this appears strange to me. You won't be able to use the script ulp_bd_with_aie.tcl here to build a working bitstream for the 2022.1 shell, since the ULP interface might differ from 2022.2.

gabrielrodcanal commented 11 months ago

I followed the steps that you gave me:

  1. Generate the IP kernel_pack_data_mover_mm2mm as shown in the README.

  2. Launch a "bare" Vitis project without any design:

    v++ -l --platform xilinx_vck5000_gen4x8_qdma_2_202220_1 --temp_dir tmp

  3. Abort the run at step Run vpl: Step synth: Started

  4. Open the Vivado project created by Vitis

    cd tmp/link/vivado/vpl/prj
    vivado prj.xpr
  5. Open the block design named ulp_inst_0 (name might be different)

  6. Source the attached script. The script may need to be updated accordingly in 2022.1 and based on your system; what I would do is stepping through one command at a time to easily track what's going on when there is an error.

source ulp_bd_aie.tcl

Once I got here, I got the error: "Design ulp_inst_0 already exists in your project, please set the variable <design_name> to another value."

To fix it I changed the name of the design here https://github.com/nqdtan/vck5000_vivado_ulp/blob/a4d4ac6be53d49b5e5dadaff2878eabd169ba6ad/ulp_bd_with_aie_16.tcl#L57

to ulp_inst_with_aie_0. After that I could follow the rest of the steps you gave me normally. I found it strange that Vivado was happy with that property, however.

nqdtan commented 11 months ago

I think you might have sourced the wrong script. Make sure that you really source the (new) script attached in that post (ulp_bd_aie.tcl), not the one from this repo (ulp_bd_with_aie_16.tcl). In the attached (new) script, I don't use ulp_inst_0 (*), so it should not cause the error you saw there.

In addition, you also cannot rename the BD, because the later step (link_design, just before implementation) looks for the exact ulp_inst_0 module synthesized in the synthesis step to combine with the precompiled shell design checkpoint (DCP). Basically, ulp_inst_0 is a reconfigurable blackbox module in the shell DCP, and it is synthesized separately and later recombined. I wasn't sure why Vivado was able to proceed normally in your case; I think what you did essentially just created a different BD, which Vivado ignored, and it kept using the original BD.

If you implement everything correctly, the Implementation netlist should look similar to this. Notice how some PL netlist gravitates towards the AIE

Screenshot from 2023-10-31 13-11-37

And the NOC Placement and Routing should look like this

Screenshot from 2023-10-31 13-12-37

(*) Edit: meant to say I don't explicitly refer to "ulp_inst_0" in the script.

nqdtan commented 11 months ago

Also, I would like to add one more thing:

In step 4, once you open the Vivado project, make sure that you cancel the current synthesis run so that Vivado stops synthesizing the original BD (hit the cancel button in the top right corner).

gabrielrodcanal commented 11 months ago

Hi, I managed to reproduce all the steps with the script you attached in your previous comment. I think the NoC is not properly connected; it looks very different from your picture.

image

The implemented design however looks similar to yours: image

Any idea what might be going on?

nqdtan commented 11 months ago

Can you send me your top_wrapper_routed.dcp plus the BD script (generated by write_bd_tcl)? I'd like to inspect the DCP to see whether it actually contains the data_mover_mm2mm and BRAM cells.

gabrielrodcanal commented 11 months ago

Can you send me your top_wrapper_routed.dcp plus the BD script (generated by write_bd_tcl)? I'd like to inspect the DCP to see whether it actually contains the data_mover_mm2mm and BRAM cells.

Yes, you can find the files at the following link: https://uoe-my.sharepoint.com/:u:/g/personal/s2081362_ed_ac_uk/Edo4feDV3oVEjiRBzIsDnxYBPNetg9ds8aJEc7Ly8TrqHw?e=rgAMxZ

nqdtan commented 11 months ago

There is no data_mover_mm2mm cell in your final implementation.

Screenshot from 2023-11-07 09-52-06

The NoC solution window indicates that there are no AIE NMUs being used.

Screenshot from 2023-11-07 09-52-15

I guess Vivado still used the original BD. By the way, I still see this line in your BD Tcl script:

set_property PFM_NAME {xilinx:vck5000:gen4x8_qdma_2:202220.1} [get_files [current_bd_design].bd]

Did you source the script ulp_bd_with_aie_16.tcl at some point? Can you show me a screenshot of your original BD before you made any modifications (right after you open the Vivado project prj.xpr), and a screenshot of the modified BD with AIE + data_mover_mm2mm?

gabrielrodcanal commented 11 months ago

I had sourced the wrong .tcl script once again. After sourcing the one you attached here, the host code runs successfully and passes the test. Thank you very much!

nqdtan commented 11 months ago

Great! Thanks for letting me know.