Closed syed-ahmed closed 2 years ago
Hello @syed-ahmed,
I was able to run the KRS examples on KV260 and am currently working on accelerating ORB-SLAM2 ROS node using KRS.
Great to hear that!
I looked into the Vivado platform shipped by KRS and it looks like it's a bare minimum acceleration platform. I was wondering if there were any instructions on how that platform was created? I looked into the artifacts of this repo, but it seems like it only ships with the exported hardware platform (whereas I'm interested in the Tcl scripts that created the hardware project and the PetaLinux meta recipes). I want to build a pipeline like this using KRS and so was wondering if the platform in this repo needs to be updated, such that it supports MIPI/VCU/audio pipelines.
That's correct, KRS alpha only ships a minimalistic Vitis platform that's then used by the Vitis compiler as a base on which to add whatever accelerators you have in your ROS 2 workspace. KRS alpha is only meant for basic (single Node) accelerators. Support for multiple accelerators/multiple Nodes is coming up in KRS beta and, with that, also tools for simplified replacement of the Vitis platforms.
Right now, in alpha, the process is quite cumbersome and I don't recommend it, but it's definitely doable if you know what you're doing. The source code of the platform files and the scripts to automate it are available in:
A few notes:
[...] (ament_acceleration) and build tools (colcon-acceleration) to produce valid kernels; otherwise, your kernels will synthesize + place&route just fine, but the device tree inconsistencies won't allow them to interact with hardware successfully.
My plan is to release at least two Vitis platforms with KRS beta, with tools to switch between them easily, and to document the process on how to contribute your own platform. It'd be great to get your platform landing in here as a third one.
As a side note, I see ORB-SLAM just turned 3! (UZ-SLAMLab/ORB_SLAM3)
Thanks @vmayoral! That explains a lot! Let's keep this issue open and I can document the process as I work on this.
Hey @syed-ahmed!
Do you have any updates to share with us on your research? Let us know how we can help.
Hi @vmayoral. Apologies for the late reply. I was transitioning from academia and had to stop working on this.
I was able to make a custom platform. The process was simpler than I thought. I was able to skip PetaLinux and reuse the artifacts that were already in the kv260 firmware release of KRS. The only thing changed here is the hardware platform. However, a full set of instructions on generating the kv260 artifacts would of course help in the future (e.g. patching with the PREEMPT_RT kernel, Xilinx BSP modifications if any, added packages, etc.). I don't really have a formal writeup, but here's an attempt at documenting the process:
1) Clone the platform repo
git clone https://github.com/syed-ahmed/xilinx-k26-som-2021.2 \
&& cd xilinx-k26-som-2021.2 \
&& git submodule update --init --recursive
2) Make the platform.
cd kv260-vitis \
&& make platform PFM=kv260_ispMipiRx_vcu_DP
3) Replace the krs kv260 platform with the generated platform from the previous step:
mv ~/krs_ws/src/acceleration/acceleration_firmware_kv260/acceleration_firmware_kv260/firmware/platform ~/krs_ws/src/acceleration/acceleration_firmware_kv260/acceleration_firmware_kv260/firmware/platform_bk \
&& cp -r platforms/xilinx_kv260_ispMipiRx_vcu_DP_202110_1 ~/krs_ws/src/acceleration/acceleration_firmware_kv260/acceleration_firmware_kv260/firmware/platform
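The backup-and-replace in step 3 can also be wrapped in a small function that refuses to run when the inputs are missing or when an earlier backup would be overwritten. This is only a sketch: `swap_platform` is a hypothetical helper name, not something KRS ships, and the two arguments are assumed to be the `firmware` directory from the command above and the platform directory generated in step 2.

```shell
#!/usr/bin/env bash
# Sketch only: swap_platform is a hypothetical helper, not part of KRS.
# $1: the firmware directory that contains "platform"
# $2: the newly generated platform directory (from step 2)
swap_platform() {
    local firmware_dir="$1"
    local new_platform="$2"

    [ -d "$new_platform" ] || { echo "missing: $new_platform" >&2; return 1; }
    [ -d "$firmware_dir/platform" ] || { echo "missing: $firmware_dir/platform" >&2; return 1; }
    # Refuse to clobber a backup left over from a previous run.
    [ ! -e "$firmware_dir/platform_bk" ] || { echo "backup already exists: $firmware_dir/platform_bk" >&2; return 1; }

    # Same two operations as the one-liner: keep the original, then copy the new one in.
    mv "$firmware_dir/platform" "$firmware_dir/platform_bk"
    cp -r "$new_platform" "$firmware_dir/platform"
}
```

To roll back, delete the new `platform` directory and rename `platform_bk` to `platform`.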
4) Generate the device tree from .xsa produced in step 2 by following the directions here.
5) Replace the .dtsi in one of the acceleration examples to test. Example: replace the vadd_faster.dtsi in krs_ws/src/acceleration/acceleration_examples/nodes/faster_doublevadd_publisher/src with the .dtsi generated in step 4. Mine looks as follows. Note that firmware-name matches that of vadd_faster.
/*
 * CAUTION: This file is automatically generated by Xilinx.
 * Version: XSCT 2021.2
 * Today is: Tue Mar 22 00:42:13 2022
 */
/dts-v1/;
/plugin/;
/ {
    fragment@0 {
        target = <&fpga_full>;
        overlay0: __overlay__ {
            #address-cells = <2>;
            #size-cells = <2>;
            firmware-name = "vadd_faster.bit.bin";
            resets = <&zynqmp_reset 116>, <&zynqmp_reset 117>, <&zynqmp_reset 118>, <&zynqmp_reset 119>;
        };
    };
    fragment@1 {
        target = <&amba>;
        overlay1: __overlay__ {
            afi0: afi0 {
                compatible = "xlnx,afi-fpga";
                config-afi = < 0 0>, <1 0>, <2 0>, <3 0>, <4 0>, <5 0>, <6 0>, <7 0>, <8 0>, <9 0>, <10 0>, <11 0>, <12 0>, <13 0>, <14 0x0>, <15 0x000>;
            };
            clocking0: clocking0 {
                #clock-cells = <0>;
                assigned-clock-rates = <99999001>;
                assigned-clocks = <&zynqmp_clk 71>;
                clock-output-names = "fabric_clk";
                clocks = <&zynqmp_clk 71>;
                compatible = "xlnx,fclk";
            };
            clocking1: clocking1 {
                #clock-cells = <0>;
                assigned-clock-rates = <99999001>;
                assigned-clocks = <&zynqmp_clk 72>;
                clock-output-names = "fabric_clk";
                clocks = <&zynqmp_clk 72>;
                compatible = "xlnx,fclk";
            };
        };
    };
    fragment@2 {
        target = <&amba>;
        overlay2: __overlay__ {
            #address-cells = <2>;
            #size-cells = <2>;
            audio_ss_0_audio_formatter_0: audio_formatter@80040000 {
                clock-names = "s_axi_lite_aclk", "m_axis_mm2s_aclk", "aud_mclk", "s_axis_s2mm_aclk";
                clocks = <&misc_clk_0>, <&misc_clk_1>, <&misc_clk_1>, <&misc_clk_0>;
                compatible = "xlnx,audio-formatter-1.0", "xlnx,audio-formatter-1.0";
                interrupt-names = "irq_mm2s", "irq_s2mm";
                interrupt-parent = <&gic>;
                interrupts = <0 111 4 0 110 4>;
                reg = <0x0 0x80040000 0x0 0x10000>;
                xlnx,include-mm2s = <0x1>;
                xlnx,include-s2mm = <0x1>;
                xlnx,max-num-channels-mm2s = <0x2>;
                xlnx,max-num-channels-s2mm = <0x2>;
                xlnx,mm2s-addr-width = <0x40>;
                xlnx,mm2s-async-clock = <0x1>;
                xlnx,mm2s-dataformat = <0x3>;
                xlnx,packing-mode-mm2s = <0x0>;
                xlnx,packing-mode-s2mm = <0x0>;
                xlnx,rx = <&audio_ss_0_i2s_receiver_0>;
                xlnx,s2mm-addr-width = <0x40>;
                xlnx,s2mm-async-clock = <0x1>;
                xlnx,s2mm-dataformat = <0x1>;
                xlnx,tx = <&audio_ss_0_i2s_transmitter_0>;
            };
            misc_clk_0: misc_clk_0 {
                #clock-cells = <0>;
                clock-frequency = <99999000>;
                compatible = "fixed-clock";
            };
            misc_clk_1: misc_clk_1 {
                #clock-cells = <0>;
                clock-frequency = <18432995>;
                compatible = "fixed-clock";
            };
            audio_ss_0_i2s_receiver_0: i2s_receiver@80060000 {
                aud_mclk = <18432995>;
                clock-names = "s_axi_ctrl_aclk", "aud_mclk", "m_axis_aud_aclk";
                clocks = <&misc_clk_0>, <&misc_clk_1>, <&misc_clk_0>;
                compatible = "xlnx,i2s-receiver-1.0", "xlnx,i2s-receiver-1.0";
                interrupt-names = "irq";
                interrupt-parent = <&gic>;
                interrupts = <0 108 4>;
                reg = <0x0 0x80060000 0x0 0x10000>;
                xlnx,depth = <0x80>;
                xlnx,dwidth = <0x18>;
                xlnx,num-channels = <0x1>;
                xlnx,snd-pcm = <&audio_ss_0_audio_formatter_0>;
            };
            audio_ss_0_i2s_transmitter_0: i2s_transmitter@80070000 {
                aud_mclk = <18432995>;
                clock-names = "s_axi_ctrl_aclk", "aud_mclk", "s_axis_aud_aclk";
                clocks = <&misc_clk_0>, <&misc_clk_1>, <&misc_clk_1>;
                compatible = "xlnx,i2s-transmitter-1.0", "xlnx,i2s-transmitter-1.0";
                interrupt-names = "irq";
                interrupt-parent = <&gic>;
                interrupts = <0 109 4>;
                reg = <0x0 0x80070000 0x0 0x10000>;
                xlnx,depth = <0x80>;
                xlnx,dwidth = <0x18>;
                xlnx,num-channels = <0x1>;
                xlnx,snd-pcm = <&audio_ss_0_audio_formatter_0>;
            };
            axi_iic_0: i2c@80030000 {
                #address-cells = <1>;
                #size-cells = <0>;
                clock-names = "s_axi_aclk";
                clocks = <&misc_clk_0>;
                compatible = "xlnx,axi-iic-2.1", "xlnx,xps-iic-2.00.a";
                interrupt-names = "iic2intc_irpt";
                interrupt-parent = <&gic>;
                interrupts = <0 107 4>;
                reg = <0x0 0x80030000 0x0 0x10000>;
            };
            axi_vip_0: axi_vip@a0000000 {
                /* This is a place holder node for a custom IP, user may need to update the entries */
                clock-names = "aclk";
                clocks = <&misc_clk_2>;
                compatible = "xlnx,axi-vip-1.1";
                reg = <0x0 0xa0000000 0x0 0x10000>;
                xlnx,axi-addr-width = <0x20>;
                xlnx,axi-aruser-width = <0x10>;
                xlnx,axi-awuser-width = <0x10>;
                xlnx,axi-buser-width = <0x0>;
                xlnx,axi-has-aresetn = <0x1>;
                xlnx,axi-has-bresp = <0x1>;
                xlnx,axi-has-burst = <0x1>;
                xlnx,axi-has-cache = <0x1>;
                xlnx,axi-has-lock = <0x1>;
                xlnx,axi-has-prot = <0x1>;
                xlnx,axi-has-qos = <0x1>;
                xlnx,axi-has-region = <0x0>;
                xlnx,axi-has-rresp = <0x1>;
                xlnx,axi-has-wstrb = <0x1>;
                xlnx,axi-interface-mode = <0x2>;
                xlnx,axi-protocol = <0x0>;
                xlnx,axi-rdata-width = <0x20>;
                xlnx,axi-rid-width = <0x10>;
                xlnx,axi-ruser-width = <0x0>;
                xlnx,axi-supports-narrow = <0x1>;
                xlnx,axi-wdata-width = <0x20>;
                xlnx,axi-wid-width = <0x10>;
                xlnx,axi-wuser-width = <0x0>;
            };
            misc_clk_2: misc_clk_2 {
                #clock-cells = <0>;
                clock-frequency = <299997000>;
                compatible = "fixed-clock";
            };
            capture_pipeline_mipi_csi2_rx_subsyst_0: mipi_csi2_rx_subsystem@80000000 {
                clock-names = "lite_aclk", "dphy_clk_200M", "video_aclk";
                clocks = <&misc_clk_0>, <&misc_clk_3>, <&misc_clk_2>;
                compatible = "xlnx,mipi-csi2-rx-subsystem-5.1", "xlnx,mipi-csi2-rx-subsystem-5.0";
                interrupt-names = "csirxss_csi_irq";
                interrupt-parent = <&gic>;
                interrupts = <0 104 4>;
                reg = <0x0 0x80000000 0x0 0x2000>;
                xlnx,axis-tdata-width = <32>;
                xlnx,max-lanes = <4>;
                xlnx,ppc = <2>;
                xlnx,vfb ;
                mipi_csi_portscapture_pipeline_mipi_csi2_rx_subsyst_0: ports {
                    #address-cells = <1>;
                    #size-cells = <0>;
                    mipi_csi_port1capture_pipeline_mipi_csi2_rx_subsyst_0: port@1 {
                        /* Fill cfa-pattern=rggb for raw data types, other fields video-format and video-width user needs to fill */
                        reg = <1>;
                        xlnx,cfa-pattern = "rggb";
                        xlnx,video-format = <12>;
                        xlnx,video-width = <8>;
                        mipi_csirx_outcapture_pipeline_mipi_csi2_rx_subsyst_0: endpoint {
                            remote-endpoint = <&capture_pipeline_v_frmbuf_wr_0capture_pipeline_mipi_csi2_rx_subsyst_0>;
                        };
                    };
                    mipi_csi_port0capture_pipeline_mipi_csi2_rx_subsyst_0: port@0 {
                        /* Fill cfa-pattern=rggb for raw data types, other fields video-format,video-width user needs to fill */
                        /* User need to add something like remote-endpoint=<&out> under the node csiss_in:endpoint */
                        reg = <0>;
                        xlnx,cfa-pattern = "rggb";
                        xlnx,video-format = <12>;
                        xlnx,video-width = <8>;
                        mipi_csi_incapture_pipeline_mipi_csi2_rx_subsyst_0: endpoint {
                            data-lanes = <1 2 3 4>;
                        };
                    };
                };
            };
            misc_clk_3: misc_clk_3 {
                #clock-cells = <0>;
                clock-frequency = <199998000>;
                compatible = "fixed-clock";
            };
            capture_pipeline_v_frmbuf_wr_0: v_frmbuf_wr@b0010000 {
                #dma-cells = <1>;
                clock-names = "ap_clk";
                clocks = <&misc_clk_2>;
                compatible = "xlnx,v-frmbuf-wr-2.3", "xlnx,axi-frmbuf-wr-v2.2";
                interrupt-names = "interrupt";
                interrupt-parent = <&gic>;
                interrupts = <0 105 4>;
                reg = <0x0 0xb0010000 0x0 0x10000>;
                reset-gpios = <&gpio 78 1>;
                xlnx,dma-addr-width = <32>;
                xlnx,dma-align = <16>;
                xlnx,max-height = <2160>;
                xlnx,max-width = <3840>;
                xlnx,pixels-per-clock = <2>;
                xlnx,s-axi-ctrl-addr-width = <0x7>;
                xlnx,s-axi-ctrl-data-width = <0x20>;
                xlnx,vid-formats = "nv12";
                xlnx,video-width = <8>;
            };
            vcu_vcu_0: vcu@80100000 {
                #address-cells = <2>;
                #clock-cells = <1>;
                #size-cells = <2>;
                clock-names = "pll_ref", "aclk", "vcu_core_enc", "vcu_core_dec", "vcu_mcu_enc", "vcu_mcu_dec";
                clocks = <&misc_clk_4>, <&misc_clk_0>, <&vcu_vcu_0 1>, <&vcu_vcu_0 2>, <&vcu_vcu_0 3>, <&vcu_vcu_0 4>;
                compatible = "xlnx,vcu-1.2", "xlnx,vcu";
                interrupt-names = "vcu_host_interrupt";
                interrupt-parent = <&gic>;
                interrupts = <0 106 4>;
                ranges ;
                reg = <0x0 0x80140000 0x0 0x1000>, <0x0 0x80141000 0x0 0x1000>;
                reg-names = "vcu_slcr", "logicore";
                reset-gpios = <&gpio 80 0>;
                encoder: al5e@80100000 {
                    compatible = "al,al5e-1.2", "al,al5e";
                    interrupt-parent = <&gic>;
                    interrupts = <0 106 4>;
                    reg = <0x0 0x80100000 0x0 0x10000>;
                };
                decoder: al5d@80120000 {
                    compatible = "al,al5d-1.2", "al,al5d";
                    interrupt-parent = <&gic>;
                    interrupts = <0 106 4>;
                    reg = <0x0 0x80120000 0x0 0x10000>;
                };
            };
            misc_clk_4: misc_clk_4 {
                #clock-cells = <0>;
                clock-frequency = <49999500>;
                compatible = "fixed-clock";
            };
            zyxclmm_drm {
                compatible = "xlnx,zocl";
            };
            vcap_capture_pipeline_mipi_csi2_rx_subsyst_0 {
                compatible = "xlnx,video";
                dma-names = "port0";
                dmas = <&capture_pipeline_v_frmbuf_wr_0 0>;
                vcap_portscapture_pipeline_mipi_csi2_rx_subsyst_0: ports {
                    #address-cells = <1>;
                    #size-cells = <0>;
                    vcap_portcapture_pipeline_mipi_csi2_rx_subsyst_0: port@0 {
                        direction = "input";
                        reg = <0>;
                        capture_pipeline_v_frmbuf_wr_0capture_pipeline_mipi_csi2_rx_subsyst_0: endpoint {
                            remote-endpoint = <&mipi_csirx_outcapture_pipeline_mipi_csi2_rx_subsyst_0>;
                        };
                    };
                };
            };
        };
    };
};
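Since the generated .dtsi has to carry the firmware-name of the example kernel it replaces, a quick check (and, if needed, a rewrite) of that property can catch a mismatch before building. The sketch below uses only `sed` and `head`. `get_firmware_name` and `set_firmware_name` are hypothetical helper names, and the `.bit.bin` suffix is taken from the overlay above. It assumes the kernel name contains no `/` or `&` characters.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical, not part of KRS or the Xilinx tools.

# Print the value of the first firmware-name property found in a .dtsi file.
get_firmware_name() {
    sed -n 's/^[[:space:]]*firmware-name[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Rewrite firmware-name to "<kernel>.bit.bin", keeping a .bak copy of the file.
# $1: path to the .dtsi, $2: kernel name (e.g. vadd_faster)
set_firmware_name() {
    local dtsi="$1" kernel="$2"
    sed -i.bak "s/\(firmware-name[[:space:]]*=[[:space:]]*\)\"[^\"]*\"/\1\"${kernel}.bit.bin\"/" "$dtsi"
}
```

For example, `get_firmware_name vadd_faster.dtsi` should print `vadd_faster.bit.bin` for the overlay shown above.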
6) Make changes to kv260.cfg to reflect the new platform requirements (platform name, clocks, Vivado strategy, etc.). Mine looks like this:
platform=kv260_ispMipiRx_vcu_DP
save-temps=1
debug=1
# Enable profiling of data ports
[profile]
data=all:all:all
[vivado]
prop=run.impl_1.strategy=Performance_ExploreWithRemap
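Because a stale platform name in kv260.cfg is an easy mistake after swapping platforms, it can help to read the value back before compiling. The following is a minimal sketch: `cfg_get` is a hypothetical helper that only looks at the top-level `key=value` lines (everything before the first `[section]` header) and assumes the value itself contains no `=`.

```shell
#!/usr/bin/env bash
# Sketch only: cfg_get is a hypothetical helper, not part of Vitis or KRS.
# $1: key to look up, $2: path to the .cfg file
cfg_get() {
    local key="$1" file="$2"
    awk -F= -v k="$key" '
        /^\[/            { exit }            # stop at the first [section] header
        /^[[:space:]]*#/ { next }            # skip comment lines
        $1 == k          { print $2; exit }  # first matching top-level key wins
    ' "$file"
}
```

With the file above, `cfg_get platform kv260.cfg` prints `kv260_ispMipiRx_vcu_DP`, which can then be compared against the PFM value used in step 2.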
7) Compile acceleration examples as in KRS docs.
That's my progress so far. My next step would have been to:
Unfortunately, I have to stop here since I don't have the bandwidth to work on this anymore :(. I hope somebody else will pick this up. Maybe I'll find some time in the future.
Thanks for the update @syed-ahmed, progress looks great to me. I'll keep this open. Keep us posted on your next steps, this should be helpful to others following your path.
I'm closing this for now @syed-ahmed, feel free to re-open or ping me if anything else is needed.