nvdla / hw

RTL, Cmodel, and testbench for NVDLA

Synchronous External Abort #320

Closed MaxS1996 closed 4 years ago

MaxS1996 commented 4 years ago

I am currently trying to get the nv_small specification of the current master branch to run on a Xilinx ZYNQ MPSoC ZCU104 FPGA.

I generated the NVDLA core and APB2CSB blocks from the source, exported them as IP, wrapped them for AXI, and put them into a top-level design. I then bundled everything into one PetaLinux image (2018.2 for all Xilinx tools) and compiled the KMD and UMD. But when I try to run one of the regression tests, it always fails right away with "Synchronous External Abort".

I suspect that there is something wrong with my block design, but I am not skilled enough to find the mistake.

error message:

root@xilinx-zcu104-2018_2:/UMD# ./nvdla_runtime --loadable /regression/flatbufs/kmd/NN/NN_L0_1_small_fbuf
creating new runtime context...
Emulator starting
submitting tasks...
[ 175.011776] Enter:dla_read_network_config
[ 175.017397] Exit:dla_read_network_config status=0
[ 175.022079] Enter: dla_initiate_processors
[ 175.026159] Enter: dla_submit_operation
[ 175.029980] Prepare Convolution operation index 0 ROI 0 dep_count 1
[ 175.036226] Enter: dla_prepare_operation
[ 175.040135] Synchronous External Abort: synchronous external abort (0x96000210) at 0xffffff800c9a7004
[ 175.049337] Internal error: : 96000210 [#1] SMP
[ 175.053840] Modules linked in: opendla(O) mali(O) uio_pdrv_genirq
[ 175.059919] CPU: 3 PID: 2266 Comm: nvdla_runtime Tainted: G O 4.14.0-xilinx-v2018.2 #1
[ 175.068942] Hardware name: ZynqMP ZCU104 RevC (DT)
[ 175.073717] task: ffffffc02e3c8280 task.stack: ffffff800cae0000
[ 175.079637] PC is at dla_reg_read+0xc/0x20 [opendla]
[ 175.084580] LR is at reg_read+0x18/0x28 [opendla]
[ 175.089254] pc : [] lr : [] pstate: 00000145
[ 175.096631] sp : ffffff800cae3ae0
[ 175.099930] x29: ffffff800cae3ae0 x28: ffffff8000aa5008
[ 175.105225] x27: 0000000000000000 x26: ffffff8000aa65c0
[ 175.110519] x25: 0000000000000000 x24: ffffff8000aa2998
[ 175.115814] x23: ffffff8000aa5e14 x22: ffffff800cae3bae
[ 175.121109] x21: ffffff800cae3baf x20: ffffff8000aa2168
[ 175.126404] x19: ffffff8000aa5150 x18: 0000000000000010
[ 175.131699] x17: 0000007fb1a93e90 x16: ffffff80081aeea8
[ 175.136993] x15: ffffffffffffffff x14: ffffff8088e8a227
[ 175.142288] x13: ffffff8008de8aa8 x12: ffffff80084fed00
[ 175.147583] x11: 0000000005f5e0ff x10: 0000000000000004
[ 175.152878] x9 : 00000000ffffffd0 x8 : 6e6f697461726570
[ 175.158173] x7 : 6f5f657261706572 x6 : 000000000000017e
[ 175.163467] x5 : 0000000000000000 x4 : 0000000000000000
[ 175.168762] x3 : 0000000000000000 x2 : ffffff8000aa5008
[ 175.174057] x1 : 0000000000007004 x0 : ffffff800c9a7004
[ 175.179353] Process nvdla_runtime (pid: 2266, stack limit = 0xffffff800cae0000)
[ 175.186643] Call trace:
[ 175.189075] Exception stack(0xffffff800cae39a0 to 0xffffff800cae3ae0)
[ 175.195499] 39a0: ffffff800c9a7004 0000000000007004 ffffff8000aa5008 0000000000000000
[ 175.203311] 39c0: 0000000000000000 0000000000000000 000000000000017e 6f5f657261706572
[ 175.211123] 39e0: 6e6f697461726570 00000000ffffffd0 0000000000000004 0000000005f5e0ff
[ 175.218935] 3a00: ffffff80084fed00 ffffff8008de8aa8 ffffff8088e8a227 ffffffffffffffff
[ 175.226747] 3a20: ffffff80081aeea8 0000007fb1a93e90 0000000000000010 ffffff8000aa5150
[ 175.234559] 3a40: ffffff8000aa2168 ffffff800cae3baf ffffff800cae3bae ffffff8000aa5e14
[ 175.242371] 3a60: ffffff8000aa2998 0000000000000000 ffffff8000aa65c0 0000000000000000
[ 175.250183] 3a80: ffffff8000aa5008 ffffff800cae3ae0 ffffff8000aa0060 ffffff800cae3ae0
[ 175.257995] 3aa0: ffffff8000aa0624 0000000000000145 ffffff800cae3b30 ffffff800cae3b30
[ 175.265807] 3ac0: 0000008000000000 00000000ffffffc8 ffffff800cae3ae0 ffffff8000aa0624
[ 175.273631] [] dla_reg_read+0xc/0x20 [opendla]
[ 175.279619] [] utils_get_free_group+0x98/0xe0 [opendla]
[ 175.286390] [] dla_submit_operation+0x74/0x320 [opendla]
[ 175.293247] [] dla_execute_task+0x174/0x570 [opendla]
[ 175.299844] [] nvdla_task_submit+0x28/0xb8 [opendla]
[ 175.306354] [] nvdla_submit+0xe8/0x188 [opendla]
[ 175.312511] [] drm_ioctl_kernel+0x6c/0xf0
[ 175.318060] [] drm_ioctl+0x180/0x3b8
[ 175.323183] [] do_vfs_ioctl+0xa4/0x7d8
[ 175.328476] [] SyS_ioctl+0x44/0x80
[ 175.333423] Exception stack(0xffffff800cae3ec0 to 0xffffff800cae4000)
[ 175.339847] 3ec0: 0000000000000003 00000000c0106440 0000007fed45c9b0 0000007fed45cdf8
[ 175.347659] 3ee0: 0000007fed45c990 0000007fed45cdf8 0000007fed444cf0 000000003968cd60
[ 175.355471] 3f00: 000000000000001d 0000000000018010 0000000000000000 0000000000000000
[ 175.363283] 3f20: 7420676e69747469 0a2e2e2e736b7361 0000007fb1db9f00 0000007fb1db5610
[ 175.371095] 3f40: 0000007fb1df8438 0000007fb1a93e90 00000000000001a5 000000003968c2e0
[ 175.378907] 3f60: 0000000000000000 0000007fed45ca80 00000000396ab330 0000000000000000
[ 175.386719] 3f80: 000000003968c3d0 000000003968c3e8 0000000000000000 000000003968cd60
[ 175.394531] 3fa0: 00000000396ab330 0000007fed45c9a0 0000007fb1dcbb54 0000007fed444970
[ 175.402343] 3fc0: 0000007fb1a93e9c 0000000060000000 0000000000000003 000000000000001d
[ 175.410155] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 175.417969] [] el0_svc_naked+0x24/0x28
[ 175.423263] Code: d65f03c0 b40000c0 f9400400 8b214000 (b9400000)
[ 175.429338] ---[ end trace f8e4886bc0cb50f8 ]---
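
For anyone reading the trace above, the abort code can be decoded by hand. The sketch below is not part of the original report. It splits the logged value 0x96000210 into the ARMv8-A ESR_EL1 fields (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]) and subtracts x1 from x0. Treating x1 as the register offset passed to dla_reg_read is an assumption based on the register dump, not something the log states.

```python
# Values copied from the oops in this issue.
ESR = 0x96000210            # "synchronous external abort (0x96000210)"
X0 = 0xFFFFFF800C9A7004     # faulting virtual address (also x0 in the dump)
X1 = 0x0000000000007004     # x1 in the dump; assumed to be the register offset


def decode_data_abort(esr):
    """Split an ESR_EL1 value into the fields relevant to a data abort."""
    iss = esr & 0x1FFFFFF          # ISS: bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,  # exception class
        "il": (esr >> 25) & 0x1,   # instruction length bit
        "wnr": (iss >> 6) & 0x1,   # 1 = write, 0 = read
        "dfsc": iss & 0x3F,        # data fault status code
    }


fields = decode_data_abort(ESR)
mapped_base = X0 - X1  # start of the mapped register window, under the assumption above

print("EC   = 0x%02x" % fields["ec"])
print("WnR  = %d" % fields["wnr"])
print("DFSC = 0x%02x" % fields["dfsc"])
print("mapped base = 0x%x, offset = 0x%x" % (mapped_base, X1))
```

EC 0x25 is a data abort taken without a change in exception level, WnR 0 marks a read, and DFSC 0x10 is a synchronous external abort that is not on a translation table walk. In other words, the kernel mapping exists, but the read at offset 0x7004 got an error response from the bus, which is consistent with a problem on the hardware side of the register path rather than in the page tables.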

AXI wrapper made from IP blocks of the NVDLA core and nv_nvdla_apb2csb: [image: Wrapper]

Top-level design: [image: Top]

MaxS1996 commented 4 years ago

Inserting opendla.ko results in this output:

root@xilinx-zcu104-2018_2:/# insmod KMD/opendla.ko
[ 39.142662] Probe NVDLA config nvidia,nv_small
[ 39.147431] 0 . 12 . 5
[ 39.149725] reset engine done
[ 39.153040] [drm] Initialized nvdla 0.0.0 20171017 for a0000000.nvdla_wrapper on minor 1
xf86: found device 1

which seems correct...

soulzatzero commented 2 years ago

Hello! I've been running into a similar issue when testing an Intel NIC on the PCIe root port. Could you please share what your solution to the problem was? Thanks!

MaxS1996 commented 2 years ago

I cannot remember, as it has been 2 years already. But there is an issue in this repository that will probably give you a lot of guidance on deploying NVDLA to an FPGA.

zld932 commented 2 years ago

Hello, I have the same problem when I run nv_small on the ZCU104. Could you please share your solution to this problem? Thanks.

AndiNNT commented 5 months ago

Hello, I have the same problem running on a ZCU102. If I run only the UMD and comment out the ioctl in the NvDlaSubmit function, the rest of the program works properly, so I believe it is connected to setting up the shared memory properly in the device tree, through the user.dtsi file... Has anyone fixed it?