nvdla / hw

RTL, Cmodel, and testbench for NVDLA

The Convolution Regression tests in flatbufs is stalling when running on ZCU104 FPGA #289

Open shenbakrish opened 5 years ago

shenbakrish commented 5 years ago

Hello all,

I have been working on integrating NVDLA into a ZCU104 FPGA. Unfortunately, when I run the regression tests for NV_small, the process stalls completely and throws the following error:

    Enable Convolution operation index 0 ROI 0
    INFO: rcu_sched self-detected stall on CPU
     0-...: (5243 ticks this GP) idle=052/140000000000001/0 softirq=2225/2225
     (t=5250 jiffies g=266 c=265 q=68)
    Task dump for CPU 0:
    nvdla_runtime R running task 0 2192 2165 0x00000002
    Call trace:
    [] dump_backtrace+0x0/0x368
    [] show_stack+0x14/0x20
    [] sched_show_task+0x140/0x170
    [] dump_cpu_task+0x40/0x50
    [] rcu_dump_cpu_stacks+0x94/0xd4
    [] rcu_check_callbacks+0x594/0x7d0
    [] update_process_times+0x2c/0x58
    [] tick_sched_handle.isra.5+0x30/0x50
    [] tick_sched_timer+0x40/0x90
    [] __hrtimer_run_queues+0xec/0x168
    [] hrtimer_interrupt+0xa0/0x220
    [] arch_timer_handler_phys+0x28/0x48
    [] handle_percpu_devid_irq+0x80/0x138
    [] generic_handle_irq+0x24/0x38
    [] __handle_domain_irq+0x5c/0xb8
    [] gic_handle_irq+0x68/0xc0
    Exception stack(0xffffff800c0738c0 to 0xffffff800c073a00)

I have reserved 1 GB of memory, followed all the global definitions in the hardware (Vivado 2018.3), and made the appropriate changes in the device driver modules and the device tree in PetaLinux (2018.3).

The following are my device tree settings:

    /include/ "system-conf.dtsi"
    / {
        reserved-memory {
            #address-cells = <2>;
            #size-cells = <2>;
            ranges;

            nvdla_reserved: buffer@0 {
                no-map;
                reg = <0x0 0x40000000 0x0 0x40000000>;
            };
        };

        design_1_wrapper_0 {
            compatible = "nvidia,nvdla_2";
            memory-region = <&nvdla_reserved>;
        };
    };
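For readers checking the reservation, here is a minimal sketch of how the `reg` property above decodes when both `#address-cells` and `#size-cells` are 2. The helper name `decode_reg` is hypothetical and not part of any NVDLA or PetaLinux tool; it only does the arithmetic.

```python
def decode_reg(cells, address_cells=2, size_cells=2):
    """Decode one (address, size) entry of a device tree `reg` property.

    Hypothetical helper for illustration; each cell is a 32-bit value,
    and multi-cell numbers are stored most significant cell first.
    """
    if len(cells) != address_cells + size_cells:
        raise ValueError("unexpected number of cells for one reg entry")

    def join(parts):
        # Concatenate 32-bit cells into a single integer.
        value = 0
        for part in parts:
            value = (value << 32) | part
        return value

    base = join(cells[:address_cells])
    size = join(cells[address_cells:])
    return base, size


# Values copied from the reg property in the post above.
base, size = decode_reg([0x0, 0x40000000, 0x0, 0x40000000])
print(f"base = {base:#x}, size = {size // 2**20} MiB, end = {base + size - 1:#x}")
```

With these values the region starts at 0x40000000 and spans 1024 MiB, which matches the 1 GB reservation described above. Whether that range is valid for a given board depends on its memory map, which this sketch does not check.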

Any suggestions from other peers would be quite helpful. Thanks a lot in advance.

Regards, Shenba.

gitosu67 commented 4 years ago

Hey, were you able to solve this?

sunny-yellow commented 2 years ago

Hello, when you ran the flatbuf tests on the ZCU104, did you run into this problem? It cannot allocate memory.

    root@nvdla_p5:/media/card# ls
    kmd lost+found test.sh testt.sh umd uumd
    root@nvdla_p5:/media/card# dmesg -n 1
    root@nvdla_p5:/media/card# su root ./testt.sh
    == Tests for nv_small ==
    ./testt.sh: line 9: echt: command not found
    ./testt.sh: line 13: :wq: command not found
    ./testt.sh: line 14: ========================: command not found
    = Run PDP/PDP_L0_0_small_fbuf
    creating new runtime context...
    Emulator starting
    submitting tasks...
    NvDlaSubmit: Error IOCTL failed (No such process)
    (DLA_RUNTIME) Error 0x0003000f: (propagating from Runtime.cpp, function submitInternal(), line 666)
    (DLA_TEST) Error 0x00000004: runtime->submit() failed (in RuntimeTest.cpp, function runTest(), line 397)
    (DLA_TEST) Error 0x00000004: (propagating from RuntimeTest.cpp, function run(), line 450)
    Shutdown signal received, exiting
    (DLA_TEST) Error 0x00000004: (propagating from main.cpp, function launchTest(), line 87)
    = Run CONV/CONV_D_L0_0_small_fbuf
    creating new runtime context...
    Emulator starting
    submitting tasks...
    NvDlaSubmit: Error IOCTL failed (Cannot allocate memory)
    (DLA_RUNTIME) Error 0x0003000f: (propagating from Runtime.cpp, function submitInternal(), line 666)
    (DLA_TEST) Error 0x00000004: runtime->submit() failed (in RuntimeTest.cpp, function runTest(), line 397)
    (DLA_TEST) Error 0x00000004: (propagating from RuntimeTest.cpp, function run(), line 450)
    Shutdown signal received, exiting
    (DLA_TEST) Error 0x00000004: (propagating from main.cpp, function launchTest(), line 87)
    = Run SDP/SDP_X1_L0_0_small_fbuf
    creating new runtime context...
    Emulator starting
    submitting tasks...
    NvDlaSubmit: Error IOCTL failed (Cannot allocate memory)
    (DLA_RUNTIME) Error 0x0003000f: (propagating from Runtime.cpp, function submitInternal(), line 666)
    (DLA_TEST) Error 0x00000004: runtime->submit() failed (in RuntimeTest.cpp, function runTest(), line 397)
    (DLA_TEST) Error 0x00000004: (propagating from RuntimeTest.cpp, function run(), line 450)
    Shutdown signal received, exiting
    (DLA_TEST) Error 0x00000004: (propagating from main.cpp, function launchTest(), line 87)
    = Run CDP/CDP_L0_0_small_fbuf
    creating new runtime context...
    Emulator starting
    submitting tasks...
    NvDlaSubmit: Error IOCTL failed (Cannot allocate memory)
    (DLA_RUNTIME) Error 0x0003000f: (propagating from Runtime.cpp, function submitInternal(), line 666)
    (DLA_TEST) Error 0x00000004: runtime->submit() failed (in RuntimeTest.cpp, function runTest(), line 397)
    (DLA_TEST) Error 0x00000004: (propagating from RuntimeTest.cpp, function run(), line 450)
    Shutdown signal received, exiting
    (DLA_TEST) Error 0x00000004: (propagating from main.cpp, function launchTest(), line 87)
    = Run NN/NN_L0_1_small_fbuf
    creating new runtime context...
    Emulator starting
    submitting tasks...
    NvDlaSubmit: Error IOCTL failed (Cannot allocate memory)
    (DLA_RUNTIME) Error 0x0003000f: (propagating from Runtime.cpp, function submitInternal(), line 666)
    (DLA_TEST) Error 0x00000004: runtime->submit() failed (in RuntimeTest.cpp, function runTest(), line 397)
    (DLA_TEST) Error 0x00000004: (propagating from RuntimeTest.cpp, function run(), line 450)
    Shutdown signal received, exiting
    (DLA_TEST) Error 0x00000004: (propagating from main.cpp, function launchTest(), line 87)
    root@nvdla_p5:/media/card#
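To make a log like this easier to triage, here is a rough sketch that lists the IOCTL failure reason reported for each test. The function name `summarize` is hypothetical, and the parsing assumes only the `= Run <test>` and `IOCTL failed (<reason>)` markers visible in the output above; it is not part of the NVDLA test scripts.

```python
import re


def summarize(log):
    """Map each test name to the IOCTL failure reason printed after it.

    Hypothetical helper: splits the console output on the "= Run " marker
    and looks for "IOCTL failed (<reason>)" inside each test's section.
    """
    results = {}
    for chunk in log.split("= Run ")[1:]:
        name = chunk.split()[0]
        match = re.search(r"IOCTL failed \(([^)]*)\)", chunk)
        results[name] = match.group(1) if match else None
    return results


# Short excerpt copied from the output above.
sample = """== Tests for nv_small ==
= Run PDP/PDP_L0_0_small_fbuf
NvDlaSubmit: Error IOCTL failed (No such process)
= Run CONV/CONV_D_L0_0_small_fbuf
NvDlaSubmit: Error IOCTL failed (Cannot allocate memory)
"""

summary = summarize(sample)
for test, reason in summary.items():
    print(f"{test}: {reason}")
```

On the excerpt, this shows that the first test fails with "No such process" while the next one fails with "Cannot allocate memory", so the two failures may not share a cause.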