nvdla / hw

RTL, Cmodel, and testbench for NVDLA

nv_small synthesis: IOB utilization is too large #269

Open huangwei858 opened 5 years ago

huangwei858 commented 5 years ago

I got nv_small through synthesis using the suggestion in #110. However, my IOB utilization is too high (>120%) to deploy to the ZCU102 board. Can anyone help me resolve this error?

cainiaowu commented 5 years ago

IOB should be 0, since the core doesn't need to talk to anything on the board; it talks to the PS.

huangwei858 commented 5 years ago

Thank you for your reply. Could you share more details on how to get IOB down to 0? My IOB settings come from the nv_small configuration, and it is difficult for me to customize them in depth.

cainiaowu commented 5 years ago

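You should probably work through a Xilinx FPGA tutorial first; it seems you are unfamiliar with many basic FPGA concepts. What you need is to use Block Design to connect the custom logic (NVDLA, in this case) to the corresponding AXI ports on the PS side.

Also, NVDLA's CSB is a custom protocol, so you will need to write your own APB-to-CSB bridge module. There is a ready-made one in the RISC-V code that you can copy.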
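
To make the bridge idea above concrete, here is a minimal Python sketch of only the address and data mapping such an APB-to-CSB bridge would perform. It is not taken from any existing bridge. It assumes the CSB address is a 16-bit word index derived from bits [17:2] of the APB byte address and that data is 32 bits wide; the function name `apb_to_csb_request` and the dictionary keys are hypothetical. Handshaking (valid/ready, read response) is deliberately left out.

```python
def apb_to_csb_request(paddr, pwrite, pwdata=0):
    """Translate one APB access into the fields of a CSB request.

    Assumptions (not confirmed by the thread): registers are 32-bit,
    the APB address is a byte address, and the CSB address is the
    16-bit word index paddr[17:2].
    """
    if paddr % 4 != 0:
        raise ValueError("APB address must be word aligned")
    if not 0 <= paddr < (1 << 18):
        raise ValueError("APB address outside the assumed 256 KiB window")
    return {
        "addr": (paddr >> 2) & 0xFFFF,               # byte address -> word index
        "write": bool(pwrite),                        # True for a register write
        "wdat": (pwdata & 0xFFFFFFFF) if pwrite else 0,  # write data, ignored on reads
    }


if __name__ == "__main__":
    # Write 0x1 to byte offset 0x3000, then read byte offset 0x4.
    print(apb_to_csb_request(0x3000, True, 0x1))
    print(apb_to_csb_request(0x0004, False))
```

In real RTL the same mapping is a handful of wire assignments; the part that needs care is stalling the APB side until the CSB side accepts the request or returns read data.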

huangwei858 commented 5 years ago

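We have already deployed nv_small on the ZCU102, but when running YOLO we found that performance with 64 MACs matches what nvdla-prime describes: only 7.3 fps, which is far too slow. When we deploy 128 MACs, the ZCU102 runs out of resources. Besides NVDLA, have you worked with any other DPU processors?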

cainiaowu commented 5 years ago

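You won't run out of resources. Are you using the DSPs? The ZCU102 has 2520 DSP units; don't build multipliers out of LUTs, it is far too inefficient. Simulate first to see where the bottleneck is. The default NV_SMALL configuration is not very sensible: the SDP has only one pipeline, which wastes a lot of MAC utilization in the shallow layers. Adding a few more pipelines gives a qualitative improvement.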
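
The two points above (DSP budget and SDP pipeline count) can be checked with a rough back-of-envelope model. The Python sketch below is a toy estimate, not a description of the actual NVDLA pipeline. It assumes one DSP slice per MAC, an 8x8 MAC array (atomic-C = 8, atomic-K = 8) for the 64-MAC configuration, and that each SDP pipeline drains one output element per cycle; the function names are hypothetical. Only the 2520 DSP figure comes from the thread.

```python
import math

ZCU102_DSP_SLICES = 2520  # figure quoted in the thread


def dsp_utilization(num_macs, dsps_per_mac=1, total_dsps=ZCU102_DSP_SLICES):
    """Fraction of DSP slices used, assuming dsps_per_mac slices per MAC."""
    return num_macs * dsps_per_mac / total_dsps


def mac_utilization(in_channels, kernel_h, kernel_w,
                    atomic_c=8, atomic_k=8, sdp_pipelines=1):
    """Toy upper bound on MAC utilization for one convolution layer.

    Each group of atomic_k outputs needs ceil(C*R*S / atomic_c) compute
    cycles, while the SDP needs ceil(atomic_k / sdp_pipelines) cycles to
    drain that group. The slower of the two sets the pace.
    """
    compute_cycles = math.ceil(in_channels * kernel_h * kernel_w / atomic_c)
    drain_cycles = math.ceil(atomic_k / sdp_pipelines)
    return compute_cycles / max(compute_cycles, drain_cycles)


if __name__ == "__main__":
    print(f"128 MACs use {dsp_utilization(128):.1%} of the DSP slices")
    # Shallow layer: 3 input channels, 3x3 kernel.
    print("1 SDP pipeline :", mac_utilization(3, 3, 3, sdp_pipelines=1))
    print("4 SDP pipelines:", mac_utilization(3, 3, 3, sdp_pipelines=4))
    # Deeper layer: 64 input channels, 3x3 kernel.
    print("deep layer     :", mac_utilization(64, 3, 3, sdp_pipelines=1))
```

Under these assumptions a shallow 3-channel layer is limited by the single SDP pipeline while a 64-channel layer is not, which is consistent with the advice to simulate first and see where the bottleneck actually is.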

huangwei858 commented 5 years ago

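I'm very glad to get your reply. Would it be convenient to add me on WeChat? My WeChat ID is 18817334069. If possible, please do add me so we can discuss NVDLA updates and optimizations together.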

wxbbuaa2011 commented 5 years ago

@huangwei858 I've sent you a friend request. I'm also implementing nvdla small on the ZCU102 and hope you can offer some guidance!