nvdla / hw

RTL, Cmodel, and testbench for NVDLA

Can nvdla_small spec be modified with secondary SRAMIF and batch mode for NN_L0_1_small_fbuf? #265

Open nookfoo opened 5 years ago

nookfoo commented 5 years ago

Hello there,

I am running nvdla_small on a ZCU102 FPGA and am curious whether enabling batch mode or the secondary memory interface (sramif) will improve performance. I intend to try it anyway, but wanted to ask beforehand whether this is even sensible, since NN_L0_1_small_fbuf cannot be modified because the compiler has not been released.

Can NN_L0_1_small_fbuf make use of batch mode or the secondary sramif if they are enabled in the nvdla_small spec?

JSnobody commented 5 years ago

@nookfoo Have you ever run nvdla_small on the ZCU102 FPGA successfully? Maybe we can discuss it together.

ghost commented 5 years ago

@nookfoo Let me know what you find out about batch mode. From my previous observations, enabling it hardly increases the FPGA resource usage, which was quite confusing.

Enabling sramif would require BDMA, which is more or less a separate engine. My guess is that NN_L0_1_small_fbuf would need to physically contain instructions (operations) that drive BDMA.

I don't know if sramif will work with the current _nvsmall spec, but at least you can measure the actual DDR performance with Xilinx's AXI Performance Monitor IP core. A nice thing about this core is that it also measures peak wait-state cycles on the AXI4 bus.

With 64-bit DDR4-2400 on the PS side, it is hardly possible to use the entire bandwidth from the PL side (with a single AXI4 bus). However, you may observe quite decent wait cycles. For example, with the FPGA running at 250 MHz and one write channel (AXI4 Traffic Generator IP core), we got the following performance:

Alas, the ZU9EG does not have URAM blocks, which would be very handy for an efficient sramif implementation.
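
As a rough sanity check of the bandwidth remark above, here is a back-of-the-envelope sketch in Python. It compares the theoretical peak of 64-bit DDR4-2400 with the theoretical peak of a single AXI4 port clocked at 250 MHz. The 128-bit AXI data width is an assumption (it is not stated in this thread), and both figures ignore refresh, protocol overhead, and wait states, so treat them as upper bounds rather than measurements.

```python
def ddr_peak_bytes_per_s(mega_transfers_per_s: float, bus_bits: int) -> float:
    """Theoretical DDR peak: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * 1e6 * bus_bits / 8


def axi_peak_bytes_per_s(clock_hz: float, data_bits: int) -> float:
    """Theoretical AXI peak for one channel: one data beat per clock cycle."""
    return clock_hz * data_bits / 8


# From the comment above: DDR4-2400, 64-bit, PL clock of 250 MHz.
ddr_peak = ddr_peak_bytes_per_s(2400, 64)

# ASSUMPTION: 128-bit AXI4 data width; adjust to match your own design.
AXI_DATA_BITS = 128
axi_peak = axi_peak_bytes_per_s(250e6, AXI_DATA_BITS)

# Fraction of the DDR peak that one AXI port could reach in the best case.
ratio = axi_peak / ddr_peak

print(f"DDR4-2400 x64 peak:      {ddr_peak / 1e9:.1f} GB/s")
print(f"AXI4 {AXI_DATA_BITS}-bit @ 250 MHz: {axi_peak / 1e9:.1f} GB/s")
print(f"Best-case share of DDR:  {ratio:.0%}")
```

Under these assumptions a single port tops out well below the DDR peak, which is consistent with the observation that one AXI4 bus cannot saturate the PS-side memory. Numbers measured with the AXI Performance Monitor will be lower still.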

ghost commented 5 years ago

@peterzh2018888 Have you ever heard of the "one bug, one bug report" rule? No matter whether it's a bug or a problem in your setup - in the last two days I've received roughly 17 notifications from you describing the same or a similar problem. It is starting to be indistinguishable from spamming...

Don't get me wrong... People are reading your posts. For example, I am subscribed to both the sw and hw projects, and I see everything that's happening here. But this is not a big community, so you are unlikely to get a response immediately (and hopefully the maintainers are busy with releasing the compiler :+1:).

You may increase your chances by describing the problem in detail, including the environment setup (compiler version, compiler flags, ...) and the steps to reproduce the problem. A few people have already worked on Zynq, and as I recall some have already had problems with DRM and interrupts, so maybe they will be kind enough to compare your workflow with theirs... By spamming them you only increase their level of annoyance :)

shgangchen commented 5 years ago

Hi @nookfoo, I received your update about the synthesis problem, but I can't find it here. Is it no longer a problem?

huangwei858 commented 5 years ago

Hey, I've posted some questions in other issues about how to configure BRAM to replace RAMDP/RAMPDP. Could you share your experience with using a logical RAM wrapper instead of the simulation RAM?