nvdla / hw

RTL, Cmodel, and testbench for NVDLA

Advice on implementing NVDLA on FPGA #196

Open nookfoo opened 6 years ago

nookfoo commented 6 years ago

Hi, my goal is to eventually run a configurable nv_small on Zynq UltraScale+ while using an nvdc-compiled Caffe model for inference. I am still a novice in the field of FPGAs, NNs, etc., and have some trouble understanding the basic implementation flow of NVDLA. I have read through the documentation and a few related discussions (#110, nvdla/sw#70) and would like to roughly summarize what I have gathered so far.

General steps for implementing NVDLA on Zynq UltraScale+:

  1. Generate the RTL model of nv_small using the hw manual

    • try different configurations of nv_small
  2. Synthesize and implement the RTL model for Zynq UltraScale+ (using Vivado in this case)

  3. Set up nvdla_runtime (UMD, KMD) on a Linux distro running on the Zynq UltraScale+ SoC

    • refer to this comment
    • run sanity checks etc. (test applications)
    • use the prebuilts/linux provided by nvdla/sw
    • use the provided loadables or generate self-configured NN with nvdla_compiler

I'm aware this is a very rough outline of what needs to be done and that there will be a multitude of sub-steps and issues to resolve, but I would greatly appreciate feedback on whether this is the right approach. Thanks!

ghost commented 6 years ago

Yeah, these additional minor steps are hard to define and almost inevitable, but in general the plan sounds good to me, except for:

generate self-configured NN with nvdla_compiler

This will eventually be possible if nvdla_compiler gets support for configurations other than nv_full (a.k.a. nvdlav1). Unfortunately, the official roadmap is outdated and nobody knows when that may happen.

xiaoguoer commented 5 years ago

@nookfoo For your second step, could you please share how you handled "Add DW02_tree DW_lsb DW_minmax files" and "global define" in the sub-steps you referred to?

Thanks in advance for any reply.

nookfoo commented 5 years ago

Hi, "Add DW02_tree DW_lsb DW_minmax files" was not necessary for me, since I think those files are used for DesignWare products and simulation. I used Vivado for my implementation.

I am not yet sure myself where to add global defines. For now, I have added them under Project Settings -> Verilog Options in Vivado and also in NV_HVACC_NVDLA_tick_defines.h.

xiaoguoer commented 5 years ago

@nookfoo Thank you! Here is the answer from mmaciag to this question; I hope it is helpful for you.

shgangchen commented 5 years ago

You can include the path of *defines.h in the "Simulation Settings" in Vivado. If you want to make a macro global, just right-click the file and choose "Set Global Include". For other macros you want to define, such as `FPGA, you can create a define.h file, add it to the project, and set it as a global include.
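Both approaches mentioned so far (a global include header, or Project Settings -> Verilog Options) can be scripted instead of clicked. The Python sketch below renders a Verilog defines header and prints the matching Vivado Tcl for either approach. FPGA is the only macro taken from this thread; the header name fpga_defines.vh and the macro MY_EXAMPLE_WIDTH are hypothetical placeholders. The sketch does not tell you which macros nv_small actually needs.

```python
"""Sketch: generate a global Verilog defines header plus the Vivado Tcl that
registers it, for an NVDLA nv_small FPGA project.

Assumptions: the header name and every macro except FPGA are hypothetical;
choose the macros your own nv_small configuration actually needs.
"""
from typing import Dict, Optional

Macros = Dict[str, Optional[str]]  # macro name -> value (None = flag only)


def render_defines(macros: Macros) -> str:
    """Return a Verilog header with one `define line per macro."""
    lines = ["// Global defines for the FPGA build (generated)"]
    for name, value in macros.items():
        lines.append(f"`define {name}" if value is None else f"`define {name} {value}")
    return "\n".join(lines) + "\n"


def global_include_tcl(header: str) -> str:
    """Tcl that adds the header and marks it as a global include, i.e. the
    scripted equivalent of right-click -> "Set Global Include"."""
    return "\n".join([
        f"add_files -norecurse {header}",
        f"set_property file_type {{Verilog Header}} [get_files {header}]",
        f"set_property is_global_include true [get_files {header}]",
    ]) + "\n"


def verilog_define_tcl(macros: Macros) -> str:
    """Alternative: pass the macros as project-wide defines (the Verilog
    Options field) for the synthesis and simulation filesets."""
    defs = " ".join(n if v is None else f"{n}={v}" for n, v in macros.items())
    return "\n".join([
        f"set_property verilog_define {{{defs}}} [current_fileset]",
        f"set_property verilog_define {{{defs}}} [get_filesets sim_1]",
    ]) + "\n"


if __name__ == "__main__":
    # MY_EXAMPLE_WIDTH is a hypothetical placeholder macro.
    macros: Macros = {"FPGA": None, "MY_EXAMPLE_WIDTH": "32"}
    print(render_defines(macros))               # save as fpga_defines.vh
    print(global_include_tcl("fpga_defines.vh"))  # paste into the Tcl console
```

Pick one approach per macro: defining the same macro both in a global header and through verilog_define will typically trigger macro-redefinition warnings.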

shgangchen commented 5 years ago

So, up to now, there are no specific directions on which macros should be defined when implementing the master-branch nv_small on an FPGA. Those who have done the simulation and synthesis can only check whether their settings are right by running tests. Am I right? So sharing the settings and configurations of a successful project would be very valuable to others.

ghost commented 5 years ago

@peterzh2018888 You need to make a custom IP core wrapper around the NVDLA. Only then you can use it as a component in your block design. The signals quite easily map to AXI4 and APB. Have you seen this thread? https://github.com/nvdla/hw/issues/110.
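To make the APB part of such a wrapper more concrete, here is a behavioral Python sketch of the address translation and handshake between an APB slave port and NVDLA's CSB register interface. The csb2nvdla_*/nvdla2csb_* names follow NVDLA's top-level CSB ports; the 16-bit word address, the byte-to-word shift, posted writes and the PREADY rule are assumptions to verify against the hw documentation and RTL, and the helper names are hypothetical. It models behavior only and is not synthesizable RTL.

```python
"""Behavioral sketch of the APB side of a custom NVDLA wrapper.

Assumptions (verify against the RTL in your checkout): CSB is 32-bit and
word-addressed, csb2nvdla_addr is 16 bits wide, the bridge drops the two
byte-offset bits of PADDR, and writes are posted (csb2nvdla_nposted = 0).
"""
from dataclasses import dataclass

CSB_ADDR_BITS = 16  # assumed width of csb2nvdla_addr


@dataclass(frozen=True)
class CsbRequest:
    addr: int      # value driven on csb2nvdla_addr (word address)
    wdat: int      # value driven on csb2nvdla_wdat
    write: bool    # csb2nvdla_write
    nposted: bool  # csb2nvdla_nposted


def apb_to_csb(paddr: int, pwrite: bool, pwdata: int = 0) -> CsbRequest:
    """Translate one APB transfer into a CSB request.

    paddr is the byte offset inside the NVDLA window, i.e. after the
    interconnect has removed the base address assigned in the block design.
    """
    if paddr & 0x3:
        raise ValueError(f"APB address 0x{paddr:x} is not 32-bit aligned")
    word_addr = paddr >> 2
    if word_addr >= 1 << CSB_ADDR_BITS:
        raise ValueError(f"APB address 0x{paddr:x} is outside the CSB window")
    return CsbRequest(
        addr=word_addr,
        wdat=(pwdata & 0xFFFF_FFFF) if pwrite else 0,
        write=pwrite,
        nposted=False,
    )


def apb_pready(pwrite: bool, csb2nvdla_ready: bool, nvdla2csb_valid: bool) -> bool:
    """Assumed PREADY rule: a posted write finishes when NVDLA accepts the
    request; a read finishes only when read data returns."""
    return csb2nvdla_ready if pwrite else nvdla2csb_valid


if __name__ == "__main__":
    req = apb_to_csb(0x5000, pwrite=True, pwdata=0x1)
    print(req)  # CsbRequest(addr=5120, wdat=1, write=True, nposted=False)
```

The AXI4 side is usually simpler, since NVDLA's memory (DBB) ports are already AXI-style and can mostly be wired straight through to the PS slave ports.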

honorpeter commented 5 years ago

#110 is not specific enough to follow and implement. Can you be more practical? It's not simple at all.

embedeepLHY commented 5 years ago

If you are just looking for a TPU design for FPGA, maybe "FREE-TPU" (https://github.com/embedeep/Free-TPU) suits you.