nvdla / hw

RTL, Cmodel, and testbench for NVDLA

Do we need to load all feature data to CBUF before starting convolution? #249

Closed zhouweiscut closed 5 years ago

zhouweiscut commented 5 years ago

Hi, I am trying to run the YOLOv3 conv5 layer with INT16 data type on the NVDLA hw RTL simulation. The sizes of the input and output are shown below:

weight: 3 x 3 x 128 / stride 2, input feature data: 208 x 208 x 64 -> output feature data: 104 x 104 x 128

Obviously the feature data is too large to be loaded into CBUF (512 KB). If we configure the register CDMA.D_BANK = 0x0000e, CDMA will transfer part of the feature data from SRAM to CBUF until CBUF is full; however, CSC never starts to work, because `slice_avl` never gets bigger than the feature data height (208), so `dat_cbuf_ready` stays 0, as shown in the code below.

So it seems we need to split the feature data into smaller parts (such as 208 x 12 x 64) that can be loaded into CBUF, and then run the calculation many times (roughly 208/12 passes) to work out the whole feature map. Am I right?

[screenshot: cbuf]
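For reference, a rough sizing sketch in Python. The 15-banks-for-data assumption and the neglect of CBUF entry packing and inter-tile overlap rows are simplifications, so the exact numbers may differ, but it shows why the full 208-row feature map cannot fit and roughly how many H-splits are needed:

```python
import math

# Back-of-the-envelope sizing (not NVDLA-exact): ignores CBUF entry/atom
# packing and the halo rows neighbouring tiles must re-fetch, and simply
# assumes 15 of the 16 x 32 KB banks are reserved for feature data.
BYTES_PER_ELEM = 2            # INT16
BANK_BYTES     = 32 * 1024    # one CBUF bank
DATA_BANKS     = 15           # assumed banks reserved for feature data

W, H, C = 208, 208, 64        # conv5 input feature map

total_bytes   = W * H * C * BYTES_PER_ELEM     # whole feature map
data_budget   = DATA_BANKS * BANK_BYTES        # CBUF space for feature data
bytes_per_row = W * C * BYTES_PER_ELEM         # one H-slice (full W, full C)
rows_per_tile = data_budget // bytes_per_row   # H rows that fit at once
num_passes    = math.ceil(H / rows_per_tile)   # split-h passes needed

print(f"feature map: {total_bytes / 1024:.0f} KB vs CBUF 512 KB")
print(f"rows per tile: {rows_per_tile}, passes: {num_passes}")
```

Under these assumptions the whole map is about 5.4 MB, only around 18 rows fit at a time, and roughly a dozen split-h passes are needed.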
SCLUO commented 5 years ago

Yes, you are right. The DLA needs to split a large input. In contrast, the DLA can load large weights part by part automatically.

zhouweiscut commented 5 years ago

@SCLUO Thank you very much for your confirmation. I have another question: do we need to split the large input manually? It seems that the RUBIK block has Split and Merge functions; should we use RUBIK to split the feature data?

anakin1028 commented 5 years ago

@zhenpengzuo have you solved this problem? http://nvdla.org/hw/v1/ias/programming_guide.html#convolution-pipeline-programming The link above shows that S/W has to split the input along the H dimension for the H/W. From the hardware unit description, it seems that RUBIK splits along the channel dimension instead of the H dimension? http://nvdla.org/hw/v1/ias/unit_description.html#split-and-merge
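As a rough illustration of the split-h flow the programming guide describes, a driver-side loop would look something like the sketch below. The `program_conv_hw` helper, its parameter names, the fixed tile size, and the zero-padding simplification are all illustrative assumptions, not the actual KMD code:

```python
def program_conv_hw(datain_top, datain_height, dataout_row):
    # Hypothetical stand-in for programming and launching one hardware pass
    # (the real flow writes the CDMA/CSC/CMAC/CACC registers and sets op_enable).
    print(f"pass: input rows [{datain_top}, {datain_top + datain_height}), "
          f"output rows start at {dataout_row}")

def split_h_convolution(H=208, kernel_h=3, stride_h=2, rows_per_tile=18):
    """Walk the input height in tiles small enough to fit CBUF (padding ignored)."""
    top, out_row = 0, 0
    while top + kernel_h <= H:
        height   = min(rows_per_tile, H - top)
        out_rows = (height - kernel_h) // stride_h + 1   # output rows this pass
        program_conv_hw(top, height, out_row)
        out_row += out_rows
        # Start the next tile where the sliding window left off, so the
        # (kernel_h - stride_h) overlap rows are re-fetched automatically.
        top += out_rows * stride_h
    return out_row   # 103 output rows for 208 / kernel 3 / stride 2, no padding

split_h_convolution()
```

With the padding of 1 that the 104 x 104 output implies, the bookkeeping is the same idea; the first and last tiles just also account for the padded rows.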

zhouweiscut commented 5 years ago

ok, thanks.