icubecorp / nvdla_compiler

CSC assertion when executing flatbuffer generated by caffe2fb #1

Open ghost opened 5 years ago

ghost commented 5 years ago

OK... so I've just hit my first issue :)

I'd like to start with a few questions:

  1. Is caffe2fb output compatible with the current UMD nvdla_runtime from the nvdla/sw repository?
  2. Is it compatible with the current nvdla/hw master, with the default nv_large specification file?
  3. What command line is expected to run the flatbuffer file with nvdla_runtime?
  4. Is lenet.param your custom format? Is lenet.bin in the format produced by official Caffe (binaryproto), or is it also a custom format?

I ran the virtual platform with the CMOD compiled as nv_large, but unfortunately it asserted on a CSC operation.

# ./nvdla_runtime --loadable ../compiler/umd/out/runtime/caffe2fb/flatbuffer 
creating new runtime context...
Emulator starting
Unknown image type: submitting tasks...
[ 1781.801010] Enter:dla_read_network_config
[ 1781.801618] Exit:dla_read_network_config status=0
[ 1781.801854] Enter: dla_initiate_processors
[ 1781.803280] Enter: dla_submit_operation
[ 1781.803462] Prepare Convolution operation index 0 ROI 0 dep_count 1
[ 1781.803664] Enter: dla_prepare_operation
[ 1781.804154] processor:Convolution group:0, rdma_group:0 available
[ 1781.804379] Enter: dla_read_config
[ 1781.804644] Exit: dla_read_config
[ 1781.804862] Exit: dla_prepare_operation status=0
[ 1781.805046] Enter: dla_program_operation
[ 1781.805189] Program Convolution operation index 0 ROI 0 Group[0]
[ 1781.816574] no desc get due to index==-1
[ 1781.816817] no desc get due to index==-1
[ 1781.816939] no desc get due to index==-1
[ 1781.817054] no desc get due to index==-1
[ 1781.817180] no desc get due to index==-1
[ 1781.817309] Enter: dla_op_programmed
[ 1781.817571] Update dependency operation index 3 ROI 0 DEP_COUNT=3
[ 1781.817747] Update dependency operation index 1 ROI 0 DEP_COUNT=1
[ 1781.817922] enable SDP in dla_update_dependency as depdency are resolved
[ 1781.819494] Enter: dla_enable_operation
[ 1781.819665] exit dla_enable_operation without actual enable due to processor hasn't been programmed
[ 1781.819890] Exit: dla_enable_operation status=0
[ 1781.820061] Exit: dla_op_programmed
[ 1781.820182] Exit: dla_program_operation status=0
[ 1781.820437] Exit: dla_submit_operation
[ 1781.820636] Enter: dla_dequeue_operation
[ 1781.820791] Dequeue op from Convolution processor, index=3 ROI=0
[ 1781.820994] Enter: dla_submit_operation
[ 1781.821115] Prepare Convolution operation index 3 ROI 0 dep_count 2
[ 1781.821328] Enter: dla_prepare_operation
[ 1781.821567] processor:Convolution group:1, rdma_group:0 available
[ 1781.821740] Enter: dla_read_config
[ 1781.823867] Exit: dla_read_config
[ 1781.824080] Exit: dla_prepare_operation status=0
[ 1781.824255] Enter: dla_program_operation
[ 1781.824381] Program Convolution operation index 3 ROI 0 Group[1]
[ 1781.833034] no desc get due to index==-1
[ 1781.834579] no desc get due to index==-1
[ 1781.834733] no desc get due to index==-1
[ 1781.834851] no desc get due to index==-1
[ 1781.835010] no desc get due to index==-1
[ 1781.835125] Enter: dla_op_programmed
[ 1781.835238] Update dependency operation index 6 ROI 0 DEP_COUNT=3
[ 1781.835395] Update dependency operation index 4 ROI 0 DEP_COUNT=2
[ 1781.835547] Exit: dla_op_programmed
[ 1781.835653] Exit: dla_program_operation status=0
[ 1781.835776] Exit: dla_submit_operation
[ 1781.836003] Exit: dla_dequeue_operation
[ 1781.836242] Enter: dla_submit_operation
[ 1781.836375] Prepare SDP operation index 1 ROI 0 dep_count 0
[ 1781.836522] Enter: dla_prepare_operation
[ 1781.836921] processor:SDP group:0, rdma_group:0 available
[ 1781.837074] Enter: dla_read_config
[ 1781.837298] Exit: dla_read_config
[ 1781.837458] Exit: dla_prepare_operation status=0
[ 1781.837612] Enter: dla_program_operation
[ 1781.837735] Program SDP operation index 1 ROI 0 Group[0]
[ 1781.845567] no desc get due to index==-1
[ 1781.845817] no desc get due to index==-1
[ 1781.846950] no desc get due to index==-1
[ 1781.847109] no desc get due to index==-1
[ 1781.847270] Enter: dla_op_programmed
[ 1781.847402] Update dependency operation index 4 ROI 0 DEP_COUNT=1
[ 1781.847573] enable SDP in dla_update_dependency as depdency are resolved
[ 1781.847812] Enter: dla_enable_operation
[ 1781.848010] exit dla_enable_operation without actual enable due to processor hasn't been programmed
[ 1781.848305] Exit: dla_enable_operation status=0
[ 1781.848452] Exit: dla_op_programmed
[ 1781.848570] Exit: dla_program_operation status=0
[ 1781.848773] Enter: dla_enable_operation
[ 1781.848942] Enable SDP operation index 1 ROI 0
[ 1781.849546] Enter: dla_op_enabled
[ 1781.849760] Update dependency operation index 0 ROI 0 DEP_COUNT=1
[ 1781.849918] enable Convolution in dla_update_dependency as depdency are resolved
[ 1781.852002] Enter: dla_enable_operation
[ 1781.852245] Enable Convolution operation index 0 ROI 0
[ 1781.854281] Enter: dla_op_enabled
[ 1781.855335] Exit: dla_op_enabled
[ 1781.855567] Exit: dla_enable_operation status=0
[ 1781.855724] Exit: dla_op_enabled
[ 1781.855861] Exit: dla_enable_operation status=0
[ 1781.856087] Exit: dla_submit_operation
[ 1781.856236] Enter: dla_dequeue_operation
[ 1781.856440] Dequeue op from SDP processor, index=4 ROI=0
[ 1781.856622] Enter: dla_submit_operation
[ 1781.856748] Prepare SDP operation index 4 ROI 0 dep_count 0
[ 1781.856935] Enter: dla_prepare_operation

Fatal: NV_NVDLA_csc.cpp: 659:NV_NVDLA_csc::SendDataToMacSequencerDirectConvCommon, invalid configuration csc_entries_, actual value is 0x6, it shall be 0x1B.
In file: ../cmod/csc/NV_NVDLA_csc.cpp:659
In process: nvdla.nvdla_core.nvdla_core.csc.DataLoadSequenceThread @ 3411100 ms
Aborted
user@virtmach:~/project/nvdla/vp$ 
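
To compare the programmed and expected values in the assertion above, the message can be pulled apart with a small script. This is only a sketch: the regular expression is inferred from the single Fatal line quoted in this issue, and `parse_cmod_mismatch` is a hypothetical helper name, not part of CMOD or the NVDLA tools.

```python
import re

# Pattern inferred from the one Fatal line in this issue (an assumption,
# other CMOD messages may be laid out differently).
FATAL_RE = re.compile(
    r"invalid configuration (?P<field>\w+), "
    r"actual value is (?P<actual>0x[0-9A-Fa-f]+), "
    r"it shall be (?P<expected>0x[0-9A-Fa-f]+)"
)

def parse_cmod_mismatch(line):
    """Return (field, actual, expected) as (str, int, int), or None if no match."""
    m = FATAL_RE.search(line)
    if m is None:
        return None
    return (m.group("field"),
            int(m.group("actual"), 16),
            int(m.group("expected"), 16))

line = ("Fatal: NV_NVDLA_csc.cpp: 659:NV_NVDLA_csc::SendDataToMacSequencerDirectConvCommon, "
        "invalid configuration csc_entries_, actual value is 0x6, it shall be 0x1B.")
print(parse_cmod_mismatch(line))
```

For the line above this yields the field name `csc_entries_` with 6 programmed and 27 expected.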

I think I have already seen this csc_entries_ assertion in a completely different place: when I was running trace_tests. In some of them I had modified the CBUF bank allocation. Here is example of file so you can see what I am talking about. I changed the D_BANK_0 registers, and this triggered the CMOD assertion too. Then I realized that the weight/feature data could not fit in the banks selected by 'my' allocation.
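
The "does it fit in the selected banks" reasoning can be sketched as a coarse capacity check. Everything numeric here is an assumption: the defaults of 16 banks of 32 KiB each are what I believe the full configuration uses, so check them against your own spec file. `banks_needed` and `fits_cbuf` are hypothetical helpers, and the real CSC/CDMA logic works on entries and slices rather than whole tensors, so this only illustrates the idea.

```python
import math

def banks_needed(num_bytes, bank_bytes):
    """Number of whole banks required to hold num_bytes."""
    return math.ceil(num_bytes / bank_bytes)

def fits_cbuf(data_bytes, weight_bytes, data_banks, weight_banks,
              bank_bytes=32 * 1024, total_banks=16):
    """Coarse check that feature data and weights fit their assigned banks.

    bank_bytes and total_banks are assumed defaults, not values taken
    from this issue.
    """
    if data_banks + weight_banks > total_banks:
        return False
    return (banks_needed(data_bytes, bank_bytes) <= data_banks
            and banks_needed(weight_bytes, bank_bytes) <= weight_banks)

# Made-up sizes, purely for illustration.
print(fits_cbuf(100 * 1024, 40 * 1024, data_banks=2, weight_banks=2))
print(fits_cbuf(100 * 1024, 40 * 1024, data_banks=4, weight_banks=2))
```

With these made-up sizes the first allocation is too small for the feature data (4 banks needed, 2 assigned), while the second one fits.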

wyxsky commented 5 years ago

@mmaciag As far as I can see, lenet.param and lenet.bin are in NCNN format. FYI: https://github.com/Tencent/ncnn

shgoupf commented 5 years ago

I have the same CSC error when running on VP.

Previously I thought the problem might be the KMD atom_size = 32, but then I realized the CSC assertion is still there with the KMD fix applied (so I deleted my previous comment).

The generated flatbuffer hangs on my FPGA because the hardware does not respond.

So it seems this compiler still needs some work before it runs cleanly.

icubecorp commented 5 years ago

@mmaciag Sorry for the delayed reply. See my comments inline:

  1. Is caffe2fb output compatible with the current UMD nvdla_runtime from the nvdla/sw repository? Yes. We only use some header files from that repo.
  2. Is it compatible with the current nvdla/hw master, with the default nv_large specification file? We have only run on the VP, which is configured for full, not small, using the default kmd.ko, drm.ko and umd.ko. You can download those default modules from https://github.com/nvdla/sw/tree/master/prebuilt/linux. We haven't run on our FPGA yet, because it currently supports only small mode.
  3. What command line is expected to run the flatbuffer file with nvdla_runtime? Use: ./nvdla_runtime --loadable flatbuffer --image your_image_file --rawdump. You will then find the result in the output file.
  4. Is lenet.param your custom format? Is lenet.bin as it is created by official Caffe (binaryproto) or is it also some custom format? Those two files are generated by Tencent ncnn (https://github.com/Tencent/ncnn) from the official Caffe model. We just use the ncnn code to parse the official Caffe model.
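
For scripted runs on the VP, the command line from answer 3 can be assembled like this. A minimal sketch: it uses only the three flags quoted in this thread (`--loadable`, `--image`, `--rawdump`), and `runtime_command` is a hypothetical helper, not part of the NVDLA software.

```python
import shlex

def runtime_command(loadable, image=None, rawdump=True, binary="./nvdla_runtime"):
    """Build the argv list for nvdla_runtime from the flags quoted in this thread."""
    argv = [binary, "--loadable", loadable]
    if image is not None:
        argv += ["--image", image]
    if rawdump:
        argv.append("--rawdump")
    return argv

# Print the command as a single shell-quoted string.
print(shlex.join(runtime_command("flatbuffer", image="your_image_file")))
```

The returned list can be passed directly to `subprocess.run` inside the VP guest. Omitting `image` gives the form of the command from the original report, which passed only `--loadable`.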

ghost commented 5 years ago

@icubecorp Thank you for the response.