nvdla / sw


Compiling ResNet-50 for nv_small #177

Open gitosu67 opened 4 years ago

gitosu67 commented 4 years ago

Hi,

Has anyone been able to compile Resnet-50 for the nv_small architecture? I am getting the following errors while trying to do so:

$ ./nvdla_compiler -o resnet50 --cprecision int8 --configtarget nv_small --calibtable calibdata/resnet50.json --prototxt ResNet-50-deploy.prototxt --caffemodel ResNet-50-model.caffemodel
creating new wisdom context...
opening wisdom context...
parsing caffe network...
libnvdla<3> mark prob
Marking total 1 outputs
parsing calibration table...
attaching parsed network to the wisdom...
compiling profile "fast-math"... config "nv_small"...
libnvdla<2> Prototxt #chnls (C = 3) != Profile #chnls for input (NVDLA_IMG_A8B8G8R8: C = 4). Preferring #chnls from Profile for compiling.
(DLA) Error 0x00000002: Don't support 0 (in engine-ast/ConvCoreNode.cpp, function tryToMergeWithBatchNormOp(), line 1240)
(DLA) Error 0x00000002: Don't support 0 (in engine-ast/ConvCoreNode.cpp, function tryToMergeWithBatchNormOp(), line 1240)
(DLA) Error 0x00000002: Don't support 0 (in engine-ast/ConvCoreNode.cpp, function tryToMergeWithBatchNormOp(), line 1240)
(DLA) Error 0x00000002: Don't support 0 (in engine-ast/ConvCoreNode.cpp, function tryToMergeWithBatchNormOp(), line 1240)
closing wisdom context...

Any suggestions on how to compile correctly will be of great help. Thanks.

akioolin commented 4 years ago

@Okaymaddy First of all, you should check the list of operators that nvdla_compiler currently supports. The output message suggests that BatchNormOp is not supported by nvdla_compiler.

gitosu67 commented 4 years ago

@akioolin Is there any workaround for that? Here: https://github.com/nvdla/sw/blob/v1.2.0-OC/CompilerFeatures.md, they say that Resnet-50 has been verified for nv_small. And as per the documentation, we have to supply a calibration table for int8 precision. I am using the resnet50.json file that they already provide. Is there anything else I need to do to solve this problem?

akioolin commented 4 years ago

@Okaymaddy If the document is right, the command options are right, and the caffemodel and prototxt are all right, the last resort is to dive into the compiler's source code XD

The line that produces the complaint is:

PRECISION_SWITCH(modelPrec.v(), computePrec.v(), combinedWtsAndVarianceData, wtTrns.combineKernelWeightsAndScaleData, bnMode, krnlWtDims, bnDims, rawKrnlWts, rawVarData);

the macro is in this file https://github.com/nvdla/sw/blob/1ae4738f4558cd79bd872deddce5f4e618f38217/umd/core/src/compiler/include/priv/WeightTranslationUnit.h

The function invoked by the macro is:

template <typename IT, typename RT>
Weights combineKernelWeightsAndScaleData
(
    engine_ast::SDPMode sclMode,  // per-channel/layer
    nvdla::Dims4 krnlWtDims,
    nvdla::Dims4 sclDims,
    Weights& krnlWts,
    Weights& sclData
)

It is defined in the same file, lines 649 ~ 717. First of all, please check the modelPrec and computePrec settings. You may have to add some tracing to see the actual values passed into the macro, and then go back to the model prototxt to check the corresponding fields.

Sorry, my time is limited; my project's demo is in its final round. After the demo, maybe I can spend some time on this. Could you share the following files with me: caffemodel, prototxt, and the calibration json file? Thank you.

BR, Akio

gitosu67 commented 4 years ago

Hi @akioolin , sorry for the delay in response. Thank you for the suggestion, I guess I will have to look more into the code. Meanwhile, I am sharing the files you asked for: https://drive.google.com/drive/folders/139rJCj9eDEwYzbaak2d8ZU_dD5P7Gk5O?usp=sharing Also, is it possible to share your email, so maybe we can talk more about this? Thanks again for looking into it.

akioolin commented 4 years ago

@Okaymaddy Thank you very much. I'll do some study after my project's final demo is done. Thanks a lot.

davide-giri commented 4 years ago

I have the same exact issue that @Okaymaddy has.

Is there any available set of Caffe model, prototxt and calib file for nv_small that are known to work? Right now I just want to see something other than the prebuilt flatbuffers running on nv_small.

Thanks

akioolin commented 4 years ago

@davide-giri I think tracing the code of the NVDLA open source compiler will be useful for solving this problem.

BR, Akio

davide-giri commented 4 years ago

The errors go away by using the proper --informat, which in this case should be nchw, right? However, the app that invokes NVDLA still gives errors.

First it says: Unknown image type: (DLA_TEST) Error 0x00000002: Unexpected surface format 37, defaulting to D_F16_CxHWx_x16_F. Then a bit later the kernel prints Unknown image type: (DLA_TEST) Error 0x00000002: Unexpected surface format 37, defaulting to D_F16_CxHWx_x16_F.

With the prebuilt flatbuffers there is no need to provide an input image. Is this different for the Resnet50 example?

Here is the full log of the NVDLA app:

creating new runtime context...                                                                                                                                                                                    
Emulator starting                                                                                                                                                                                                  
Unknown image type: (DLA_TEST) Error 0x00000002: Unexpected surface format 37, defaulting to D_F[  278.256800] Enter:dla_read_network_config                                                                       
[  278.272212] Exit:dla_read_network_config status=0                                                                                                                                                               
[  278.298872] Enter: dla_initiate_processors                                                                                                                                                                      
[  278.320273] Enter: dla_submit_operation                                                                                                                                                                         
[  278.342909] Prepare Convolution operation index 0 ROI 0 dep_count 1                                                                                                                                             
[  278.368672] Enter: dla_prepare_operation                                                                                                                                                                        
[  278.389215] processor:Convolution group:0, rdma_group:0 available                                                                                                                                               
[  278.415663] Enter: dla_read_config                                                                                                                                                                              
[  278.434072] Exit: dla_read_config                                                                                                                                                                               
[  278.457295] Exit: dla_prepare_operation status=0                                                                                                                                                                
[  278.479662] Enter: dla_program_operation                                                                                                                                                                        
[  278.499867] Program Convolution operation index 0 ROI 0 Group[0]                                                                                                                                                
[  278.533858] no desc get due to index==-1                                                                                                                                                                        
[  278.548329] no desc get due to index==-1                                                                                                                                                                        
[  278.572868] no desc get due to index==-1                                                                                                                                                                        
[  278.603325] no desc get due to index==-1                                                                                                                                                                        
[  278.633408] no desc get due to index==-1                                                                                                                                                                        
[  278.649256] Enter: dla_op_programmed                                                                                                                                                                            
[  278.676397] Update dependency operation index 1 ROI 0 DEP_COUNT=2                                                                                                                                               
[  278.708757] Update dependency operation index 64 ROI 0 DEP_COUNT=1                                                                                                                                              
[  278.742549] enable SDP in dla_update_dependency as depdency are resolved                                                                                                                                        
[  278.774267] Enter: dla_enable_operation                                                                                                                                                                         
[  278.795025] exit dla_enable_operation without actual enable due to processor hasn't been programmed                                                                                                             
[  278.836889] Exit: dla_enable_operation status=0                                                                                                                                                                 
[  278.858866] Exit: dla_op_programmed                                                                                                                                                                             
[  278.880616] Exit: dla_program_operation status=0                                                                                                                                                                
[  278.904885] Exit: dla_submit_operation                                                                                                                                                                          
[  278.923903] Enter: dla_dequeue_operation                                                                                                                                                                        
[  278.943670] Dequeue op from Convolution processor, index=1 ROI=0                                                                                                                                                
[  278.973000] Enter: dla_submit_operation                                                                                                                                                                         
[  278.996001] Prepare Convolution operation index 1 ROI 0 dep_count 1                                                                                                                                             
[  279.024366] Enter: dla_prepare_operation                                                                                                                                                                        
[  279.046180] processor:Convolution group:1, rdma_group:0 available                                                                                                                                               
[  279.072740] Enter: dla_read_config                                                                                                                                                                              
[  279.090230] Exit: dla_read_config
[  279.107590] Exit: dla_prepare_operation status=0                                                                                                                                                                
[  279.128709] Enter: dla_program_operation
[  279.147808] Program Convolution operation index 1 ROI 0 Group[1]
[  279.175868] no desc get due to index==-1
[  279.199091] no desc get due to index==-1
[  279.222975] no desc get due to index==-1
[  279.242816] no desc get due to index==-1
[  279.262356] no desc get due to index==-1
[  279.284996] Enter: dla_op_programmed
[  279.304876] Update dependency operation index 2 ROI 0 DEP_COUNT=2
[  279.332356] Update dependency operation index 65 ROI 0 DEP_COUNT=2
[  279.357933] Exit: dla_op_programmed
[  279.376270] Exit: dla_program_operation status=0
[  279.397339] Exit: dla_submit_operation
[  279.419507] Exit: dla_dequeue_operation
[  279.439276] Enter: dla_submit_operation
[  279.465997] Prepare SDP operation index 64 ROI 0 dep_count 0
[  279.493687] Enter: dla_prepare_operation
[  279.514690] processor:SDP group:0, rdma_group:0 available
[  279.538205] Enter: dla_read_config
[  279.557360] Exit: dla_read_config
[  279.574749] Exit: dla_prepare_operation status=0
[  279.596216] Enter: dla_program_operation
[  279.618722] Program SDP operation index 64 ROI 0 Group[0]
[  279.645469] no desc get due to index==-1
[  279.665944] no desc get due to index==-1
[  279.685662] no desc get due to index==-1
[  279.704771] no desc get due to index==-1
[  279.724304] Enter: dla_op_programmed
[  279.744427] Update dependency operation index 65 ROI 0 DEP_COUNT=1
[  279.773992] enable SDP in dla_update_dependency as depdency are resolved
[  279.807551] Enter: dla_enable_operation
[  279.833360] exit dla_enable_operation without actual enable due to processor hasn't been programmed
[  279.869287] Exit: dla_enable_operation status=0
[  279.894692] Exit: dla_op_programmed
[  279.914479] Exit: dla_program_operation status=0
[  279.937099] Enter: dla_enable_operation
[  279.955859] Enable SDP operation index 64 ROI 0
[  279.976897] Enter: dla_op_enabled
[  279.997521] Update dependency operation index 0 ROI 0 DEP_COUNT=1
[  280.025075] enable Convolution in dla_update_dependency as depdency are resolved
[  280.057539] Enter: dla_enable_operation
[  280.076230] Enable Convolution operation index 0 ROI 0
[  280.098799] Enter: dla_op_enabled
[  280.116247] Exit: dla_op_enabled
[  280.134044] Exit: dla_enable_operation status=0
[  280.155157] Exit: dla_op_enabled
[  280.173150] Exit: dla_enable_operation status=0
[  280.194170] Exit: dla_submit_operation
[  280.213405] Enter: dla_dequeue_operation
[  280.233333] Dequeue op from SDP processor, index=65 ROI=0
[  280.256847] Enter: dla_submit_operation
[  280.275575] Prepare SDP operation index 65 ROI 0 dep_count 0
[  280.299587] Enter: dla_prepare_operation                                                                                                                                                                        
[  280.318640] processor:SDP group:1, rdma_group:1 available                                                                                                                                                       
[  280.343891] Enter: dla_read_config                                                                                                                                                                              
[  280.362400] Exit: dla_read_config                                                                                                                                                                               
[  280.379313] Exit: dla_prepare_operation status=0                                                                                                                                                                
[  280.400398] Enter: dla_program_operation                                                                                                                                                                        
[  280.423786] Program SDP operation index 65 ROI 0 Group[1]                                                                                                                                                       
[  280.455804] no desc get due to index==-1                                                                                                                                                                        
[  280.476771] no desc get due to index==-1                                                                                                                                                                        
[  280.503344] no desc get due to index==-1                                                                                                                                                                        
[  280.517328] no desc get due to index==-1                                                                                                                                                                        
[  280.536304] Enter: dla_op_programmed
[  280.566218] Update dependency operation index 66 ROI 0 DEP_COUNT=2
[  280.599320] Exit: dla_op_programmed
[  280.627877] Exit: dla_program_operation status=0
[  280.654220] Enter: dla_enable_operation
[  280.675160] Enable SDP operation index 65 ROI 0
[  280.698571] Enter: dla_op_enabled
[  280.718229] Update dependency operation index 1 ROI 0 DEP_COUNT=1
[  280.747371] enable Convolution in dla_update_dependency as depdency are resolved
[  280.782901] Enter: dla_enable_operation
[  280.803441] Enable Convolution operation index 1 ROI 0
[  280.829154] Enter: dla_op_enabled
[  280.848042] Exit: dla_op_enabled
[  280.869088] Exit: dla_enable_operation status=0
[  280.896969] Exit: dla_op_enabled
[  280.915695] Exit: dla_enable_operation status=0
[  280.938595] Exit: dla_submit_operation
[  280.957181] Exit: dla_dequeue_operation
[  280.975854] Enter: dla_submit_operation
[  280.998106] Prepare PDP operation index 128 ROI 0 dep_count 1
[  281.026419] Enter: dla_prepare_operation
[  281.047491] processor:PDP group:0, rdma_group:0 available
[  281.076331] Enter: dla_read_config
[  281.095604] Exit: dla_read_config
[  281.118382] Exit: dla_prepare_operation status=0
[  281.144445] Enter: dla_program_operation
[  281.166475] Program PDP operation index 128 ROI 0 Group[0]
[  281.195903] group id 0 rdma id 0
[  281.214704] Invalid SrcInput Cude[W: 13824, H: 0, C: 24]
[  281.214941] Exit: dla_program_operation status=-3
[  281.259111] Exit: dla_submit_operation
[  281.277689] Failed to submit PDP op from index 128
[  281.302743] Exit: dla_initiate_processors status=-3
[  281.326004] Task execution failed
16_CxHWx_x16_F (in TestUtils.cpp, function Tensor2DIMG(), line 85)
submitting tasks...
NvDlaSubmit: Error IOCTL failed (No such process)
(DLA_RUNTIME) Error 0x0003000f: (propagating from Runtime.cpp, function submitInternal(), line 666)
(DLA_TEST) Error 0x00000004: runtime->submit() failed (in RuntimeTest.cpp, function runTest(), line 378)
(DLA_TEST) Error 0x00000004: (propagating from RuntimeTest.cpp, function run(), line 426)
Shutdown signal received, exiting
(DLA_TEST) Error 0x00000004: (propagating from main.cpp, function launchTest(), line 87)
akioolin commented 4 years ago

@davide-giri

According to my NVDLA runtime integration experience, the following functions in these files should be taken care of:

  1. https://github.com/nvdla/sw/blob/master/umd/apps/runtime/DlaImageUtils.cpp
  2. https://github.com/nvdla/sw/blob/master/umd/apps/runtime/DlaImage.cpp
  3. https://github.com/nvdla/sw/blob/master/umd/apps/runtime/TestUtils.cpp

For the nvdla runtime, the input tensor's format must be aligned with the nvdla configuration; that is the first step. The second step is the output tensor's format. The three files above show the details of both the input and output tensor formats, so please pay close attention to them.

The NVDLA runtime test application should be traced very carefully. It gives me a lot of information about how to talk to umd runtime core.

BR, Akio

gitosu67 commented 4 years ago

I believe I was getting the errors because the sample calibration file (resnet50.json) provided may be incorrect. I was able to successfully run MNIST on an FPGA for nv_small. Right now I am looking more into TensorRT, as it is required to generate the calibration files. I am providing the sample MNIST files I used if anyone wants to test: https://drive.google.com/drive/folders/1zXNyQunaURFRHuivW6SOdyLOYG0-8tQY?usp=sharing

davide-giri commented 4 years ago

Thanks @akioolin and @Okaymaddy, I'll look into this again very soon. Thanks for sharing the files.

@Okaymaddy To get MNIST working, did you modify any of the NVDLA SW source code? When you run the NVDLA runtime, do you pass any input image(s)? Or do you just run the model without inputs, as in the case of the prebuilt loadables?

./nvdla_runtime --loadable fast-math.nvdla

Thanks!

gitosu67 commented 4 years ago

@davide-giri I did not change any source code in NVDLA SW for MNIST, and I just ran the model without any inputs, just like for the prebuilt loadables.

davide-giri commented 4 years ago

It still doesn't work out-of-the-box for me, and if it does for you, I would like to understand why. I still have the errors I had before, but here is the status:

gitosu67 commented 4 years ago

Hi @davide-giri, I can confirm that Resnet-50 does compile if I set --informat nchw. Now, if I just run ./nvdla_runtime --loadable fast-math.nvdla on the FPGA, I get the following:

...
...
Work Found!
Work Done
execution time = 49312413.000000 s
Shutdown signal received, exiting
Test pass

I am not exactly sure what this Test pass means. But when I give it an image that nvdla provides in regression/images, I get the following error:


root@nvdla:/media# ./nvdla_runtime --loadable fast-math.nvdla --image regression/images/digits/three.pgm 
creating new runtime context...
Emulator starting
pgm2dimg 1 28 28 1 1792 401408 401408
(DLA_TEST) Error 0x00000004: Mismatched width: 28 != 224 (in TestUtils.cpp, function createImageCopy(), line 156)
(DLA_TEST) Error 0x00000004: (propagating from RuntimeTest.cpp, function copyImageToInputTensor(), line 104)
(DLA_TEST) Error 0x00000004: (propagating from RuntimeTest.cpp, function setupInputBuffer(), line 163)
(DLA_TEST) Error 0x00000004: (propagating from RuntimeTest.cpp, function runTest(), line 391)
(DLA_TEST) Error 0x00000004: (propagating from RuntimeTest.cpp, function run(), line 450)
Shutdown signal received, exiting
(DLA_TEST) Error 0x00000004: (propagating from main.cpp, function launchTest(), line 87)

It may be because it expects an image of some specific dimension, which I am not sure about. Have you gotten any leads on the issue above?

@prasshantg any suggestions on this problem will be of great help. Thanks!

davide-giri commented 4 years ago

Resnet50 expects input images of 224x224x3, anything other than that won't work. So the mismatch error that you see seems reasonable to me. By the way in your case the kernel printouts are not showing in the terminal, but you can see them by running dmesg after running the NVDLA runtime.

Resnet50 doesn't work for me, so I'm guessing that the only possible reason is that we are working with a different software version of UMD and/or KMD. Can you tell me the exact commit of the NVDLA sw repository that you are working with? Are you compiling for ARM64?

gitosu67 commented 4 years ago

I provided an input image as described in this thread (https://github.com/nvdla/sw/issues/16). I am getting the following results (there are a lot of kernel logs, but I'm not copying them here):

$./nvdla_runtime --loadable fast-math.nvdla --image boat.jpg --rawdump
.
.
[  625.423035] Enter:dla_handle_events, processor:RUBIK
[  625.427983] Exit:dla_handle_events, ret:0
[  625.431991] reset engine done
Work Found!
Work Done
execution time = 49486983.000000 s
Shutdown signal received, exiting
Test pass

Seems like it was able to read the input file, but I am not sure how to interpret the output.dimg file. I am attaching the file. output.dimg.txt

Also, I am working with the latest commit at the moment.

akioolin commented 4 years ago

@Okaymaddy @davide-giri For output.dimg.txt, please check the model's output class count. If the class count is 1000, output.dimg.txt should contain 1000 numbers; each number represents the corresponding class's score. You can apply a softmax-like calculation to extract the top-5 classes. Be careful to use the right label file so you get the right class mapping.

BR, Akio

gitosu67 commented 4 years ago

Hi @akioolin , In my comment above, the time taken is shown as execution time = 49486983.000000 s. What does this value mean? It definitely does not take 49486983.000000 seconds to run.

akioolin commented 4 years ago

@Okaymaddy Please check whether the time measurement code is in the umd or in your code. It looks like a microsecond/nanosecond conversion problem.

BR, Akio

annshen0023 commented 4 years ago

@Okaymaddy I use the cmd: ./nvdla_compiler --profile fast-math --configtarget nv_small --cprecision int8 --prototxt hre/ResNet_50_deploy.prototxt --caffemodel hre/ResNet-50-model.caffemodel --informat nchw --calibtable hre/resnet50.json

the test result is:

[  419.460248] 246 HWLs done, totally 246 layers
[  419.464597] Enter: dla_free_op_desc op desc index 244 ROI 0
[  419.470162] Exit: dla_free_op_desc
[  419.473555] Enter: dla_free_op_desc op desc index 245 ROI 0
[  419.479120] Exit: dla_free_op_desc
[  419.482514] Exit:dla_op_completion processor SDP group1 status=0
[  419.488513] Exit:dla_handle_events, ret:0
[  419.492514] Enter:dla_handle_events, processor:PDP
[  419.497298] Exit:dla_handle_events, ret:0
[  419.501299] Enter:dla_handle_events, processor:CDP
[  419.506074] Exit:dla_handle_events, ret:0
[  419.510075] Enter:dla_handle_events, processor:RUBIK
[  419.515024] Exit:dla_handle_events, ret:0
[  419.519037] reset engine done
Work Found!
Work Done
execution time = 50257347.000000 s
Shutdown signal received, exiting
Test pass

but the out.dimg is one thousand " 1"s. And I am using the latest commit of umd and kmd. Could you do me a favor and tell me your compiler command?

gitosu67 commented 4 years ago

@annshen0023 The output might depend on the image used. I am using the same compiler cmd as I posted in my initial issue.

@akioolin @davide-giri I understand that the execution time is on the order of microseconds. Specifically, I believe that the actual work happens between the logs Work Found! and Work Done. If this is the case, then what is happening in the initial logs after running the command ./nvdla_runtime --loadable fast-math.nvdla, i.e.:

creating new runtime context...
Emulator starting
submitting tasks...
[   21.305748] Enter:dla_read_network_config
[   21.313765] Exit:dla_read_network_config status=0
[   21.320735] Enter: dla_initiate_processors
.
.
[  625.423035] Enter:dla_handle_events, processor:RUBIK
[  625.427983] Exit:dla_handle_events, ret:0
[  625.431991] reset engine done

I am not sure what is happening in the logs above, before it runs the actual inference.

gitosu67 commented 4 years ago

@akioolin @davide-giri @prasshantg It turns out that the predictions given by running Resnet-50 when supplying an image are wrong. I tested this for multiple images and none of them give correct labels. I don't know if it has anything to do with the caffemodel, prototxt and the calib file used for compiling. Has anyone been able to properly compile and run image detection using any kind of model in nv_small? If so, it will be great if you can share the required files.

ite-ch commented 4 years ago

Hi, I got the same issue when compiling resnet50 with nv_small config.

Basically, the error message comes from the tryToMergeWithBatchNormOp() function. There, they try to switch the precision of the combinedWtsAndVarianceData parameter, but the precision of one of the function's inputs is set to UNKNOWN, which the precision-switch function does not accept, so the message pops up. By the way, UNKNOWN is set due to a mismatch between the KernelWeight precision (int8) and the BatchNorm variance precision (float).

Even with this error message, a resnet50 nvdla loadable can still be compiled, but it eventually gets stuck in the nv_small vp. Has anyone else gotten stuck in the vp during resnet50 inference?

After tracing the code, I found some issues, listed below:

  1. The nv_small and opendla-small configs are used ambiguously. In the beginning, I used opendla-small as the nvsmall config, per the cmd help's suggestion, but I hit a hw cmod error when running resnet50 inference on the vp. After I changed all opendla- prefixes to nv in the source code, the cmod error was fixed.

  2. Precision switch issues. I noticed that even if you set --cprecision int8, some ops, such as BiasOp, BatchNormOp, and ScaleOp, switch precision to INT16 instead of INT8. Why? If a loadable is scaled to int16, can it still be used on nv_small hw?

  3. How can we solve the precision mismatch between the convolution weights and BatchNormOp in ResNet-50?

Scriabing commented 4 years ago

@akioolin @Okaymaddy > I believe I was getting the errors because the sample calibrator file(resnet.json) provided may be incorrect. I was able to successfully run MNIST in an FPGA for nv_small. Right now I am looking into TensoRT more as that is required to generate the calibrator files. I am providing the sample MNIST files I used if an

Did you modify the nv_small RTL source code from nvdla/hw/nv_small, or use the RTL of ITRI OpenDLA?

davide-giri commented 4 years ago

(Quoting @Okaymaddy's question above about what happens in the initial logs after running ./nvdla_runtime --loadable fast-math.nvdla, before the actual inference.)

It may be that the "Work Found" print gets executed by the UMD before invoking the KMD with an ioctl(). So this is just a matter of print ordering between the UMD and KMD; it's not a real problem. Basically, "Work Found" should appear before all the prints from the KMD (i.e. the ones with the time between brackets).

The execution time measured is the execution time of runtime->submit() called in RuntimeTest.cpp; you can see the clock_gettime() calls before and after it. Clearly, in the example you are showing, the execution time includes the time to do all those prints, so what I've been doing to record execution time on the FPGA is to remove all the prints other than the one reporting the execution time.

I have also modified the get_elapsed_time() function to report the time in seconds:

double get_elapsed_time(struct timespec *before, struct timespec *after)
{
  double deltat_s  = (after->tv_sec - before->tv_sec);
  double deltat_ns = (after->tv_nsec - before->tv_nsec) / (double) 1000000000;
  return deltat_s + deltat_ns;
}
akioolin commented 4 years ago

(Replying to @Scriabing's question above: did you modify the nv_small RTL from nvdla/hw/nv_small, or use the RTL of ITRI OpenDLA?)

using ITRI OpenDLA

qiuweishuai commented 3 years ago

Hi, has anyone tested resnet-50 successfully? If so, would you please share the prototxt and caffemodel?

nagendra7890 commented 3 years ago

Hi @akioolin @Okaymaddy @davide-giri, I was trying to run LeNet with MNIST, but I was not able to see the execution time and test result. I am confused about where I made a mistake; my kmd was fine. I even changed the timing code in RuntimeTest.cpp, but I still face the same problem. Can you help me out with this issue? It would be a great help.

Thank You.

KMD and runtime

[screenshot: KMD and runtime log]

davide-giri commented 3 years ago

Hi, does the program terminate? The execution time should be printed at the very end of the execution, which is not shown in your screenshot.

nagendra7890 commented 3 years ago

Hi, does the program terminate? The execution time should be printed at the very end of the execution, which is not shown in your screenshot.

Hi @davide-giri, there was no printed result; it got stuck there, and I waited for some time but there was no update.

Please see the full log.

[screenshots: full log]

akioolin commented 3 years ago

HI, @nagendra7890 :

For execution time, the following can serve as a reference.

In VP mode, LeNet takes about 1~2 hours to execute on a Core i5; ResNet could take about half a day to a day.

On an FPGA with NVDLA small 64 enabled, LeNet takes about 200 ms; ResNet takes about 500 ms.

If the runtime finishes the task, it will print a message like "task complete", and you can check the content of output.dimg to see the result.

BR, Akio

nagendra7890 commented 3 years ago

Hi @akioolin, I am using an FPGA (ZCU102). The program was running for 30 minutes or more, but there was no update. Do I need to make any modifications before compilation?

akioolin commented 3 years ago

Hi, @nagendra7890 :

Please check whether the nvdla interrupt is being issued. From the output message, the model seems to be LeNet.

If nvdla_runtime doesn't exit back to the shell prompt, the interrupt should be checked.

BR, akio

nagendra7890 commented 3 years ago

Hi @akioolin, the interrupt is loaded. Please see the log.

[screenshot: interrupt log]

akioolin commented 3 years ago

Hi, @nagendra7890 :

Please do some checking after launching nvdla_runtime. If the nvdla IP works correctly, the count of interrupts received by the CPU from nvdla will increase.

BR, Akio