Prebuilt nvdla_compiler crash

filipthoen2 commented 5 years ago

I tried to compile LeNet with the provided prebuilt compiler (https://github.com/nvdla/sw/blob/master/prebuilt/linux/nvdla_compiler):

demo@ubuntu:~/NVDLA/vp/CNN$ ./nvdla_compiler --prototxt lenet_deploy.prototxt --caffemodel lenet_iter_10000.caffemodel -o output --configtarget nv_small

and I get the following error:

(DLA) Error 0x00030003: (propagating from main.cpp, function testSetup(), line 77)
(DLA) Error 0x00030003: (propagating from main.cpp, function launchTest(), line 94)

I've compiled the same network in the past with a 1+ year old version of the same prebuilt compiler.

I tried to debug it with gdb (see below) but the executable is compiled without symbolic info.

Is this a known issue?

I will try compiling the compiler myself next.

Also, what is the difference between "opendla_1.ko" and "opendla_2.ko"? Are both for the 'nv_small' configuration, or is one for 'nv_full' and the other for 'nv_small'?

Filip

Type "apropos word" to search for commands related to "word"...
Reading symbols from ./nvdla_compiler...done.
(gdb) set args --prototxt lenet_deploy.prototxt --caffemodel lenet_iter_10000.caffemodel -o output --configtarget nv_small --profile basic
(gdb) run
Starting program: /home/demo/NVDLA/vp/CNN/nvdla_compiler --prototxt lenet_deploy.prototxt --caffemodel lenet_iter_10000.caffemodel -o output --configtarget nv_small --profile basic
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
(DLA) Error 0x00030003: (propagating from main.cpp, function testSetup(), line 77)
(DLA) Error 0x00030003: (propagating from main.cpp, function launchTest(), line 94)
[Inferior 1 (process 96174) exited with code 0377]
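A side note on the last line of the gdb session: gdb prints the inferior's exit status in octal, so `0377` is 255 in decimal, which is what the shell reports when a program returns -1. A minimal sketch for converting such a value follows; the helper name `gdb_exit_to_dec` is made up for illustration and is not part of gdb or NVDLA.

```shell
# Hypothetical helper: convert an octal exit code as printed by gdb
# (for example 0377) to decimal.
# POSIX printf reads an integer argument with a leading 0 as octal.
gdb_exit_to_dec() {
    printf '%d\n' "$1"
}
```

Calling `gdb_exit_to_dec 0377` prints `255`, so the compiler simply exited with a failure status rather than crashing on a signal.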

prasshantg commented 5 years ago

@filipthoen2 nv_small supports only int8 precision, so you need to specify int8 as the precision in the compiler command.

You have to use opendla_2.ko, which is for nv_small and nv_large; opendla_1.ko is for nv_full.
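The module mapping from this reply can be written down as a small shell helper. This is only a sketch: the function name `kmd_for_target` is hypothetical, the mapping is taken from the reply above and nothing else, and any other target name is rejected rather than guessed.

```shell
# Hypothetical helper: print the kernel module that matches an NVDLA
# config target, following the mapping given in this thread.
kmd_for_target() {
    case "$1" in
        nv_small|nv_large) echo "opendla_2.ko" ;;
        nv_full)           echo "opendla_1.ko" ;;
        *)
            # Anything not mentioned in the thread is treated as an error.
            echo "unknown config target: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `kmd_for_target nv_small` prints `opendla_2.ko`, which is the module to load when the loadable was compiled with `--configtarget nv_small`.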

filipthoen2 commented 5 years ago

I tried to use the additional '--cprecision int8' flag, but I see the same error.

I noticed that the help output of the prebuilt compiler differs from the help string in the source code (umd/apps/compiler/main.cpp), so I decided to rebuild the compiler from source.

After some link issues with libprotobuf.a (see separate issue I filed), the freshly compiled compiler worked fine & doesn't show the issue.

I also confirmed that the network ran fine on the QEMU vp.

So the checked-in prebuilt compiler seems to be out of date.

shazib-summar commented 5 years ago

> the freshly compiled compiler worked fine & doesn't show the issue.

Hi @filipthoen2, where did you get the "freshly compiled compiler" from?

filipthoen2 commented 5 years ago

@killerzula - I pulled the compiler sources from the 'nvdla/sw' GitHub, as the nvdla_compiler is now open sourced as well.

Look under https://github.com/nvdla/sw/tree/master/umd/apps/compiler

prasshantg commented 5 years ago

@filipthoen2 glad you were able to resolve the issue. Are you using the prebuilt compiler from github/sw or vp docker container?

filipthoen2 commented 5 years ago

The prebuilt compiler I had issues with was the github/sw one. I pulled the compiler sources from github/sw and rebuilt it myself. That worked!

prasshantg commented 5 years ago

Thanks, will recheck and get it updated in next release.

shazib-summar commented 5 years ago

Hey @filipthoen2, I compiled the compiler myself as well. However, I am still facing the same errors and was hoping you could help me with it. I built the compiler using the following steps.

After that I opened the directory //sw/umd/out/apps/compiler/nvdla_compiler/ and ran the following command in the terminal:

./nvdla_compiler --prototxt /<path_to_prototxt_file>/deploy.prototxt --caffemodel /<path_to_caffemodel_file>/bvlc_alexnet.caffemodel --cprecision int8 --configtarget nv_small

Following this I get the errors attached below.

I would like to emphasize that the compiler was freshly compiled by me. The prebuilt compiler generates the same errors. You said that the compiler does not generate these errors after a fresh compilation; however, that is not the case for me. Any advice? Thanks in advance.

prasshantg commented 5 years ago

@killerzula it looks like you are trying to compile AlexNet. It is not yet supported for int8 precision and hence won't work for nv_small or nv_large. Please use ResNet-50.

shazib-summar commented 5 years ago

Thanks for the reply @prasshantg. I have a few queries/issues.

1. Would you kindly provide me a link to the ResNet-50 caffemodel and prototxt file?
2. If I use LeNet, the nvdla_compiler does not generate any errors, but it also does not generate the required output. When I run the following command
   ./nvdla_compiler --prototxt /[path_to_prototxt_file]/lenet.prototxt --caffemodel /[path_to_caffemodel_file]/lenet_iter_10000.caffemodel --cprecision int8 --configtarget nv_small -o ./output/
   a folder named "wisdom.dir" is generated inside the "output" folder and a file named "output.protobuf" is generated in the pwd. However, this "wisdom.dir" folder is empty. No ".nvdla" file is generated in any case. Why is that?
3. What is the "loadable" file format? Is it .protobuf or .nvdla? Also, what is the type of the files in the regression folder?
4. What are the different folders (BDMA, CDP, CONV, NN, PDP, RBK, SDP) in the regression/kmd folder for? What does each abbreviation mean?
5. In another thread (see below) the reader was advised to use the file CDP_L0_0_small_fbuf as a loadable for nvdla_runtime. This is not a .nvdla file. Why is it being used as a loadable file for nvdla_runtime? What are these files anyway?
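Since the second question describes a run that prints no errors yet leaves no loadable behind, checking for the output file is more reliable than trusting the exit status or the console output. A minimal sketch, assuming (as the question does) that a successful compile produces a file ending in `.nvdla`; the helper name `has_loadable` is hypothetical, and which directory the file lands in is not confirmed in this thread, so the directory is passed in.

```shell
# Hypothetical helper: succeed only if DIR (default: current directory)
# contains at least one *.nvdla file somewhere below it.
has_loadable() {
    # find prints every matching path; grep -q . succeeds only if
    # at least one non-empty line was printed.
    find "${1:-.}" -type f -name '*.nvdla' 2>/dev/null | grep -q .
}
```

After a compile run, something like `has_loadable ./output || has_loadable . || echo "no loadable produced"` makes the silent failure visible.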

Thanks a lot.

shazib-summar commented 5 years ago

Posting a follow-up to my previous comment. I downloaded the ResNet-50 prototxt and caffemodel files from this link, which I came across in another thread. However, I am still facing problems. The following is the output of the prebuilt compiler.

Command Executed
./nvdla_compiler --prototxt ./models/ResNet-50-deploy.prototxt --caffemodel ./models/ResNet-50-model.caffemodel --cprecision int8 --configtarget nv_small -o ./output/
Output Received

And for the self-compiled compiler the situation is as follows:

Command Executed
./nvdla_compiler --prototxt ./models/ResNet-50-deploy.prototxt --caffemodel ./models/ResNet-50-model.caffemodel --cprecision int8 --configtarget nv_small -o ./output/
Output Received
creating new wisdom context...
opening wisdom context...
parsing caffe network...
libnvdla<3> mark prob
Marking total 1 outputs
initialize all tensors with const scaling factors of 127...
attaching parsed network to the wisdom...
compiling profile "fast-math"... config "nv_small"...
libnvdla<2> Prototxt #chnls (C = 3) != Profile #chnls for input (NVDLA_IMG_A8B8G8R8: C = 4). Preferring #chnls from Profile for compiling.
(DLA) Error 0x00000002: Don't support 0 (in engine-ast/ConvCoreNode.cpp, function tryToMergeWithBatchNormOp(), line 1222)
(DLA) Error 0x00000002: Don't support 0 (in engine-ast/ConvCoreNode.cpp, function tryToMergeWithBatchNormOp(), line 1222)
(DLA) Error 0x00000002: Don't support 0 (in engine-ast/ConvCoreNode.cpp, function tryToMergeWithBatchNormOp(), line 1222)
(DLA) Error 0x00000002: Don't support 0 (in engine-ast/ConvCoreNode.cpp, function tryToMergeWithBatchNormOp(), line 1222)
(DLA) Error 0x00000004: Truncate too high: 33 (in ./include/priv/LowPrecision.h, function scaleAndShiftFromScalarImpl2(), line 249)
[the "Truncate too high: 33" error above is repeated many more times]
closing wisdom context...

Of course, the "wisdom.dir" directory is empty in both cases. Thanks
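Output like the above is easier to read once the repeated messages are counted per error code. A minimal sketch, assuming the compiler output was saved to a file first; the helper name `summarize_dla_errors` is made up for illustration, and it relies only on the `(DLA) Error 0x...` prefix visible in the logs in this thread.

```shell
# Hypothetical helper: count "(DLA) Error" occurrences per error code
# in a saved log file and print "code count" pairs, sorted by code.
summarize_dla_errors() {
    # grep -o emits every match on its own line, so this also works
    # when many messages were pasted onto one very long line.
    grep -o '(DLA) Error 0x[0-9A-Fa-f]*' "$1" |
        awk '{ count[$3]++ } END { for (code in count) print code, count[code] }' |
        sort
}
```

Run against a saved copy of the log, this prints one line per distinct code (here `0x00000002` and `0x00000004`), which makes it clear that only two kinds of error are involved.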

Lemiron24 commented 5 years ago

@killerzula have you solved this issue? I encountered the same issue as you. Thanks.

gitosu67 commented 5 years ago

Hi @Lemiron24, I have not tested this out yet, but this link provides some hints on how to run ResNet-50 in the int8 configuration: https://github.com/nvdla/sw/blob/master/LowPrecision.md

annshen0023 commented 4 years ago

@killerzula @Lemiron24 Have you solved this issue? I encountered the same issue as you. Any advice? Thanks.

Jade-Hsu commented 4 years ago

@killerzula @Lemiron24 I have the same problem. Have you solved it?

WillPen commented 3 years ago

Oh, I have the same error!

Prebuilt nvdla_compiler crash #156