nvdla / sw


Prebuilt nvdla_compiler crash #156

Closed filipthoen2 closed 5 years ago

filipthoen2 commented 5 years ago

I tried to compile LeNet with the provided prebuilt compiler (https://github.com/nvdla/sw/blob/master/prebuilt/linux/nvdla_compiler):

demo@ubuntu:~/NVDLA/vp/CNN$ ./nvdla_compiler --prototxt lenet_deploy.prototxt --caffemodel lenet_iter_10000.caffemodel -o output --configtarget nv_small

and I get the following error:

(DLA) Error 0x00030003: (propagating from main.cpp, function testSetup(), line 77)
(DLA) Error 0x00030003: (propagating from main.cpp, function launchTest(), line 94)

I compiled the same network in the past with a version of the same prebuilt compiler that was more than a year old.

I tried to debug it with gdb (see below) but the executable is compiled without symbolic info.

Is this a known issue?

I will try compiling the compiler myself next.

Also what is the difference between "opendla_1.ko" and "opendla_2.ko"? Are these both for the 'nv_small' configuration or one is for 'nv_full', the other for 'nv_small'?

Filip

Type "apropos word" to search for commands related to "word"...
Reading symbols from ./nvdla_compiler...done.
(gdb) set args --prototxt lenet_deploy.prototxt --caffemodel lenet_iter_10000.caffemodel -o output --configtarget nv_small --profile basic
(gdb) run
Starting program: /home/demo/NVDLA/vp/CNN/nvdla_compiler --prototxt lenet_deploy.prototxt --caffemodel lenet_iter_10000.caffemodel -o output --configtarget nv_small --profile basic
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
(DLA) Error 0x00030003: (propagating from main.cpp, function testSetup(), line 77)
(DLA) Error 0x00030003: (propagating from main.cpp, function launchTest(), line 94)
[Inferior 1 (process 96174) exited with code 0377]
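
For reading the last line of the gdb session: gdb prints the exit status in octal, so `0377` is 255 in decimal. That value is consistent with the program returning -1 from `main`, though whether the compiler actually does that on failure is an assumption here. A minimal sketch of the conversion, where the helper name `octal_to_dec` is hypothetical and not part of gdb or NVDLA:

```shell
#!/bin/sh
# Hypothetical helper: convert an octal exit status, as printed by gdb,
# into the decimal value a shell would report in $?.
octal_to_dec() {
    # printf reads a numeric argument with a leading 0 as octal
    printf '%d\n' "$1"
}

octal_to_dec 0377   # prints 255
```

The exit status only says that the compiler failed; the `(DLA) Error` lines above it remain the more useful diagnostic.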

prasshantg commented 5 years ago

@filipthoen2 nv_small supports only int8 precision, so you need to specify int8 as the precision in the compiler command.

You have to use opendla_2.ko, which is for nv_small and nv_large. opendla_1.ko is for nv_full.
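
The pairing described above can be sketched as a small shell lookup. This is only an illustration of the rules stated in this comment: the function names `kmd_for_target` and `compiler_args` are hypothetical, and the `--cprecision int8` and `--configtarget` flags are the ones used elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: map an NVDLA config target to the kernel module
# named in this thread (opendla_1.ko for nv_full, opendla_2.ko otherwise).
kmd_for_target() {
    case "$1" in
        nv_full)           echo "opendla_1.ko" ;;
        nv_small|nv_large) echo "opendla_2.ko" ;;
        *)                 echo "unknown target: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the target-related compiler flags,
# adding int8 precision for nv_small as stated above.
compiler_args() {
    target="$1"
    args="--configtarget $target"
    if [ "$target" = "nv_small" ]; then
        args="$args --cprecision int8"
    fi
    echo "$args"
}

kmd_for_target nv_small   # prints opendla_2.ko
compiler_args nv_small    # prints --configtarget nv_small --cprecision int8
```

The sketch only prints names and flags; it does not invoke the compiler or load a module, so it is safe to run anywhere.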

filipthoen2 commented 5 years ago

I tried to use the additional '--cprecision int8' flag, but I see the same error.

I noticed that the help string printed by the prebuilt compiler differs from the one in the source code (umd/apps/compiler/main.cpp), so I decided to rebuild the compiler from source.

After some link issues with libprotobuf.a (see separate issue I filed), the freshly compiled compiler worked fine & doesn't show the issue.

I also confirmed that the network ran fine on the QEMU vp.

So the checked-in prebuilt compiler seems to be out of date.

shazib-summar commented 5 years ago

> the freshly compiled compiler worked fine & doesn't show the issue.

Hi @filipthoen2, where did you get the "freshly compiled compiler" from?

filipthoen2 commented 5 years ago

@killerzula - I pulled the compiler sources from the 'nvdla/sw' GitHub, as the nvdla_compiler is now open sourced as well.

Look under https://github.com/nvdla/sw/tree/master/umd/apps/compiler

prasshantg commented 5 years ago

@filipthoen2 glad you were able to resolve the issue. Are you using the prebuilt compiler from github/sw or vp docker container?

filipthoen2 commented 5 years ago

The prebuilt compiler I had issues with was the github/sw one. I pulled the compiler sources from github/sw and rebuilt it myself. That worked!

prasshantg commented 5 years ago

Thanks, we will recheck and get it updated in the next release.

shazib-summar commented 5 years ago

Hey, @filipthoen2. I compiled the compiler myself as well. However, I am still facing the same errors and was hoping you could help me with it. I built the compiler using the following steps.

image

After that I opened the directory //sw/umd/out/apps/compiler/nvdla_compiler/ and ran the following command in the terminal:

./nvdla_compiler --prototxt /<path_to_prototxt_file>/deploy.prototxt --caffemodel /<path_to_caffemodel_file>/bvlc_alexnet.caffemodel --cprecision int8 --configtarget nv_small

Following this I get the errors attached below.

image

I would like to emphasize that the compiler was freshly compiled by me. The prebuilt compiler also generates the same errors. You said that the compiler does not generate these errors after a fresh compilation; however, that is not the case for me. Any advice? Thanks in advance.

prasshantg commented 5 years ago

@killerzula it looks like you are trying to compile AlexNet. It is not yet supported for int8 precision and hence won't work for nv_small or nv_large. Please use ResNet-50.

shazib-summar commented 5 years ago

Thanks for the reply @prasshantg. I have a few queries/issues.

image

Thanks a lot.

shazib-summar commented 5 years ago

Posting a follow-up to my previous comment. I downloaded the ResNet-50 prototxt and caffemodel files from this link. I came across this link in another thread. However, I am still facing problems. The following is the output of the prebuilt compiler.

And for the self-compiled compiler, the situation is as follows.

Lemiron24 commented 5 years ago

@killerzula have you solved this issue? I encountered the same issue as you. Thanks.

gitosu67 commented 5 years ago

Hi @Lemiron24, I have not tested this out yet, but this link provides some hints on how to run ResNet-50 in the int8 configuration: https://github.com/nvdla/sw/blob/master/LowPrecision.md

annshen0023 commented 4 years ago

@killerzula @Lemiron24 Have you solved this issue? I encountered the same issue as you. Any advice? Thanks.

Jade-Hsu commented 4 years ago

@killerzula @Lemiron24 I have the same problem. Have you solved it?

WillPen commented 3 years ago

Oh, I have the same error!