nvdla / sw

[U May Need It] nvdla_runtime options #9

Closed JunningWu closed 6 years ago

JunningWu commented 6 years ago

./nvdla_runtime -loadable output.protobuf

Usage: ./nvdla_runtime [-options] --loadable <loadable>
where options include:
    -h            print this help message
    -s            launch test in server mode
    --loadable
    --image
    --imgshift
    --imgscale
    --imgpower
    --softmax
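For reference, a minimal sketch of a full invocation is below. The loadable and image names are placeholders (not files from this thread), and the runtime also expects libnvdla_runtime.so on LD_LIBRARY_PATH and the kernel modules loaded, as shown in the later comments.

# Minimal sketch: file names are placeholders, options are the ones from the usage text above
export LD_LIBRARY_PATH=$PWD      # so nvdla_runtime can find libnvdla_runtime.so
./nvdla_runtime --loadable lenet/default.nvdla --image digit_28x28.pgm
# -h prints the help text above; -s launches the test in server mode instead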

JunningWu commented 6 years ago

unhandled level 1 translation fault

1. I compiled the AlexNet Caffe model on my virtual machine (Ubuntu 14.04). The compile process is OK, and I got the loadable file output.protobuf and the output directory wisdom.dir, which contains layers/networks/tensors.

2. I want to run nvdla_runtime on the NVDLA VP, so I copied the output file and the output directory to the VP.

Here is the error:

/# ./nvdla_runtime --loadable output.protobuf
creating new runtime context...
[ 1181.040928] nvdla_runtime[1287]: unhandled level 1 translation fault (11) at 0x41ffb65a, esr 0x92000005, in libnvdla_runtime.so[ffffb54a2000+21000]
[ 1181.047469] CPU: 0 PID: 1287 Comm: nvdla_runtime Not tainted 4.13.3 #1
[ 1181.054489] Hardware name: linux,dummy-virt (DT)
[ 1181.055224] task: ffff80003db40e00 task.stack: ffff80003d16c000
[ 1181.055792] PC is at 0xffffb54b7d98
[ 1181.056133] LR is at 0xffffb54b3984
[ 1181.056400] pc : [<0000ffffb54b7d98>] lr : [<0000ffffb54b3984>] pstate: 80000000
[ 1181.056903] sp : 0000ffffd6afd030
[ 1181.057213] x29: 0000ffffd6afd030 x28: 0000000000000000
[ 1181.067488] x27: 0000000000000000 x26: 0000000000000000
[ 1181.067886] x25: 0000000000000000 x24: 0000000000000000
[ 1181.068077] x23: 0000000000000000 x22: 0000000000000000
[ 1181.071035] x21: 0000000039db2190 x20: 0000000000000000
[ 1181.071292] x19: 0000000041ffb65a x18: 0000000000000000
[ 1181.071481] x17: 0000ffffb54d3fb0 x16: 0000ffffb54b7d98
[ 1181.071732] x15: 0000000000000111 x14: 00000000000003f3
[ 1181.084578] x13: 0000000000000000 x12: 0000ffffb549f968
[ 1181.088542] x11: 0000000000000022 x10: 0000000000000007
[ 1181.093569] x9 : 0000000000001500 x8 : 0000000000000003
[ 1181.097488] x7 : 0000000000000001 x6 : 0000000039db2330
[ 1181.100436] x5 : 0000000000000041 x4 : 0000000000000001
[ 1181.111960] x3 : 0000ffffb54d4928 x2 : 0000ffffb54b3950
[ 1181.112223] x1 : 0000000000000006 x0 : 0000000041ffb65a
Segmentation fault

Can anyone help?

  1. GDB not available on prebuilt image?

/# gdb
-sh: gdb: not found

xmchen1987 commented 6 years ago

@wujunning2011 I think your input loadable file is wrong. You can try any file and you will get the same error.

I use the loadable file from the Docker image, but get an out-of-bounds error. Any suggestions?

./nvdla_runtime --loadable BDMA_L0_0_fbuf
creating new runtime context...
libnvdla<1> failed to open dla device
libnvdla<1> Out of bounds DLA instance 0 requested.
(DLA_TEST) Error 0x00000004: runtime->load failed (in RuntimeTest.cpp, function loadLoadable(), line 253)
(DLA_TEST) Error 0x00000004: (propagating from RuntimeTest.cpp, function run(), line 377)
(DLA_TEST) Error 0x00000004: (propagating from main.cpp, function launchTest(), line 92)

jarodw0723 commented 6 years ago

@xmchen1987 Have you installed the drivers (drm.ko, opendla.ko) first?
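For anyone else hitting this, a minimal sketch of the VP setup sequence, pieced together from the sessions posted later in this thread (the directory name comes from the prebuilt linux-4.13.3 image; adjust paths to your mount point):

cd /mnt/images/linux-4.13.3                    # wherever the VP image directory is mounted
insmod drm.ko                                  # DRM module first
insmod opendla.ko                              # NVDLA KMD; dmesg should report "Initialized nvdla ..."
export LD_LIBRARY_PATH=$PWD                    # so the runtime finds libnvdla_runtime.so
./nvdla_runtime --loadable CONV_D_L0_0_fbuf    # any prebuilt flatbuf from regression/flatbufs/kmd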

JunningWu commented 6 years ago

@xmchen1987 Can you share your loadable file?

@jarodw0723 When I install drm.ko/opendla.ko, I get a level 2 translation fault:
nvdla_runtime[1306]: unhandled level 2 translation fault (11) at 0x2bf2565a, esr 0x92000006

jarodw0723 commented 6 years ago

@wujunning2011 You can find prebuilt loadable files in https://github.com/nvdla/sw/tree/master/regression/flatbufs/kmd

xmchen1987 commented 6 years ago

@jarodw0723 After installing the drivers, it succeeds. Thanks a lot. @wujunning2011 As @jarodw0723 mentioned, I used the prebuilt files in https://github.com/nvdla/sw/tree/master/regression/flatbufs/kmd

@jarodw0723 Is the VP currently able to dump performance data, or is it just for software development?

jarodw0723 commented 6 years ago

@xmchen1987 It is just for software development.

JunningWu commented 6 years ago

@jarodw0723 @xmchen1987 When I run it in Docker mode, it is OK. I wonder how to generate my own loadable file using nvdla_compiler from, e.g., AlexNet.caffemodel.

xmchen1987 commented 6 years ago

@jarodw0723 I see the cmod currently has an interface like:

NV_NVDLA_cmac::NV_NVDLA_cmac( sc_module_name module_name ):
    NV_NVDLA_cmac_base(module_name),
    // Delay setup
    dmadelay(SC_ZERO_TIME),
    csbdelay(SC_ZERO_TIME),
    b_transportdelay(SC_ZERO_TIME)

Do you have plans to develop the cmod into a performance model?

geyijun commented 6 years ago

I have the same problem as wujunning2011. If the loadable file is wrong, how can I get a loadable file using nvdla_compiler, e.g. from AlexNet.caffemodel? I have read the UMD code and learned that nvdla_runtime needs a "flatbuf" loadable file, but the output format nvdla_compiler creates is "protobuf". Should I convert the format, or is there an option for nvdla_compiler to make it create a "flatbuf" file?

blueardour commented 6 years ago

+1, same request to compile and run a custom model.

Besides, @jarodw0723, is there any schedule for when the performance profiling function will be ready?

xmchen1987 commented 6 years ago

@wujunning2011 @geyijun @blueardour output.protobuf is not the target loadable file. I used default.nvdla and succeeded in running the test.
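For later readers, a hedged sketch of the compile-then-run flow: the --prototxt/--caffemodel flags and the default.nvdla output name are assumptions that may differ between compiler versions, so check ./nvdla_compiler -h before relying on them.

# Sketch only: flag names and output file name are assumptions, verify with ./nvdla_compiler -h
./nvdla_compiler --prototxt lenet/deploy.prototxt --caffemodel lenet/lenet.caffemodel
# The flatbuffer loadable (default.nvdla in the shipped images) is what nvdla_runtime expects;
# output.protobuf and wisdom.dir are intermediate compiler outputs, not loadables.
./nvdla_runtime --loadable default.nvdla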

JunningWu commented 6 years ago

@xmchen1987 Thank you very much. So default.nvdla is probably the loadable file.

HKLee2040 commented 6 years ago

@xmchen1987 May I know more about your "default.nvdla" test? What network and what command do you use to test "default.nvdla"?

geyijun commented 6 years ago

@xmchen1987 Thanks.

blueardour commented 6 years ago

Hi, it gets stuck when I run nvdla_runtime --loadable default.nvdla. Any clue?

................................
Welcome to Buildroot
nvdla login: root
Password:
# mount -t 9p -o trans=virtio r /mnt
# cd /mnt/
# ls
CMakeCache.txt  README.md  install_manifest.txt  CMakeFiles  aarch64_toplevel  libs  CMakeLists.txt  cmake  models  CPackConfig.cmake  cmake_install.cmake  scripts  CPackSourceConfig.cmake  conf  src  LICENSE  docker  tests  Makefile  images
# cd images/
# cd linux-4.13.3/
# ls
??%@@???@8  drm.ko  nvdla_runtime  CONV_D_L0_0_fbuf  efi-virtio.rom  opendla.ko  Image  libnvdla_compiler.so  rootfs.ext4  aarch64_nvdla.lua  libnvdla_runtime.so  alexnet  nvdla_compiler
# insmod drm.ko
# insmod opendla.ko
[ 35.852221] opendla: loading out-of-tree module taints kernel.
[ 35.863261] reset engine done
[ 35.872695] [drm] Initialized nvdla 0.0.0 20171017 for 10200000.nvdla on minor 0
# export LD_LIBRARY_PATH=$PWD
# ./nvdla_runtime --loadable alexnet/default.nvdla
creating new runtime context...
[ 55.045834] random: crng init done
^C^C^X^C^C^C
# stuck here
................................

Also tried again:
................................
Welcome to Buildroot
nvdla login: root
Password:
# mount -t 9p -o trans=virtio r /mnt
# cd /mnt/images/linux-4.13.3/
# export LD_LIBRARY_PATH=$PWD
# cd alexnet/
# ./../nvdla_runtime --loadable default.nvdla
creating new runtime context...
[ 48.162132] random: crng init done
^C
# cd ..
# insmod drm.ko
# insmod opendla.ko
[ 70.167764] opendla: loading out-of-tree module taints kernel.
[ 70.179404] reset engine done
[ 70.188440] [drm] Initialized nvdla 0.0.0 20171017 for 10200000.nvdla on minor 0
# dmesg| tail
[ 1.893330] VFS: Mounted root (ext4 filesystem) readonly on device 254:0.
[ 1.912059] devtmpfs: mounted
[ 2.054710] Freeing unused kernel memory: 1088K
[ 2.232956] EXT4-fs (vda): re-mounted. Opts: data=ordered
[ 3.830183] NET: Registered protocol family 10
[ 3.848857] Segment Routing with IPv6
[ 48.162132] random: crng init done
[ 70.167764] opendla: loading out-of-tree module taints kernel.
[ 70.179404] reset engine done
[ 70.188440] [drm] Initialized nvdla 0.0.0 20171017 for 10200000.nvdla on minor 0
# cd alexnet/
# ./../nvdla_runtime --loadable default.nvdla
creating new runtime context...
^C
# stuck here
# dmesg| tail
[ 1.893330] VFS: Mounted root (ext4 filesystem) readonly on device 254:0.
[ 1.912059] devtmpfs: mounted
[ 2.054710] Freeing unused kernel memory: 1088K
[ 2.232956] EXT4-fs (vda): re-mounted. Opts: data=ordered
[ 3.830183] NET: Registered protocol family 10
[ 3.848857] Segment Routing with IPv6
[ 48.162132] random: crng init done
[ 70.167764] opendla: loading out-of-tree module taints kernel.
[ 70.179404] reset engine done
[ 70.188440] [drm] Initialized nvdla 0.0.0 20171017 for 10200000.nvdla on minor 0
................................

JunningWu commented 6 years ago

@blueardour I suppose it's because AlexNet is too huge; it may take about 20 minutes to create the context. You can try LeNet first.
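A hedged sketch of that smaller LeNet run, reusing the setup already shown above; the lenet directory and .pgm file names are placeholders.

insmod drm.ko && insmod opendla.ko            # drivers must be loaded before starting the runtime
export LD_LIBRARY_PATH=$PWD
./nvdla_runtime --loadable lenet/default.nvdla --image digit_28x28.pgm   # placeholder paths
dmesg | tail                                  # dla_* trace lines confirm the engine is actually running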

blueardour commented 6 years ago

@wujunning2011 Hi, thanks for your tips. May I ask whether you have ever successfully run AlexNet? Based on your comment, I left the program running; after 14 hours it seems the simulator still has not finished the execution.
..................................
# insmod opendla.ko
[ 43.227254] opendla: loading out-of-tree module taints kernel.
[ 43.239206] reset engine done
[ 43.248391] [drm] Initialized nvdla 0.0.0 20171017 for 10200000.nvdla on minor 0
# export LD_LIBRARY_PATH=$PWD
# ./nvdla_runtime --loadable alexnet/default.nvdla
creating new runtime context...
[ 72.144474] random: crng init done
Unknown image type: submitting tasks...
[ 7082.524154] Enter:dla_read_network_config
[ 7082.528186] Exit:dla_read_network_config status=0
[ 7082.528669] Enter: dla_initiate_processors
[ 7082.531573] Enter: dla_submit_operation
[ 7082.532029] Prepare Convolution operation index 0 ROI 0 dep_count 1
[ 7082.532483] Enter: dla_prepare_operation
[ 7082.535457] processor:Convolution group:0, rdma_group:0 available
[ 7082.536056] Enter: dla_read_config
[ 7082.543696] Exit: dla_read_config
[ 7082.544123] Exit: dla_prepare_operation status=0
[ 7082.544593] Enter: dla_program_operation
[ 7082.546769] Program Convolution operation index 0 ROI 0 Group[0]
[ 7082.555487] no desc get due to index==-1
[ 7082.556460] no desc get due to index==-1
[ 7082.558436] no desc get due to index==-1
[ 7082.558787] no desc get due to index==-1
........................
[ 7083.737498] Exit: dla_op_programmed
[ 7083.737643] Exit: dla_program_operation status=0
[ 7083.737814] Exit: dla_submit_operation
[ 7083.737961] Enter: dla_dequeue_operation
[ 7083.738115] Dequeue op from CDP processor, index=18 ROI=0
[ 7083.738301] Enter: dla_submit_operation
[ 7083.738456] Prepare CDP operation index 18 ROI 0 dep_count 1
[ 7083.738651] Enter: dla_prepare_operation
[ 7083.738871] processor:CDP group:1, rdma_group:1 available
[ 7083.739062] Enter: dla_read_config
[ 7083.741600] Exit: dla_read_config
[ 7083.741748] Exit: dla_prepare_operation status=0
[ 7083.741936] Enter: dla_program_operation
[ 7083.742096] Program CDP operation index 18 ROI 0 Group[1]
[ 7083.742494] Enter: dla_cdp_program
[ 7083.742563] Enter: processor_cdp_program
[ 7083.753187] Exit: processor_cdp_program
[ 7083.753201] Exit: dla_cdp_program
[ 7083.753356] no desc get due to index==-1
[ 7083.753615] no desc get due to index==-1
[ 7083.753760] no desc get due to index==-1
[ 7083.753910] no desc get due to index==-1
[ 7083.754058] no desc get due to index==-1
[ 7083.754210] no desc get due to index==-1
[ 7083.754362] Enter: dla_op_programmed
[ 7083.754505] Exit: dla_op_programmed
[ 7083.754649] Exit: dla_program_operation status=0
[ 7083.754817] Exit: dla_submit_operation
[ 7083.754966] Exit: dla_dequeue_operation
[ 7083.755133] Exit: dla_initiate_processors status=0
[ 7083.755376] Enter:dla_handle_events, processor:BDMA
[ 7083.755620] Exit:dla_handle_events, ret:0
[ 7083.755800] Enter:dla_handle_events, processor:Convolution
[ 7083.756012] Handle cdma weight done event, processor Convolution group 0
[ 7083.756260] Exit:dla_handle_events, ret:0
[ 7083.756416] Enter:dla_handle_events, processor:SDP
[ 7083.756592] Exit:dla_handle_events, ret:0
[ 7083.756758] Enter:dla_handle_events, processor:PDP
[ 7083.756937] Exit:dla_handle_events, ret:0
[ 7083.757092] Enter:dla_handle_events, processor:CDP
[ 7083.757269] Exit:dla_handle_events, ret:0
[ 7083.757422] Enter:dla_handle_events, processor:RUBIK
[ 7083.757602] Exit:dla_handle_events, ret:0

This is the last output after 14 hours of execution.

Regarding your suggestion to try LeNet: the computational complexity of AlexNet is about 1 GMAC according to the Netscope CNN analyzer, but most of the networks I focus on are bigger than AlexNet. So if the simulator is this slow, it may be impractical for me to run my own networks.

JunningWu commented 6 years ago

@blueardour My AlexNet run is not successful either; it also gets stuck somewhere. According to the NVDLA VP configuration file, the system memory is 1MB, so this may affect the AlexNet run.

With such a huge NN, I suggest you use Cadence's Protium or Synopsys's ZeBu.

BTW, when I run the tiny LeNet there are still some errors; I hope you can also move to LeNet and give me some help.

blueardour commented 6 years ago

Hi @wujunning2011, sorry for the late reply. After trying LeNet, I also failed to run it successfully.

ned-varnica commented 6 years ago

Hi, has anyone found a solution to this? I am having the same issue, and it does not seem to be a system virtual memory issue. Any help is appreciated. Thanks!

prasshantg commented 6 years ago

@JunningWu The NVDLA VP configuration should be using 1GB of system memory; which config file are you checking?

Are you able to run LeNet?

ned-varnica commented 6 years ago

Hi, still having issues with AlexNet, running it with 1GB system memory. I tried running with the latest NVDLA updates (with these, we can load a .jpg image). Please see the attached log file for more info: there are some error messages that are ignored, and it hangs at the last point shown in the log file:

20180205_pascalvoc_BoatRes227x227.jpg.log

Regarding LeNet: I was able to run it all the way through without any issues (here, the input file format used is .pgm).

20180202_lenet_BoatRes28x28.pgm.log

prasshantg commented 6 years ago

I am able to reproduce it; I created #21 for debugging the AlexNet failure.

JunningWu commented 6 years ago

@ned-varnica

Regarding this line: [ 9422.272985] INFO: rcu_preempt self-detected stall on CPU

You probably have a real-time application that is consuming all the CPU (some bad implementation), and because of its real-time scheduling priority the system doesn't have enough resources available for other tasks. I suggest you remove real-time priority from your applications, check which one is consuming a lot of CPU, and, after correcting the problem, put it back to real-time priority. (A quick way to check this is sketched after the link below.)

https://unix.stackexchange.com/questions/252045/rcu-preempt-detected-stalls-on-cpus-tasks-message-appears-to-continue
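A hedged sketch of that check with standard procps/util-linux tools; the VP's BusyBox ps may not support these options, so run it on a full userspace, and the PID is a placeholder.

# Show scheduling class, real-time priority and CPU usage, busiest processes first
ps -eo pid,cls,rtprio,pcpu,comm --sort=-pcpu | head
# Demote a suspect real-time process to the normal SCHED_OTHER policy, then re-check the stalls
chrt -o -p 0 <pid>
chrt -p <pid>        # verify the new policy took effect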

ned-varnica commented 6 years ago

Thanks, I ran it again (and with 2GB system memory) and that particular line is gone, but the runtime problem remains: it hangs in the same place. Please see the attached log file. 20180207_pascalvoc_BoatRes227x227.jpg.log

prasshantg commented 6 years ago

@ned-varnica I suspect some problem with the cmod; I am debugging it with our HW team.

JunningWu commented 6 years ago

@ned-varnica According to your log file (line 1298, "Assertion Failed"), maybe some engine error happened after 24 HWLs were processed. This error is the same as in the 1GB RAM case.

So may I conclude that increasing the RAM size from 1GB to 2GB resolves the rcu_preempt error?

BTW, what does the first line "random: crng init done" mean?

fanqifei commented 6 years ago

Hi @ned-varnica, I am trying to reproduce the issue of running AlexNet. The log shows that CACC is not in the idle state as expected. Are you using a cmod built from the latest version of the nvdla1 branch? Could you share your generated flatbuf file? Thanks.

qdchau commented 6 years ago

Hi everyone, I'm working with @ned-varnica on the same project.

@prasshantg Thanks for the update.

@JunningWu Not sure if we can draw that conclusion. We may need to repeat the run a few times with the 2GB RAM config to see if the CPU stall occurs again. "random: crng init done" is a message from the kernel random number generator driver.

@fanqifei I built the cmod from a clone of the 'nvdlav1' branch in December. The last commit I see is from 12/12/17 "new lsd design" (d9eefc7). I'm not sure what you mean by flatbuf file. I tried to attach the NVDLA loadable binary we generated from the nvdla_compiler but the file is too big even after compression.

fanqifei commented 6 years ago

@qdchau, can you send it to efan@nvidia.com? The flatbuf file is the loadable file generated by nvdla_compiler.

fanqifei commented 6 years ago

@qdchau @ned-varnica I can't reproduce the test hang. The AlexNet test passes with a change in cmod/include/log.h (see below; this change does not seem to be related to the hang issue). The hw nvdla1 branch and the vp are at the latest versions (Docker is not used). I will try using Docker later.

+static char msg_buf[MSG_BUF_SIZE];
+
 #define cslDebugInternal(lvl, ...)          do {\
-                                                char msg_buf[MSG_BUF_SIZE]; \
                                                 int pos = snprintf(msg_buf, MSG_BUF_SIZE, "%d:", __LINE__); \
                                                 snprintf(msg_buf + pos, MSG_BUF_SIZE - pos, __VA_ARGS__); \
                                                 SC_REPORT_INFO_VERB(__FILENAME__, msg_buf, SC_DEBUG ); \
@@ -34,7 +35,6 @@
 #define cslDebug(args)                      cslDebugInternal args

 #define cslInfoInternal(...)                do {\
-                                                char msg_buf[MSG_BUF_SIZE]; \
                                                 int pos = snprintf(msg_buf, MSG_BUF_SIZE, "%d:", __LINE__); \
                                                 snprintf(msg_buf + pos, MSG_BUF_SIZE - pos, __VA_ARGS__); \
                                                 SC_REPORT_INFO_VERB(__FILENAME__, msg_buf, SC_FULL ); \
@@ -42,7 +42,6 @@
 #define cslInfo(args)                       cslInfoInternal args

 #define FAILInternal(...)                   do {\
-                                                char msg_buf[MSG_BUF_SIZE]; \
                                                 int pos = snprintf(msg_buf, MSG_BUF_SIZE, "%d:", __LINE__); \
                                                 snprintf(msg_buf + pos, MSG_BUF_SIZE - pos, __VA_ARGS__); \
                                                 SC_REPORT_INFO(__FILENAME__, msg_buf ); \
qdchau commented 6 years ago

Hi @fanqifei, for clarification, are you working with prasshantg to debug this or reproducing the issue independently? Thanks for sharing the source change. Do you recommend we add that code, rebuild the model, and try again? I tried to e-mail you the AlexNet loadable binary, but it's too big for an e-mail attachment. We compiled it using the Caffe model and prototxt file from the Model Zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo#pascal-voc-2012-multilabel-classification-model

fanqifei commented 6 years ago

Hi @qdchau , I am working with Prashant. I can reproduce the issue now. Looking into it.

ned-varnica commented 6 years ago

Hi @fanqifei, thank you for looking into this issue. We look forward to your feedback. Please let us know if there is any other information you need from us at this stage.

jwise commented 6 years ago

FYI -- some of the team is out for CNY this week. I'll follow up to see who's around, but expect a little more latency on this one. Thanks!

prasshantg commented 6 years ago

We have resolved this issue and will push the fix to KMD. Waiting for some verification results.

ned-varnica commented 6 years ago

Great, thanks. Much appreciated!

qdchau commented 6 years ago

Hi @prasshantg, @jwise. Would it be possible to get a ballpark estimate of when the AlexNet fix will be available, so we can update our team's schedule?

prasshantg commented 6 years ago

@qdchau 5th Mar 2018

qdchau commented 6 years ago

Awesome. Thank you!

prasshantg commented 6 years ago

@qdchau @ned-varnica @JunningWu The fix for AlexNet is pushed. Please test it.

qdchau commented 6 years ago

Hi @prasshantg. The fix works for us. Thanks for your help!

ned-varnica commented 6 years ago

Thanks so much @prasshantg. Should we be expecting correct output at this point? We tried this AlexNet with some images and got outputs that look like noise (negative values close to 0). On the other hand, when we run the same network on our local CPU we get very good predictions with the same input images (1 out of 20 output values is a large positive number, and it matches the correct label). Do you have any recommendations on how to proceed with debugging? Thanks!

JunningWu commented 6 years ago

@ned-varnica I think the rawdump file will contain 1000 predictions, like this: http://ddl.escience.cn/f/Qdtr. By the way, I am using the BVLC-trained model, and the input image is http://ddl.escience.cn/f/Qdts.

prasshantg commented 6 years ago

@JunningWu do you get expected results?

JunningWu commented 6 years ago

@prasshantg I am trying to figure out whether the result indicates "cat". The simulation process is OK; there are no more errors.
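A hedged way to see whether any class stands out in such a dump, assuming the rawdump/.dimg text file is just whitespace-separated scores (the file name is a placeholder and the format is an assumption):

# One score per line, numbered so the line number maps back to the class index; top 5 printed last
tr -s ' \t' '\n' < output.dimg.txt | grep -v '^$' | nl | sort -k2 -g | tail -5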

ned-varnica commented 6 years ago

@JunningWu In the example we are running, it has 20 outputs. The network was taken from Caffe Model Zoo http://heatmapping.org/files/bvlc_model_zoo/pascal_voc_2012_multilabel/deploy_x30.prototxt

It was trained on the following 20 categories:

  1. Aeroplane
  2. Bicycle
  3. Bird
  4. Boat
  5. Bottle
  6. Bus
  7. Car
  8. Cat
  9. Chair
  10. Cow
  11. Dining table
  12. Dog
  13. Horse
  14. Motorbike
  15. Person
  16. Potted plant
  17. Sheep
  18. Sofa
  19. Train
  20. TV monitor

In your example, looking at the rawdump file, it seems you are seeing the same issue as we are. All the entries (in your case 1000 of them, in our case 20) show very small values and nothing stands out. At least, this is our experience so far.

@prasshantg Attached is one JPG image we used and the corresponding rawdump file.

boatres227x227 20180306_pascalvoc_BoatRes227x227.jpg.dimg.txt

prasshantg commented 6 years ago

This could be due to the missing mean-subtraction feature in the compiler. Let me confirm.

ned-varnica commented 6 years ago

Thanks @prasshantg. I agree this is part of it, but there is probably more to it. FYI, I tried removing mean subtraction in our local simulator (just to test this hypothesis) and the result still looks OK: it still produces outputs showing that 'Boat' is much more likely than the other 19 outputs. The confidence is worse (compared to when the appropriate means are used), but it looks fine. On the other hand, the outputs we get in the file 20180306_pascalvoc_BoatRes227x227.jpg.dimg.txt (please see the previous message) do not show this behavior.