manox opened this issue 3 years ago (status: Open)
IIRC I just followed along with the original documentation here: https://github.com/CSL-KU/firesim-nvdla#running-yolov3-on-nvdla. You will probably have to modify the FireMarshal workload files and the FireSim config files a bit. For FireMarshal, to match the FireMarshal version in Chipyard (CY), inherit from the nvdla-base workload in this repo. For FireSim, to match the FireSim version in CY, point to a proper FireSim HW config.
Thank you @abejgonzalez. I ran into a problem: darknet and its libraries require GLIBC 2.26, but the generated Linux image ships a newer version. Do I need to use an older Linux version now?
# cd darknet-nvdla/
# ./solo.sh
./darknet: /lib/libpthread.so.0: version `GLIBC_2.26' not found (required by ./darknet)
./darknet: /lib/libm.so.6: version `GLIBC_2.26' not found (required by ./darknet)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by ./darknet)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libodlalayer.so)
./darknet: /lib/libpthread.so.0: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libdarknet.so)
./darknet: /lib/libm.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libdarknet.so)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libdarknet.so)
./darknet: /lib/libm.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libnvdla_runtime.so)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libnvdla_runtime.so)
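To see at a glance which objects need rebuilding, the loader errors above can be grouped by the object that requires the missing version. The loader matches symbol-version names exactly, so a newer glibc that does not define a `GLIBC_2.26` version node still fails this check. The following is a minimal sketch; the function name `summarize_missing_versions` and the sample text are assumptions for illustration, not part of darknet or Chipyard.

```python
import re
from collections import defaultdict

# Matches dynamic loader lines of the form:
# PROG: LIB: version `VER' not found (required by OBJECT)
PATTERN = re.compile(
    r"^(?P<prog>\S+): (?P<lib>\S+): version `(?P<ver>[^']+)' not found "
    r"\(required by (?P<req>[^)]+)\)$"
)


def summarize_missing_versions(log_text):
    """Map each requiring object to the set of (library, version) pairs it misses."""
    missing = defaultdict(set)
    for line in log_text.splitlines():
        m = PATTERN.match(line.strip())
        if m:
            missing[m.group("req")].add((m.group("lib"), m.group("ver")))
    return dict(missing)


# Two lines copied from the log above, used as sample input.
sample = """\
./darknet: /lib/libm.so.6: version `GLIBC_2.26' not found (required by ./darknet)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libnvdla_runtime.so)
"""

for requirer, needs in sorted(summarize_missing_versions(sample).items()):
    print(requirer, sorted(needs))
```

In the log above, every object listed after "required by" (`./darknet`, `libodlalayer.so`, `libdarknet.so`, `libnvdla_runtime.so`) was linked against the older glibc, so each of them would need to be rebuilt or replaced.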
Frankly I don't remember since this was so long ago. I would try to update the YOLO3 workload instead of going to a lower version of Linux... Sorry I can't be of more help.
Thank you @abejgonzalez, I am grateful for any help. I managed to build the YOLO3 workload against the newer libraries. Now the execution hangs at the following point. Can you give me a hint about where it might be failing?
# ./solo.sh
learning_rate: Using default '0.001000'
momentum: Using default '0.900000'
decay: Using default '0.000100'
policy: Using default 'constant'
max_batches: Using default '0'
layer filters size input output
0 offset: Using default '0.000000'
shifter: Using default '0'
post_offset: Using default '0.000000'
post_scale: Using default '1.000000'
outputs 692224 num_out 5537792
1 odla tensor 0 416 x 416 x 4 -> 52 x 52 x 256
odla tensor 1 416 x 416 x 4 -> 26 x 26 x 512
odla tensor 2 416 x 416 x 4 -> 13 x 13 x 255
odla tensor 3 416 x 416 x 4 -> 13 x 13 x 256
2 input layer 1 tensor 3
make_split_layer input layer index 1 tensor 3
split tensor 3 13 x 13 x 256 -> 13 x 13 x 256
3 out layer 5 tensor 0
4 input layer 1 tensor 2
make_split_layer input layer index 1 tensor 2
split tensor 2 13 x 13 x 255 -> 13 x 13 x 255
5 post_offset: Using default '0.000000'
outputs 43095 num_out 43264
6 yolo
7 input layer 1 tensor 1
make_split_layer input layer index 1 tensor 1
split tensor 1 26 x 26 x 512 -> 26 x 26 x 512
8 odla tensor 0 26 x 26 x 512 -> 26 x 26 x 255
odla tensor 1 26 x 26 x 512 -> 26 x 26 x 128
9 input layer 8 tensor 0
make_split_layer input layer index 8 tensor 0
split tensor 0 26 x 26 x 255 -> 26 x 26 x 255
10 post_offset: Using default '0.000000'
outputs 172380 num_out 173056
11 yolo
12 input layer 8 tensor 1
make_split_layer input layer index 8 tensor 1
split tensor 1 26 x 26 x 128 -> 26 x 26 x 128
13 out layer 2 tensor 0
14 input layer 1 tensor 0
make_split_layer input layer index 1 tensor 0
split tensor 0 52 x 52 x 256 -> 52 x 52 x 256
15 odla tensor 0 52 x 52 x 256 -> 52 x 52 x 255
16 input layer 15 tensor 0
make_split_layer input layer index 15 tensor 0
split tensor 0 52 x 52 x 255 -> 52 x 52 x 255
17 post_offset: Using default '0.000000'
outputs 689520 num_out 692224
18 yolo
Loading weights from yolov3-odla.cfg...Done!
#### input image size c=4 h=416 w=416
[ 726.028421] INFO: task darknet:166 blocked for more than 120 seconds.
[ 726.028689] Tainted: G O 5.7.0-rc3-58540-g66e8cf3 #3
[ 726.028915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 726.029164] darknet D 0 166 165 0x00000000
[ 726.032899] Call Trace:
[ 726.035219] [<ffffffe005482a72>] __schedule+0x18a/0x416
[ 726.040546] [<ffffffe005482d40>] schedule+0x42/0xb2
[ 726.045392] [<ffffffe005485b6c>] schedule_timeout+0x1ba/0x24c
[ 726.051139] [<ffffffe005483f1e>] wait_for_completion+0x6e/0x140
[ 726.060449] [<ffffffdf85cc1a70>] nvdla_task_submit+0x44/0xa4 [opendla]
[ 726.066216] [<ffffffdf85cc1dfe>] nvdla_submit+0xa4/0xf8 [opendla]
[ 726.069659] [<ffffffe00512877e>] drm_ioctl_kernel+0x6e/0xaa
[ 726.075198] [<ffffffe005128a98>] drm_ioctl+0x184/0x286
[ 726.080326] [<ffffffe004f19c0a>] ksys_ioctl+0x144/0x61e
[ 726.085529] [<ffffffe004f1a0f4>] sys_ioctl+0x10/0x18
[ 726.090517] [<ffffffe004e011a4>] ret_from_syscall+0x0/0x2
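Reading the trace from the bottom up: darknet issues an ioctl, the DRM layer dispatches it to `nvdla_submit` in the `opendla` module, and `nvdla_task_submit` then blocks in `wait_for_completion`. Presumably that completion is signalled once the accelerator reports the task as finished, so the hang suggests the signal never arrives. The sketch below extracts the frames and finds the first one that belongs to a loadable module; the helper names `parse_frames` and `first_module_frame` are hypothetical.

```python
import re

# Matches stack frames of the form:
# [<ADDRESS>] FUNCTION+0xOFFSET/0xSIZE [MODULE]   (the module part is optional)
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(trace_text):
    """Return (function, module) pairs, innermost frame first; module may be None."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME.search(line)
        if m:
            frames.append((m.group("func"), m.group("module")))
    return frames


def first_module_frame(frames):
    """Return the innermost frame that lives in a loadable module, or None."""
    for func, module in frames:
        if module is not None:
            return func, module
    return None


# Three frames copied from the trace above, used as sample input.
trace = """\
[ 726.051139] [<ffffffe005483f1e>] wait_for_completion+0x6e/0x140
[ 726.060449] [<ffffffdf85cc1a70>] nvdla_task_submit+0x44/0xa4 [opendla]
[ 726.066216] [<ffffffdf85cc1dfe>] nvdla_submit+0xa4/0xf8 [opendla]
"""

frames = parse_frames(trace)
print(frames[0])                   # innermost frame: where the task is blocked
print(first_module_frame(frames))  # driver function that started the wait
```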
Hi @manox. Have you fixed this GLIBC_2.26 problem?
That was quite a long time ago and I don't think I followed it up. Sorry.
Hello @manox, I have resolved this issue. I previously used the official NVDLA runtime, which produced the GLIBC error. When I instead used the nvdla-workload/nvdla-base/build-umd.sh script to build my own NVDLA runtime, it ran normally.
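Before copying a runtime library into the image, it can help to check which GLIBC version tags it references. The following is a rough sketch that scans raw bytes for `GLIBC_x.y` strings, similar in spirit to running `strings` and filtering the output; `readelf -V` or `objdump -T` give more authoritative information. The function names and the sample bytes are assumptions for illustration.

```python
import re

# Version tags are stored in ELF files as plain strings, e.g. b"GLIBC_2.26".
GLIBC_TAG = re.compile(rb"GLIBC_(\d+)\.(\d+)(?:\.(\d+))?")


def referenced_glibc_versions(data):
    """Return the sorted list of GLIBC version tuples found in raw bytes."""
    versions = set()
    for m in GLIBC_TAG.finditer(data):
        versions.add(tuple(int(g) for g in m.groups() if g is not None))
    return sorted(versions)


def referenced_glibc_versions_in_file(path):
    """Convenience wrapper: scan a file on disk, such as a shared library."""
    with open(path, "rb") as f:
        return referenced_glibc_versions(f.read())


# Hypothetical bytes standing in for part of an ELF string table.
fake = b"\x00libc.so.6\x00GLIBC_2.26\x00memcpy\x00GLIBC_2.27\x00"
print(referenced_glibc_versions(fake))
```

If a library lists a version that the target's libc does not define, the loader rejects it with the "version not found" error shown earlier in this thread. Note that scanning libc itself would also pick up the versions it defines, so this check is only meaningful for the objects that depend on it.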
Hi @abejgonzalez, in a Chipyard NVDLA integration PR conversation (https://github.com/ucb-bar/chipyard/pull/505#issuecomment-619183407) you said you were able to run a YOLO3 workload. Is this still possible and, if so, can you briefly explain how? Thank you!