ucb-bar / nvdla-workload

Base NVDLA Workload for FireMarshal

YOLO3 workload #5

Open manox opened 3 years ago

manox commented 3 years ago

Hi @abejgonzalez, in a Chipyard NVDLA integration PR conversation (https://github.com/ucb-bar/chipyard/pull/505#issuecomment-619183407) you mentioned that you were able to run a YOLO3 workload. Is this still possible, and if so, can you briefly explain how? Thank you!

abejgonzalez commented 3 years ago

IIRC I just followed along with the original documentation here: https://github.com/CSL-KU/firesim-nvdla#running-yolov3-on-nvdla. You will probably have to modify the FireMarshal workload files and FireSim config files a bit (FireMarshal: match the FireMarshal version in CY and inherit from the nvdla-base workload in this repo; FireSim: match the FireSim version in CY and point to a proper FireSim HW config). A sketch of what the derived workload might look like is below.
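
A minimal sketch of such a derived workload, assuming FireMarshal's JSON "base" inheritance and that the base workload file in this repo is named nvdla-base.json; the file name, overlay path, and field values here are illustrative, not taken from the repo:

# Sketch of a derived FireMarshal workload definition (illustrative values;
# check field names against the FireMarshal version bundled with Chipyard):
cat > yolo3-nvdla.json <<'EOF'
{
    "name"    : "yolo3-nvdla",
    "base"    : "nvdla-base.json",
    "overlay" : "overlays/yolo3",
    "command" : "cd /root/darknet-nvdla && ./solo.sh"
}
EOF
# Build the rootfs and boot binary with FireMarshal:
./marshal build yolo3-nvdla.json

The overlay directory would hold the darknet-nvdla binaries and weights that solo.sh expects under /root/darknet-nvdla (the path visible in the logs below).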

manox commented 3 years ago

Thank you @abejgonzalez. I ran into the problem that the prebuilt binaries need GLIBC 2.26, but the glibc in the generated Linux image is a different version. Do I need to use an older Linux version now?

# cd darknet-nvdla/
# ./solo.sh 
./darknet: /lib/libpthread.so.0: version `GLIBC_2.26' not found (required by ./darknet)
./darknet: /lib/libm.so.6: version `GLIBC_2.26' not found (required by ./darknet)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by ./darknet)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libodlalayer.so)
./darknet: /lib/libpthread.so.0: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libdarknet.so)
./darknet: /lib/libm.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libdarknet.so)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libdarknet.so)
./darknet: /lib/libm.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libnvdla_runtime.so)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libnvdla_runtime.so)
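
These errors mean the prebuilt binaries were linked against a glibc that exports the GLIBC_2.26 symbol version, while the libc.so.6 on the image does not provide it. A quick way to confirm the mismatch on both sides with standard binutils (a generic diagnostic, not from the original thread; run it on the host against the rootfs contents, or on the target if binutils is available):

# List the GLIBC symbol versions the prebuilt binary requires:
readelf -V ./darknet | grep -o 'GLIBC_[0-9.]*' | sort -u
# List the versions the libc on the image actually provides:
strings /lib/libc.so.6 | grep '^GLIBC_' | sort -u

If a version printed by the first command is missing from the second, the binary has to be rebuilt against the image's toolchain, which is what ultimately resolved this issue (see the last comment below).
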
abejgonzalez commented 3 years ago

Frankly I don't remember since this was so long ago. I would try to update the YOLO3 workload instead of going to a lower version of Linux... Sorry I can't be of more help.

manox commented 3 years ago

Thank you @abejgonzalez, I am grateful for any help. I managed to build the YOLO3 workload with the newer libraries, but now the execution hangs at the following point. Maybe you can give me a hint about where it could be failing.

# ./solo.sh 
learning_rate: Using default '0.001000'
momentum: Using default '0.900000'
decay: Using default '0.000100'
policy: Using default 'constant'
max_batches: Using default '0'
layer     filters    size              input                output
    0 offset: Using default '0.000000'
shifter: Using default '0'
post_offset: Using default '0.000000'
post_scale: Using default '1.000000'
outputs 692224 num_out 5537792
    1 odla          tensor 0  416 x 416 x   4   ->    52 x  52 x 256
odla          tensor 1  416 x 416 x   4   ->    26 x  26 x 512
odla          tensor 2  416 x 416 x   4   ->    13 x  13 x 255
odla          tensor 3  416 x 416 x   4   ->    13 x  13 x 256
    2 input layer 1 tensor 3
make_split_layer input layer index 1 tensor 3
split          tensor 3   13 x  13 x 256   ->    13 x  13 x 256
    3 out layer 5 tensor 0
    4 input layer 1 tensor 2
make_split_layer input layer index 1 tensor 2
split          tensor 2   13 x  13 x 255   ->    13 x  13 x 255
    5 post_offset: Using default '0.000000'
outputs 43095 num_out 43264
    6 yolo
    7 input layer 1 tensor 1
make_split_layer input layer index 1 tensor 1
split          tensor 1   26 x  26 x 512   ->    26 x  26 x 512
    8 odla          tensor 0   26 x  26 x 512   ->    26 x  26 x 255
odla          tensor 1   26 x  26 x 512   ->    26 x  26 x 128
    9 input layer 8 tensor 0
make_split_layer input layer index 8 tensor 0
split          tensor 0   26 x  26 x 255   ->    26 x  26 x 255
   10 post_offset: Using default '0.000000'
outputs 172380 num_out 173056
   11 yolo
   12 input layer 8 tensor 1
make_split_layer input layer index 8 tensor 1
split          tensor 1   26 x  26 x 128   ->    26 x  26 x 128
   13 out layer 2 tensor 0
   14 input layer 1 tensor 0
make_split_layer input layer index 1 tensor 0
split          tensor 0   52 x  52 x 256   ->    52 x  52 x 256
   15 odla          tensor 0   52 x  52 x 256   ->    52 x  52 x 255
   16 input layer 15 tensor 0
make_split_layer input layer index 15 tensor 0
split          tensor 0   52 x  52 x 255   ->    52 x  52 x 255
   17 post_offset: Using default '0.000000'
outputs 689520 num_out 692224
   18 yolo
Loading weights from yolov3-odla.cfg...Done!
#### input image size c=4 h=416 w=416
[  726.028421] INFO: task darknet:166 blocked for more than 120 seconds.
[  726.028689]       Tainted: G           O      5.7.0-rc3-58540-g66e8cf3 #3
[  726.028915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  726.029164] darknet         D    0   166    165 0x00000000
[  726.032899] Call Trace:
[  726.035219] [<ffffffe005482a72>] __schedule+0x18a/0x416
[  726.040546] [<ffffffe005482d40>] schedule+0x42/0xb2
[  726.045392] [<ffffffe005485b6c>] schedule_timeout+0x1ba/0x24c
[  726.051139] [<ffffffe005483f1e>] wait_for_completion+0x6e/0x140
[  726.060449] [<ffffffdf85cc1a70>] nvdla_task_submit+0x44/0xa4 [opendla]
[  726.066216] [<ffffffdf85cc1dfe>] nvdla_submit+0xa4/0xf8 [opendla]
[  726.069659] [<ffffffe00512877e>] drm_ioctl_kernel+0x6e/0xaa
[  726.075198] [<ffffffe005128a98>] drm_ioctl+0x184/0x286
[  726.080326] [<ffffffe004f19c0a>] ksys_ioctl+0x144/0x61e
[  726.085529] [<ffffffe004f1a0f4>] sys_ioctl+0x10/0x18
[  726.090517] [<ffffffe004e011a4>] ret_from_syscall+0x0/0x2
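
The trace shows darknet parked in wait_for_completion() inside nvdla_task_submit() in the opendla module: the task was handed to the DRM driver, but the completion that the driver's interrupt handler would normally signal never arrived, so the hang points at the hardware or its interrupt wiring rather than at userspace. A few generic first checks from the target shell (standard Linux commands; the grep pattern is an assumption based on the module name in the trace):

# Confirm the opendla module is loaded and see what it logged at probe time:
lsmod | grep opendla
dmesg | grep -i dla
# Check whether the NVDLA interrupt line has ever fired (a count stuck at
# zero while a task is pending suggests a missing or wrong IRQ hookup):
cat /proc/interrupts
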
Yuxin-Yu commented 1 year ago

Hi @manox, have you fixed this problem with GLIBC_2.26?

manox commented 1 year ago

> Hi @manox, have you fixed this problem with GLIBC_2.26?

That was quite a long time ago and I don't think I followed it up. Sorry.

Yuxin-Yu commented 1 year ago

Hello @manox, I have resolved this issue. I was previously using the official prebuilt NVDLA runtime, which is why the GLIBC error appeared, but when I use the nvdla-workload/nvdla-base/build-umd.sh script to build my own NVDLA runtime, it runs normally.
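
This matches the diagnosis above: rebuilding the user-mode driver against the same toolchain and sysroot that produced the image's libc removes the unmet GLIBC_2.26 symbol-version dependency. A sketch of the flow, assuming build-umd.sh takes no arguments and leaves its outputs in the working directory (check the script for its actual behavior):

# Rebuild the NVDLA user-mode driver with the repo's script
# (output location may differ; check the script):
cd nvdla-workload/nvdla-base
./build-umd.sh
# Verify the rebuilt library no longer requires symbol versions the image
# lacks (library name taken from the earlier error log):
readelf -V libnvdla_runtime.so | grep -o 'GLIBC_[0-9.]*' | sort -u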