pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

Prediction time difference between demo and test flags #2330

Status: Open (opened by canerkaraguler 3 years ago)

canerkaraguler commented 3 years ago

I trained a custom YOLOv4-tiny model with the darknet repo. The trained model works well, but I noticed a large difference in prediction time between ./darknet detector test and ./darknet detector demo. With test, the prediction call (network_predict(net, X);) takes roughly 200-250 milliseconds per image, but with demo it takes roughly 3-4 milliseconds per frame. I inspected detector.c and demo.c and cannot find a reason for this difference; the only difference I found is how the image is obtained. What could cause this prediction time difference?

Note that the GPU is enabled for both commands.
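One way to check whether the gap is a one-time startup cost rather than a slower steady state is to run the same input through the network several times within one process and time the first call separately. Below is a minimal C sketch under stated assumptions: infer_fn, timing_summary, summarize and time_calls are hypothetical names, not darknet API; the callback would be a small wrapper around network_predict(net, X); and the wall-clock timing is only meaningful if that call blocks until the GPU work has finished (copying the output back to the host should force this, but that is an assumption here).

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical stand-in for one inference call, e.g. a small wrapper
   that calls network_predict(net, X) on a prepared input. */
typedef void (*infer_fn)(void *ctx);

typedef struct {
    double first_ms;       /* duration of the first call */
    double steady_mean_ms; /* mean of calls 2..n (0 if n < 2) */
} timing_summary;

/* Milliseconds from a monotonic clock (POSIX clock_gettime). */
double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Splits per-call durations into "first call" and "steady state" so
   that a one-time cost is not blended into the average. */
timing_summary summarize(const double *ms, int n)
{
    timing_summary s = {0.0, 0.0};
    if (n < 1)
        return s;
    s.first_ms = ms[0];
    double rest = 0.0;
    for (int i = 1; i < n; ++i)
        rest += ms[i];
    if (n > 1)
        s.steady_mean_ms = rest / (double)(n - 1);
    return s;
}

/* Calls fn n times on the same input, storing each duration in ms[].
   Assumes fn blocks until its GPU work has finished. */
timing_summary time_calls(infer_fn fn, void *ctx, double *ms, int n)
{
    for (int i = 0; i < n; ++i) {
        double t0 = now_ms();
        fn(ctx);
        ms[i] = now_ms() - t0;
    }
    return summarize(ms, n);
}
```

A first_ms far above steady_mean_ms would point to per-process initialization; similar values would point elsewhere, for example to how the input image is prepared.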

Edit: I looked into the forward_network_gpu function in network_kernels.cu and obtained the following per-layer benchmark results:

./darknet detector demo:

Sorted by time (forward):
0 - fw-sort-layer 28 - type: 0 - avg_time 0.508256 ms 
1 - fw-sort-layer 2 - type: 0 - avg_time 0.506348 ms 
2 - fw-sort-layer 26 - type: 0 - avg_time 0.459425 ms 
3 - fw-sort-layer 10 - type: 0 - avg_time 0.451413 ms 
4 - fw-sort-layer 35 - type: 0 - avg_time 0.417269 ms 
5 - fw-sort-layer 18 - type: 0 - avg_time 0.344194 ms 
6 - fw-sort-layer 0 - type: 0 - avg_time 0.315203 ms 
7 - fw-sort-layer 4 - type: 0 - avg_time 0.296733 ms 
8 - fw-sort-layer 11 - type: 9 - avg_time 0.248557 ms 
9 - fw-sort-layer 1 - type: 0 - avg_time 0.246264 ms 
10 - fw-sort-layer 5 - type: 0 - avg_time 0.236804 ms 
11 - fw-sort-layer 21 - type: 0 - avg_time 0.231488 ms 
12 - fw-sort-layer 12 - type: 0 - avg_time 0.218595 ms 
13 - fw-sort-layer 20 - type: 0 - avg_time 0.204550 ms 
14 - fw-sort-layer 13 - type: 0 - avg_time 0.161447 ms 
15 - fw-sort-layer 27 - type: 0 - avg_time 0.140323 ms 
16 - fw-sort-layer 29 - type: 0 - avg_time 0.123811 ms 
17 - fw-sort-layer 15 - type: 0 - avg_time 0.114963 ms 
18 - fw-sort-layer 7 - type: 0 - avg_time 0.107157 ms 
19 - fw-sort-layer 23 - type: 0 - avg_time 0.102330 ms 
20 - fw-sort-layer 19 - type: 9 - avg_time 0.087676 ms 
21 - fw-sort-layer 32 - type: 0 - avg_time 0.081222 ms 
22 - fw-sort-layer 36 - type: 0 - avg_time 0.079672 ms 
23 - fw-sort-layer 8 - type: 9 - avg_time 0.076483 ms 
24 - fw-sort-layer 3 - type: 9 - avg_time 0.068887 ms 
25 - fw-sort-layer 30 - type: 28 - avg_time 0.067604 ms 
26 - fw-sort-layer 17 - type: 3 - avg_time 0.063098 ms 
27 - fw-sort-layer 37 - type: 28 - avg_time 0.059801 ms 
28 - fw-sort-layer 9 - type: 3 - avg_time 0.058222 ms 
29 - fw-sort-layer 16 - type: 9 - avg_time 0.052207 ms 
30 - fw-sort-layer 14 - type: 9 - avg_time 0.041681 ms 
31 - fw-sort-layer 25 - type: 3 - avg_time 0.040754 ms 
32 - fw-sort-layer 24 - type: 9 - avg_time 0.039432 ms 
33 - fw-sort-layer 22 - type: 9 - avg_time 0.039396 ms 
34 - fw-sort-layer 6 - type: 9 - avg_time 0.036083 ms 
35 - fw-sort-layer 33 - type: 33 - avg_time 0.035134 ms 
36 - fw-sort-layer 34 - type: 9 - avg_time 0.030790 ms 
37 - fw-sort-layer 31 - type: 9 - avg_time 0.027521 ms
Predicted in 5.877000 milli-seconds

./darknet detector test:

Sorted by time (forward):
0 - fw-sort-layer 0 - type: 0 - avg_time 229.162994 ms 
1 - fw-sort-layer 26 - type: 0 - avg_time 0.444000 ms 
2 - fw-sort-layer 35 - type: 0 - avg_time 0.413000 ms 
3 - fw-sort-layer 28 - type: 0 - avg_time 0.383000 ms 
4 - fw-sort-layer 2 - type: 0 - avg_time 0.332000 ms 
5 - fw-sort-layer 18 - type: 0 - avg_time 0.295000 ms 
6 - fw-sort-layer 10 - type: 0 - avg_time 0.255000 ms 
7 - fw-sort-layer 4 - type: 0 - avg_time 0.231000 ms 
8 - fw-sort-layer 1 - type: 0 - avg_time 0.211000 ms 
9 - fw-sort-layer 20 - type: 0 - avg_time 0.167000 ms 
10 - fw-sort-layer 21 - type: 0 - avg_time 0.165000 ms 
11 - fw-sort-layer 5 - type: 0 - avg_time 0.159000 ms 
12 - fw-sort-layer 12 - type: 0 - avg_time 0.116000 ms 
13 - fw-sort-layer 13 - type: 0 - avg_time 0.116000 ms 
14 - fw-sort-layer 7 - type: 0 - avg_time 0.096000 ms 
15 - fw-sort-layer 23 - type: 0 - avg_time 0.095000 ms 
16 - fw-sort-layer 27 - type: 0 - avg_time 0.091000 ms 
17 - fw-sort-layer 15 - type: 0 - avg_time 0.088000 ms 
18 - fw-sort-layer 29 - type: 0 - avg_time 0.084000 ms 
19 - fw-sort-layer 32 - type: 0 - avg_time 0.075000 ms 
20 - fw-sort-layer 36 - type: 0 - avg_time 0.074000 ms 
21 - fw-sort-layer 8 - type: 9 - avg_time 0.061000 ms 
22 - fw-sort-layer 9 - type: 3 - avg_time 0.049000 ms 
23 - fw-sort-layer 30 - type: 28 - avg_time 0.045000 ms 
24 - fw-sort-layer 37 - type: 28 - avg_time 0.043000 ms 
25 - fw-sort-layer 3 - type: 9 - avg_time 0.029000 ms 
26 - fw-sort-layer 22 - type: 9 - avg_time 0.028000 ms 
27 - fw-sort-layer 6 - type: 9 - avg_time 0.027000 ms 
28 - fw-sort-layer 24 - type: 9 - avg_time 0.027000 ms 
29 - fw-sort-layer 25 - type: 3 - avg_time 0.024000 ms 
30 - fw-sort-layer 33 - type: 33 - avg_time 0.024000 ms 
31 - fw-sort-layer 14 - type: 9 - avg_time 0.023000 ms 
32 - fw-sort-layer 16 - type: 9 - avg_time 0.023000 ms 
33 - fw-sort-layer 17 - type: 3 - avg_time 0.023000 ms 
34 - fw-sort-layer 19 - type: 9 - avg_time 0.023000 ms 
35 - fw-sort-layer 34 - type: 9 - avg_time 0.022000 ms 
36 - fw-sort-layer 11 - type: 9 - avg_time 0.019000 ms 
37 - fw-sort-layer 31 - type: 9 - avg_time 0.019000 ms 
Predicted in 234.141000 milli-seconds.
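Comparing the two listings, layer 0 accounts for almost all of the test-mode time, while every other layer stays below about 0.51 ms in both runs. To make that comparison mechanical, the log lines can be parsed and each layer's share of the reported total computed. A minimal C sketch, assuming the exact line format printed above (other darknet builds may format it differently); layer_timing, parse_layer_line and layer_share are hypothetical helpers, not part of darknet:

```c
#include <stdio.h>

/* One entry of a "Sorted by time (forward)" listing. */
typedef struct {
    int rank;      /* position in the sorted listing */
    int layer;     /* layer index */
    int type;      /* numeric layer type as printed */
    double avg_ms; /* avg_time in milliseconds */
} layer_timing;

/* Parses a line such as
   "0 - fw-sort-layer 0 - type: 0 - avg_time 229.162994 ms".
   Returns 1 on success, 0 if the line does not match the format. */
int parse_layer_line(const char *line, layer_timing *out)
{
    return sscanf(line, "%d - fw-sort-layer %d - type: %d - avg_time %lf",
                  &out->rank, &out->layer, &out->type, &out->avg_ms) == 4;
}

/* Fraction of the reported total ("Predicted in ... milli-seconds")
   spent in a single layer. */
double layer_share(double layer_ms, double total_ms)
{
    return total_ms > 0.0 ? layer_ms / total_ms : 0.0;
}
```

For the layer-0 entries, layer_share(229.162994, 234.141) is roughly 0.98 in the test run, against layer_share(0.315203, 5.877) of roughly 0.05 in the demo run.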
josephT1962 commented 3 years ago

I am having the same issue. Detecting on a single image takes around 200 ms, but when I run the demo I get ~30 fps. Did you find the reason and a fix?
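If the extra time in layer 0 is a one-time cost paid on the first forward pass of a process (an assumption, consistent with but not proven by the logs above), then a single-image run pays it in full, while a long-running demo spreads it across many frames. The arithmetic as a small C sketch; amortized_mean_ms and fps_to_ms are hypothetical helper names:

```c
/* Mean per-pass time when only the first pass pays a one-time cost.
   first_ms:  duration of pass 1 (steady cost plus one-time cost)
   steady_ms: duration of every later pass
   n:         number of passes averaged */
double amortized_mean_ms(double first_ms, double steady_ms, int n)
{
    if (n < 1)
        return 0.0;
    return (first_ms + (double)(n - 1) * steady_ms) / (double)n;
}

/* Frame period in milliseconds for a given frame rate. */
double fps_to_ms(double fps)
{
    return fps > 0.0 ? 1000.0 / fps : 0.0;
}
```

With the figures from the logs, amortized_mean_ms(234.141, 5.877, 1) is 234.141 ms, while amortized_mean_ms(234.141, 5.877, 1000) is about 6.1 ms. fps_to_ms(30.0) is about 33.3 ms, so the ~30 fps reported here corresponds to a per-frame budget well above a few milliseconds of inference.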