rockchip-linux / rknn-toolkit


Performance evaluation on an RV1126 board is far slower than the theoretical (simulator) time #388

Closed ly19940318 closed 1 year ago

ly19940318 commented 1 year ago

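Steps to reproduce:
1. Run test.py in examples/tflite/mobilenet_v1.
2. Evaluate performance once with the simulator and once with the board connected. The results are as follows: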

---------- Simulator evaluation ----------

--> config model
done
--> Loading model
W The target_platform is not set in config, using default target platform rk1808.
done
--> Building model
done
--> Export RKNN model
done
--> Init runtime environment
librknn_runtime version 1.7.3 (5047ff8 build: 2022-08-13 12:11:22 base: 1131)
done
--> Running model
mobilenet_v1
-----TOP 5-----

done
--> Evaluate model performance
W When performing performance evaluation, inputs can be set to None to use fake inputs.

                           Performance                              

========================================================================
Layer ID Name                                Time(us)
60       openvx.tensor_transpose_3           72
1        convolution.relu.pooling.layer2_2   369
3        convolution.relu.pooling.layer2_2   211
5        convolution.relu.pooling.layer2_2   184
7        convolution.relu.pooling.layer2_2   315
9        convolution.relu.pooling.layer2_2   99
11       convolution.relu.pooling.layer2_2   137
13       convolution.relu.pooling.layer2_2   103
15       convolution.relu.pooling.layer2_2   116
17       convolution.relu.pooling.layer2_2   95
19       convolution.relu.pooling.layer2_2   102
21       convolution.relu.pooling.layer2_2   151
23       convolution.relu.pooling.layer2_2   95
25       convolution.relu.pooling.layer2_2   109
27       convolution.relu.pooling.layer2_2   106
29       convolution.relu.pooling.layer2_2   211
31       convolution.relu.pooling.layer2_2   106
33       convolution.relu.pooling.layer2_2   211
35       convolution.relu.pooling.layer2_2   106
37       convolution.relu.pooling.layer2_2   211
39       convolution.relu.pooling.layer2_2   106
41       convolution.relu.pooling.layer2_2   211
43       convolution.relu.pooling.layer2_2   106
45       convolution.relu.pooling.layer2_2   211
47       convolution.relu.pooling.layer2_2   108
49       convolution.relu.pooling.layer2_2   163
51       convolution.relu.pooling.layer2_2   206
53       convolution.relu.pooling.layer2_2   319
55       pooling.layer2                      34
56       fullyconnected.relu.layer_3         110
58       softmaxlayer2.layer                 39
Total Time(us): 4722
FPS(600MHz): 158.83
FPS(800MHz): 211.77
Note: Time of each layer is converted according to 800MHz!

done
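For reference, the two FPS figures in the simulator report can be reproduced from the total time alone. The sketch below assumes that inference time scales inversely with the NPU clock, which is what the "converted according to 800MHz" note suggests; `fps_at` is a hypothetical helper name, not part of rknn-toolkit.

```python
# Total simulated time for one inference in microseconds, normalised to 800 MHz
# (value copied from the simulator report above).
TOTAL_TIME_US_AT_800MHZ = 4722


def fps_at(clock_mhz, total_us=TOTAL_TIME_US_AT_800MHZ, ref_mhz=800):
    """Estimate FPS at clock_mhz, ASSUMING time scales inversely with clock."""
    scaled_us = total_us * ref_mhz / clock_mhz  # slower clock -> longer time
    return 1_000_000 / scaled_us                # microseconds per second / time


print(f"FPS(800MHz): {fps_at(800):.2f}")
print(f"FPS(600MHz): {fps_at(600):.2f}")
```

Under that assumption the helper returns 211.77 and 158.83, matching the report, so the simulator figures appear to be a pure clock-scaled estimate rather than a measurement.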

---------- Board evaluation ----------

--> config model
done
--> Loading model
done
--> Building model
done
--> Export RKNN model
done
--> Init runtime environment
W Flag perf_debug has been set, it will affect the performance of inference!
I NPUTransfer: Starting NPU Transfer Client, Transfer version 2.1.0 (b5861e7@2020-11-23T11:50:36)
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI: API: 1.7.3 (0cfd4a1 build: 2022-08-15 17:08:57)
D RKNNAPI: DRV: 1.7.0 (7880361 build: 2021-08-16 14:05:08)
D RKNNAPI: ==============================================
done
--> Running model
mobilenet_v1
-----TOP 5-----

done
--> Evaluate model performance
W When performing performance evaluation, inputs can be set to None to use fake inputs.

                           Performance                              
    #### The performance result is just for debugging, ####
    #### may worse than actual performance!            ####

========================================================================
Layer ID Name                                                          Operator         Uid  Time(us)
0        MobilenetV1/MobilenetV1/Conv2d_0/Relu6_1_RKNN_mark_perm_60_0  TENSOR_TRANS     60   361
2        MobilenetV1/MobilenetV1/Conv2d_0/Relu6_1_2                    CONVOLUTION      1    920
3        MobilenetV1/MobilenetV1/Conv2d_1_depthwise/Relu6_3_2          DEPTH_WISE_CONV  3    896
4        MobilenetV1/MobilenetV1/Conv2d_1_pointwise/Relu6_5_2          CONVOLUTION      5    1106
5        MobilenetV1/MobilenetV1/Conv2d_2_depthwise/Relu6_7_2          DEPTH_WISE_CONV  7    430
6        MobilenetV1/MobilenetV1/Conv2d_2_pointwise/Relu6_9_2          CONVOLUTION      9    5080
7        MobilenetV1/MobilenetV1/Conv2d_3_depthwise/Relu6_11_2         DEPTH_WISE_CONV  11   4888
8        MobilenetV1/MobilenetV1/Conv2d_3_pointwise/Relu6_13_2         CONVOLUTION      13   5043
9        MobilenetV1/MobilenetV1/Conv2d_4_depthwise/Relu6_15_2         DEPTH_WISE_CONV  15   148
10       MobilenetV1/MobilenetV1/Conv2d_4_pointwise/Relu6_17_2         CONVOLUTION      17   243
11       MobilenetV1/MobilenetV1/Conv2d_5_depthwise/Relu6_19_2         DEPTH_WISE_CONV  19   161
12       MobilenetV1/MobilenetV1/Conv2d_5_pointwise/Relu6_21_2         CONVOLUTION      21   196
13       MobilenetV1/MobilenetV1/Conv2d_6_depthwise/Relu6_23_2         DEPTH_WISE_CONV  23   74
14       MobilenetV1/MobilenetV1/Conv2d_6_pointwise/Relu6_25_2         CONVOLUTION      25   109
15       MobilenetV1/MobilenetV1/Conv2d_7_depthwise/Relu6_27_2         DEPTH_WISE_CONV  27   74
16       MobilenetV1/MobilenetV1/Conv2d_7_pointwise/Relu6_29_2         CONVOLUTION      29   157
17       MobilenetV1/MobilenetV1/Conv2d_8_depthwise/Relu6_31_2         DEPTH_WISE_CONV  31   69
18       MobilenetV1/MobilenetV1/Conv2d_8_pointwise/Relu6_33_2         CONVOLUTION      33   160
19       MobilenetV1/MobilenetV1/Conv2d_9_depthwise/Relu6_35_2         DEPTH_WISE_CONV  35   69
20       MobilenetV1/MobilenetV1/Conv2d_9_pointwise/Relu6_37_2         CONVOLUTION      37   157
21       MobilenetV1/MobilenetV1/Conv2d_10_depthwise/Relu6_39_2        DEPTH_WISE_CONV  39   67
22       MobilenetV1/MobilenetV1/Conv2d_10_pointwise/Relu6_41_2        CONVOLUTION      41   158
23       MobilenetV1/MobilenetV1/Conv2d_11_depthwise/Relu6_43_2        DEPTH_WISE_CONV  43   72
24       MobilenetV1/MobilenetV1/Conv2d_11_pointwise/Relu6_45_2        CONVOLUTION      45   159
25       MobilenetV1/MobilenetV1/Conv2d_12_depthwise/Relu6_47_2        DEPTH_WISE_CONV  47   73
26       MobilenetV1/MobilenetV1/Conv2d_12_pointwise/Relu6_49_2        CONVOLUTION      49   128
27       MobilenetV1/MobilenetV1/Conv2d_13_depthwise/Relu6_51_2        DEPTH_WISE_CONV  51   69
28       MobilenetV1/MobilenetV1/Conv2d_13_pointwise/Relu6_53_2        CONVOLUTION      53   209
29       MobilenetV1/Logits/AvgPool_1a/AvgPool_55_2                    DEPTH_WISE_CONV  55   128
30       MobilenetV1/Logits/Conv2d_1c_1x1/BiasAdd_56_0                 FULLYCONNECTED   56   477
1        Softmax2Layer_1                                               SOFTMAX               945
Total Time(us): 22826
FPS: 43.81

done
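To see where the board time goes, the per-layer times from the board report can be summed and ranked. This is only a sketch over the numbers printed above: the tuples are copied by hand, and the Softmax row is stored with `None` because its Uid does not appear in the report.

```python
# (uid, operator, time_us) copied from the board report above.
# The Softmax row's Uid is missing in the report, so it is kept as None.
BOARD_LAYERS = [
    (60, "TENSOR_TRANS", 361), (1, "CONVOLUTION", 920),
    (3, "DEPTH_WISE_CONV", 896), (5, "CONVOLUTION", 1106),
    (7, "DEPTH_WISE_CONV", 430), (9, "CONVOLUTION", 5080),
    (11, "DEPTH_WISE_CONV", 4888), (13, "CONVOLUTION", 5043),
    (15, "DEPTH_WISE_CONV", 148), (17, "CONVOLUTION", 243),
    (19, "DEPTH_WISE_CONV", 161), (21, "CONVOLUTION", 196),
    (23, "DEPTH_WISE_CONV", 74), (25, "CONVOLUTION", 109),
    (27, "DEPTH_WISE_CONV", 74), (29, "CONVOLUTION", 157),
    (31, "DEPTH_WISE_CONV", 69), (33, "CONVOLUTION", 160),
    (35, "DEPTH_WISE_CONV", 69), (37, "CONVOLUTION", 157),
    (39, "DEPTH_WISE_CONV", 67), (41, "CONVOLUTION", 158),
    (43, "DEPTH_WISE_CONV", 72), (45, "CONVOLUTION", 159),
    (47, "DEPTH_WISE_CONV", 73), (49, "CONVOLUTION", 128),
    (51, "DEPTH_WISE_CONV", 69), (53, "CONVOLUTION", 209),
    (55, "DEPTH_WISE_CONV", 128), (56, "FULLYCONNECTED", 477),
    (None, "SOFTMAX", 945),
]

# Sum of all per-layer times; should equal the reported "Total Time(us)".
total_us = sum(time_us for _, _, time_us in BOARD_LAYERS)

# The three slowest layers and their combined share of the total.
top3 = sorted(BOARD_LAYERS, key=lambda row: row[2], reverse=True)[:3]
top3_us = sum(time_us for _, _, time_us in top3)

print(f"total: {total_us} us")
for uid, operator, time_us in top3:
    print(f"uid {uid:>2} {operator:<16} {time_us} us")
print(f"top-3 share: {top3_us / total_us:.1%}")
```

The sum reproduces the reported 22826 us, and the three layers with Uid 9, 11 and 13 (Conv2d_2_pointwise, Conv2d_3_depthwise, Conv2d_3_pointwise) account for roughly two thirds of it, so the slowdown is concentrated in a few layers rather than spread evenly.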

For the same model, the inference time differs by roughly 5x. (In a later test, a yolov5s model evaluated on the board ran at only about 4 FPS.)
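The 5x figure follows directly from the two reported totals. The sketch below also compares the three slowest board layers against the simulator, assuming that the simulator's Layer ID corresponds to the board's Uid (the values line up in both reports, but that mapping is my assumption). Note that the simulator times are normalised to 800 MHz and the board run had perf_debug enabled, which the log warns affects performance, so the ratios are indicative only.

```python
# Totals copied from the two reports above (microseconds per inference).
SIM_TOTAL_US = 4722
BOARD_TOTAL_US = 22826

# Per-layer times for the three slowest board layers, keyed by Uid.
# ASSUMPTION: simulator "Layer ID" == board "Uid".
SIM_US = {9: 99, 11: 137, 13: 103}
BOARD_US = {9: 5080, 11: 4888, 13: 5043}

overall_ratio = BOARD_TOTAL_US / SIM_TOTAL_US
per_layer_ratio = {uid: BOARD_US[uid] / SIM_US[uid] for uid in SIM_US}

print(f"overall: board is {overall_ratio:.2f}x slower than the simulator")
for uid, ratio in per_layer_ratio.items():
    print(f"uid {uid}: {BOARD_US[uid]} us vs {SIM_US[uid]} us ({ratio:.1f}x)")
```

The overall ratio is about 4.8x, while these three layers are each more than 30x slower on the board than in the simulator.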

What causes this difference, and how can it be optimized?