linghu8812 / tensorrt_inference

test result on arcface #40

Open bruceche11 opened 3 years ago

bruceche11 commented 3 years ago

I tried the ArcFace sample and everything went well. I ran it twice. The first time:

onnx2trt arcface_r100.onnx -o arcface_r100_fp16_engine_w5.trt -d 16 -w 5368709120
./arcface_trt ./config.yaml ./samples

and got:

Processing: ./samples/test1.jpg prepareImage prepare image take: 0.392491 ms. host2device execute Inference take: 6.94628 ms. execute success device2host post process Post process take: 0.09717 ms.
Processing: ./samples/test10.jpg prepareImage prepare image take: 0.278884 ms. host2device execute Inference take: 5.0688 ms. execute success device2host post process Post process take: 0.04981 ms.
Processing: ./samples/test2.jpg prepareImage prepare image take: 0.279198 ms. host2device execute Inference take: 5.05902 ms. execute success device2host post process Post process take: 0.049924 ms.
Processing: ./samples/test3.jpg prepareImage prepare image take: 0.277865 ms. host2device execute Inference take: 5.05516 ms. execute success device2host post process Post process take: 0.098275 ms.
Processing: ./samples/test4.jpg prepareImage prepare image take: 0.351051 ms. host2device execute Inference take: 5.1283 ms. execute success device2host post process Post process take: 0.065136 ms.
Processing: ./samples/test5.jpg prepareImage prepare image take: 0.370513 ms. host2device execute Inference take: 5.09149 ms. execute success device2host post process Post process take: 0.04263 ms.
Processing: ./samples/test6.jpg prepareImage prepare image take: 0.304695 ms. host2device execute Inference take: 5.06743 ms. execute success device2host post process Post process take: 0.052134 ms.
Processing: ./samples/test7.jpg prepareImage prepare image take: 0.355103 ms. host2device execute Inference take: 5.08416 ms. execute success device2host post process Post process take: 0.054823 ms.
Processing: ./samples/test8.jpg prepareImage prepare image take: 0.291755 ms. host2device execute Inference take: 5.06323 ms. execute success device2host post process Post process take: 0.047252 ms.
Processing: ./samples/test9.jpg prepareImage prepare image take: 0.288848 ms. host2device execute Inference take: 5.04114 ms. execute success device2host post process Post process take: 0.040619 ms.
Average processing time is 5.63932ms
The similarity matrix of the image folder is:
[1, 0.51497477, 0.83092833, 0.44836619, 0.44409686, 0.44004413, 0.57703531, 0.48046044, 0.50348091, 0.52596587;
 0.51497477, 1, 0.5157097, 0.49093315, 0.48639575, 0.55684233, 0.41457996, 0.4557389, 0.45707369, 0.51120299;
 0.83092833, 0.5157097, 1, 0.43384337, 0.43466485, 0.44116706, 0.55737579, 0.49809921, 0.50180018, 0.52988255;
 0.44836619, 0.49093315, 0.43384337, 1, 0.8184306, 0.52917022, 0.44513768, 0.51536781, 0.50124043, 0.56127048;
 0.44409686, 0.48639575, 0.43466485, 0.8184306, 1, 0.53311759, 0.48287207, 0.50482482, 0.52335793, 0.49513683;
 0.44004413, 0.55684233, 0.44116706, 0.52917022, 0.53311759, 1, 0.46499243, 0.51840144, 0.4833495, 0.43685332;
 0.57703531, 0.41457996, 0.55737579, 0.44513768, 0.48287207, 0.46499243, 1, 0.53517133, 0.51514965, 0.48933336;
 0.48046044, 0.4557389, 0.49809921, 0.51536781, 0.50482482, 0.51840144, 0.53517133, 1, 0.4795776, 0.47983229;
 0.50348091, 0.45707369, 0.50180018, 0.50124043, 0.52335793, 0.4833495, 0.51514965, 0.4795776, 1, 0.51290995;
 0.52596587, 0.51120299, 0.52988255, 0.56127048, 0.49513683, 0.43685332, 0.48933336, 0.47983229, 0.51290995, 1]!

So the inference time is about 5 ms.

The second time, I changed FP16 to FP32, and the inference time was about 7.8 ms.

I ran this on a P100.

Three questions:
  1. Is the result right?
  2. How do I choose only one GPU? I have 4 GPUs in one computer.
  3. Is there anything I can do to make inference faster? My goal is 4 ms.
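
The "Average processing time" in the first run appears to be the sum of all three stage timings (prepare image, inference, post process) divided by the number of images, rather than the inference time alone. The sketch below checks that reading against the numbers reported above; the variable names are illustrative and not taken from the repository.

```python
# Per-image timings in ms, copied from the first run (batch size 1).
prepare = [0.392491, 0.278884, 0.279198, 0.277865, 0.351051,
           0.370513, 0.304695, 0.355103, 0.291755, 0.288848]
inference = [6.94628, 5.0688, 5.05902, 5.05516, 5.1283,
             5.09149, 5.06743, 5.08416, 5.06323, 5.04114]
post = [0.09717, 0.04981, 0.049924, 0.098275, 0.065136,
        0.04263, 0.052134, 0.054823, 0.047252, 0.040619]

n_images = len(inference)

# Average over all three stages, per image.
avg_total = (sum(prepare) + sum(inference) + sum(post)) / n_images
# Average of the inference stage only.
avg_inference = sum(inference) / n_images

print(round(avg_total, 5))      # 5.63932, matching the reported average
print(round(avg_inference, 4))  # 5.2605
```

Under this reading, the first (slower) inference call of about 6.9 ms raises the mean; the remaining calls are all close to 5.1 ms.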

linghu8812 commented 3 years ago
  1. They are the similarities between the images in arcface/samples.
result image image image image image image image image image image
image 1 0.51497477 0.83092833 0.44836619 0.44409686 0.44004413 0.57703531 0.48046044 0.50348091 0.52596587
image 0.51497477 1 0.5157097 0.49093315 0.48639575 0.55684233 0.41457996 0.4557389 0.45707369 0.51120299
image 0.83092833 0.5157097 1 0.43384337 0.43466485 0.44116706 0.55737579 0.49809921 0.50180018 0.52988255
image 0.44836619 0.49093315 0.43384337 1 0.8184306 0.52917022 0.44513768 0.51536781 0.50124043 0.56127048
image 0.44409686 0.48639575 0.43466485 0.8184306 1 0.53311759 0.48287207 0.50482482 0.52335793 0.49513683
image 0.44004413 0.55684233 0.44116706 0.52917022 0.53311759 1 0.46499243 0.51840144 0.4833495 0.43685332
image 0.57703531 0.41457996 0.55737579 0.44513768 0.48287207 0.46499243 1 0.53517133 0.51514965 0.48933336
image 0.48046044 0.4557389 0.49809921 0.51536781 0.50482482 0.51840144 0.53517133 1 0.4795776 0.47983229
image 0.50348091 0.45707369 0.50180018 0.50124043 0.52335793 0.4833495 0.51514965 0.4795776 1 0.51290995
image 0.52596587 0.51120299 0.52988255 0.56127048 0.49513683 0.43685332 0.48933336 0.47983229 0.51290995 1
  2. Use CUDA_VISIBLE_DEVICES=N ./arcface_trt ../config.yaml ../samples to choose the Nth GPU.
  3. Try a larger batch size.
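
For anyone launching the binary from a script rather than a shell, the same GPU selection can be sketched in Python. This is a minimal illustration, assuming the binary and paths from the reply above; `gpu_env` and `run_on_gpu` are hypothetical helper names. Note that once CUDA_VISIBLE_DEVICES is set, the process sees the selected GPU as device 0.

```python
import os
import subprocess

def gpu_env(gpu_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env

def run_on_gpu(gpu_index, cmd):
    """Run cmd with only the chosen GPU visible and return its exit code."""
    return subprocess.run(cmd, env=gpu_env(gpu_index)).returncode

# Example call (paths are the ones quoted in the reply; adjust to the local build):
# run_on_gpu(3, ["./arcface_trt", "../config.yaml", "../samples"])
```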
bruceche11 commented 3 years ago

Thanks! I tried what you suggested and everything works. I changed BATCH_SIZE in config.yaml and found that with BATCH_SIZE=2 the inference time is about 5 ms, and with BATCH_SIZE=5 it is still about 5 ms. But the similarity values become lower. Why?

root@1be4eb231b21:/workspace/ws_onnx_trt/arcface_trt# CUDA_VISIBLE_DEVICES=3 ./build/arcface_trt ./config.yaml ./samples
loading filename from:../arcface_r100_fp16_engine_w5.trt
deserialize done
binding0: 150528
binding1: 2048
Processing: ./samples/test1.jpg
Processing: ./samples/test10.jpg
prepareImage prepare image take: 0.603685 ms. host2device execute Inference take: 6.87112 ms. execute success device2host post process size of feature: 2 x 256 size of feature: 256 size of feature: 2 size of feature: 2 Post process take: 0.116953 ms.
Processing: ./samples/test2.jpg
Processing: ./samples/test3.jpg
prepareImage prepare image take: 0.507378 ms. host2device execute Inference take: 5.07406 ms. execute success device2host post process size of feature: 2 x 256 size of feature: 256 size of feature: 2 size of feature: 2 Post process take: 0.078156 ms.
Processing: ./samples/test4.jpg
Processing: ./samples/test5.jpg
prepareImage prepare image take: 0.493243 ms. host2device execute Inference take: 5.05707 ms. execute success device2host post process size of feature: 2 x 256 size of feature: 256 size of feature: 2 size of feature: 2 Post process take: 0.074733 ms.
Processing: ./samples/test6.jpg
Processing: ./samples/test7.jpg
prepareImage prepare image take: 0.544818 ms. host2device execute Inference take: 5.05152 ms. execute success device2host post process size of feature: 2 x 256 size of feature: 256 size of feature: 2 size of feature: 2 Post process take: 0.074367 ms.
Processing: ./samples/test8.jpg
Processing: ./samples/test9.jpg
prepareImage prepare image take: 0.536113 ms. host2device execute Inference take: 5.0505 ms. execute success device2host post process size of feature: 2 x 256 size of feature: 256 size of feature: 2 size of feature: 2 Post process take: 0.094825 ms.
Average processing time is 3.02285ms
The similarity matrix of the image folder is:
[1, 0.46567917, 0.8410123, 0.57973731, 0.41083992, 0.43045956, 0.57734334, 0.48706296, 0.53349322, 0.49332231;
 0.46567917, 1, 0.45773649, 0.8202728, 0.52905834, 0.48151776, 0.49739775, 0.57670194, 0.51964402, 0.46951485;
 0.8410123, 0.45773649, 1, 0.5624944, 0.44073945, 0.44553632, 0.55419302, 0.49897903, 0.50571543, 0.50760627;
 0.57973731, 0.8202728, 0.5624944, 1, 0.47773665, 0.4283652, 0.53124946, 0.56083965, 0.54898351, 0.49772584;
 0.41083992, 0.52905834, 0.44073945, 0.47773665, 1, 0.53569794, 0.48075521, 0.51989633, 0.4904862, 0.48234069;
 0.43045956, 0.48151776, 0.44553632, 0.4283652, 0.53569794, 1, 0.47050983, 0.48516709, 0.52618235, 0.55786932;
 0.57734334, 0.49739775, 0.55419302, 0.53124946, 0.48075521, 0.47050983, 1, 0.48115456, 0.50397408, 0.49293461;
 0.48706296, 0.57670194, 0.49897903, 0.56083965, 0.51989633, 0.48516709, 0.48115456, 1, 0.50318199, 0.52735072;
 0.53349322, 0.51964402, 0.50571543, 0.54898351, 0.4904862, 0.52618235, 0.50397408, 0.50318199, 1, 0.48803231;
 0.49332231, 0.46951485, 0.50760627, 0.49772584, 0.48234069, 0.55786932, 0.49293461, 0.52735072, 0.48803231, 1]!

root@1be4eb231b21:/workspace/ws_onnx_trt/arcface_trt# vim config.yaml
root@1be4eb231b21:/workspace/ws_onnx_trt/arcface_trt# CUDA_VISIBLE_DEVICES=3 ./build/arcface_trt ./config.yaml ./samples
loading filename from:../arcface_r100_fp16_engine_w5.trt
deserialize done
binding0: 150528
binding1: 2048
Processing: ./samples/test1.jpg
Processing: ./samples/test10.jpg
Processing: ./samples/test2.jpg
Processing: ./samples/test3.jpg
Processing: ./samples/test4.jpg
prepareImage prepare image take: 2.89447 ms. host2device execute Inference take: 7.00111 ms. execute success device2host post process size of feature: 5 x 102 size of feature: 102 size of feature: 5 size of feature: 2 Post process take: 0.152602 ms.
Processing: ./samples/test5.jpg
Processing: ./samples/test6.jpg
Processing: ./samples/test7.jpg
Processing: ./samples/test8.jpg
Processing: ./samples/test9.jpg
prepareImage prepare image take: 1.34157 ms. host2device execute Inference take: 5.04295 ms. execute success device2host post process size of feature: 5 x 102 size of feature: 102 size of feature: 5 size of feature: 2 Post process take: 0.075636 ms.
Average processing time is 1.65083ms
The similarity matrix of the image folder is:
[1, 0.50479847, 0.64849001, 0.49709004, 0.56888652, 0.52432233, 0.6012755, 0.49516344, 0.50592017, 0.53921306;
 0.50479847, 1, 0.49245936, 0.42679471, 0.40797403, 0.37852359, 0.37921709, 0.52433467, 0.54978251, 0.51545572;
 0.64849001, 0.49245936, 1, 0.44285911, 0.54573059, 0.5089705, 0.54890561, 0.46028119, 0.56183159, 0.64145476;
 0.49709004, 0.42679471, 0.44285911, 1, 0.48192114, 0.49149111, 0.42734671, 0.46144605, 0.4431079, 0.53690982;
 0.56888652, 0.40797403, 0.54573059, 0.48192114, 1, 0.58605212, 0.53718555, 0.47716492, 0.49725425, 0.40605214;
 0.52432233, 0.37852359, 0.5089705, 0.49149111, 0.58605212, 1, 0.52105993, 0.51957816, 0.48044747, 0.57446051;
 0.6012755, 0.37921709, 0.54890561, 0.42734671, 0.53718555, 0.52105993, 1, 0.48025808, 0.38598213, 0.47998843;
 0.49516344, 0.52433467, 0.46028119, 0.46144605, 0.47716492, 0.51957816, 0.48025808, 1, 0.48002502, 0.44741845;
 0.50592017, 0.54978251, 0.56183159, 0.4431079, 0.49725425, 0.48044747, 0.38598213, 0.48002502, 1, 0.54007053;
 0.53921306, 0.51545572, 0.64145476, 0.53690982, 0.40605214, 0.57446051, 0.47998843, 0.44741845, 0.54007053, 1]!
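
One possible reading of the logs above, offered as an assumption rather than a confirmed diagnosis: if binding0 and binding1 are buffer sizes in bytes of float32 data, then 150528 bytes is exactly one 3 x 112 x 112 image and 2048 bytes is one 512-float embedding, so the engine buffers look sized for a batch of 1. Splitting 512 floats evenly across the batch would give the feature lengths 256 and 102 printed in the log. The float32 element type, the 112 x 112 input size and the 512-float embedding are inferred from the numbers, not stated in the thread.

```python
FLOAT32_BYTES = 4  # assumption: the bindings hold float32 data

def floats_in_binding(size_bytes):
    """Number of float32 values that fit in a binding of the given byte size."""
    return size_bytes // FLOAT32_BYTES

def feature_len_per_image(output_bytes, batch_size):
    """Floats left per image if a fixed-size output buffer is split evenly across the batch."""
    return floats_in_binding(output_bytes) // batch_size

input_floats = floats_in_binding(150528)  # binding0 from the log
output_floats = floats_in_binding(2048)   # binding1 from the log

print(input_floats == 3 * 112 * 112)      # True
print(output_floats)                      # 512
for batch in (1, 2, 5):
    # Prints 512, 256 and 102 floats per image respectively.
    print(batch, feature_len_per_image(2048, batch))
```

If this reading is right, the features compared at BATCH_SIZE=2 and BATCH_SIZE=5 would be truncated slices of a single embedding, and it may be worth checking whether the engine was built for the batch size set in config.yaml.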

xinlin-xiao commented 1 month ago

How can the code show the similarity matrix result as a table like this? [image]
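
The thread does not show how the table in the earlier reply was produced. As a minimal sketch, the matrix text printed by the program (rows separated by ";", values by ",", with a trailing "!") could be parsed and rendered as a pipe-delimited table. The helper names `parse_matrix` and `to_table` and the labels used below are hypothetical.

```python
def parse_matrix(text):
    """Parse a printout such as "[1, 0.5; 0.5, 1]!" into a list of float rows."""
    body = text.strip().rstrip("!").strip().lstrip("[").rstrip("]")
    return [[float(v) for v in row.split(",")] for row in body.split(";")]

def to_table(matrix, labels):
    """Render the matrix as a pipe-delimited table with one label per row and column."""
    header = "| result | " + " | ".join(labels) + " |"
    rule = "|" + " --- |" * (len(labels) + 1)
    rows = [
        "| " + label + " | " + " | ".join(f"{v:.8g}" for v in row) + " |"
        for label, row in zip(labels, matrix)
    ]
    return "\n".join([header, rule] + rows)

# Small example with two hypothetical image labels.
matrix = parse_matrix("[1, 0.51497477; 0.51497477, 1]!")
print(to_table(matrix, ["test1", "test10"]))
```

Pasting the resulting text into a GitHub comment would render it as a table; replacing the labels with image links is left to the reader.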