pcb9382 / FaceAlgorithm

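Face detection and face recognition, including face detection (retinaface, yolov5face, yolov7face, yolov8face), face detection and tracking (ByteTracker), face angle calculation (Face_Angle), face alignment (Face_Aligner), face recognition (Arcface), mask detection (MaskRecognitiion), age and gender detection (Gender_age), silent liveness detection (Silent_Face_Anti_Spoofing), and FaceAlignment (106 keypoints).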
MIT License
295 stars, 65 forks

cudaMemcpyDeviceToHost Slow Time #3

Open SwEngine opened 1 year ago

SwEngine commented 1 year ago

When I feed images to the detector in a loop with no delay, the processing time for yolov7-face or yolov8-face is short. However, when I pass the images to the detection function one at a time, with a 1-second interval between them, the processing time becomes much longer. What might be causing this?

Here are the processing times for images in a loop:

../images//test1.jpg average: 389.05ms
../images//test3.jpg average: 134.054ms
../images//cam4.jpg average: 104.824ms
../images//test11.jpg average: 93.1855ms
../images//test7.jpg average: 86.4966ms
../images//test8.jpg average: 85.9823ms
../images//arac2.jpg average: 67.5789ms
../images//arac3.jpg average: 69.3688ms
../images//arac4.jpg average: 68.7759ms
../images//test9.jpg average: 75.8391ms

And here are the processing times with 1-second intervals between images:

../images//test1.jpg average: 267.529ms
../images//test3.jpg average: 313.996ms
../images//cam4.jpg average: 159.6ms
../images//test11.jpg average: 315.25ms
../images//test7.jpg average: 296.985ms
../images//test8.jpg average: 237.869ms
../images//arac2.jpg average: 206.976ms
../images//arac3.jpg average: 244.924ms
../images//arac4.jpg average: 185.883ms
../images//test9.jpg average: 239.323ms
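The two sets of measurements above can be reproduced with a small timing harness that runs the same work back to back and then with an idle gap between calls. This is a minimal C++ sketch, assuming the detection call can be wrapped in a lambda that blocks until the results are back on the host; `time_calls` and `mean_ms` are hypothetical helpers, not part of FaceAlgorithm.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Times `iterations` calls of `work` and returns each call's wall-clock
// duration in milliseconds. `idle` is slept between calls (and not timed),
// mimicking the "one image per second" case; pass 0 ms for back-to-back runs.
std::vector<double> time_calls(const std::function<void()>& work,
                               int iterations,
                               std::chrono::milliseconds idle) {
    assert(iterations > 0);
    std::vector<double> durations_ms;
    durations_ms.reserve(static_cast<std::size_t>(iterations));
    for (int i = 0; i < iterations; ++i) {
        if (i > 0 && idle.count() > 0) {
            std::this_thread::sleep_for(idle);  // the GPU sits idle here
        }
        const auto start = std::chrono::steady_clock::now();
        work();  // e.g. one detection call (placeholder supplied by the caller)
        const auto stop = std::chrono::steady_clock::now();
        durations_ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return durations_ms;
}

// Arithmetic mean of the recorded durations.
double mean_ms(const std::vector<double>& durations_ms) {
    assert(!durations_ms.empty());
    double sum = 0.0;
    for (double d : durations_ms) sum += d;
    return sum / static_cast<double>(durations_ms.size());
}
```

Calling `time_calls` once with `std::chrono::milliseconds(0)` and once with `std::chrono::milliseconds(1000)`, with the same detector call inside the lambda, gives the two cases being compared; `mean_ms` then produces an average per image.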

After analyzing the detect function, I found that the following line takes most of the time:

CHECK(cudaMemcpyAsync(decode_ptr_host[i], decode_ptr_device, sizeof(float) * (1 + MAX_OBJECTS * NUM_BOX_ELEMENT), cudaMemcpyDeviceToHost, stream));

What could be causing this, and how can I fix it? The cudaMemcpy call is slower when the images are fed one at a time.

pcb9382 commented 1 year ago

It is probably the 1 s interval: while the GPU sits idle, its clock frequency drops, so the next image is processed at a lower clock. Some options:

  1. Try locking the GPU frequency (recommended).
  2. Launch a kernel that keeps running in the background to keep the GPU clock up.
  3. Change the GPU driver.

I haven't tried these methods myself. Please try them, let me know whether they solve the problem, and tell me if you find a better way.

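Option 2 in the list above was not tried in this thread, and locking the clocks is the recommended fix. For illustration only, here is a minimal C++ sketch of the keep-alive idea, assuming some cheap GPU job is available to call periodically (for example a tiny dummy kernel). `KeepAlive` and its `tick` callback are hypothetical names. If the job reuses the detector, it should not share a TensorRT execution context or stream with the main thread. Whether a small periodic job is enough to hold the clocks up depends on the GPU's power management, and keeping the GPU busy costs power.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper for option 2: runs `tick` on a background thread,
// immediately and then once every `period`, until stop() is called or the
// object is destroyed. `tick` stands in for a small GPU job.
class KeepAlive {
public:
    KeepAlive(std::function<void()> tick, std::chrono::milliseconds period)
        : tick_(std::move(tick)), period_(period), worker_([this] { run(); }) {}

    ~KeepAlive() { stop(); }

    KeepAlive(const KeepAlive&) = delete;
    KeepAlive& operator=(const KeepAlive&) = delete;

    // Signals the worker to finish and waits for it; safe to call twice.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        if (worker_.joinable()) worker_.join();
    }

    // Number of completed `tick` calls so far.
    long tick_count() const { return ticks_.load(); }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stop_) {
            lock.unlock();
            tick_();  // the caller's keep-alive GPU job
            ++ticks_;
            lock.lock();
            // Sleep for `period_`, but wake early if stop() is called.
            cv_.wait_for(lock, period_, [this] { return stop_; });
        }
    }

    // Declared before worker_ so they are initialized before the thread starts.
    std::function<void()> tick_;
    std::chrono::milliseconds period_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;
    std::atomic<long> ticks_{0};
    std::thread worker_;
};
```

A condition variable is used instead of a plain sleep so that `stop()` returns promptly rather than waiting out a full period.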
SwEngine commented 1 year ago

I tried your first recommendation, and it works! Thank you!

For anyone who wants to lock the GPU frequency on Jetson devices, here are the commands:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

pcb9382 commented 1 year ago

Oh, that's great