Closed xiaotaw closed 4 years ago
I only have an Ubuntu PC with one GTX 1080 Ti card.
Before implementing thread-level parallelism, I tried to find the most time-consuming steps.
Additional Info:
I recorded some timestamps in the function ProcessNextFrameWithReinit, and the durations of some key steps are listed here:
Duration/us  Time/s   Step
/data/surfelwarp/apps/surfelwarp_app/main.cpp:37: The 210th Frame
         11  9.68129  start to process next frame
         38  9.68132  Check the frame and draw
         26  9.68135  Map to solver maps
      30739  9.71209  Process the next depth frame
        296  9.71239  First perform rigid solver
          4  9.71239  The resource from geometry attributes
       7728  9.72012  warp_solver->solve
        139  9.72026  Something after warp_solver->solve
       4141  9.7244   integrate
         33  9.72443  unmapping
/data/surfelwarp/apps/surfelwarp_app/main.cpp:37: The 211th Frame
          6  9.72444  start to process next frame
         53  9.72449  Check the frame and draw
         31  9.72452  Map to solver maps
      30016  9.75454  Process the next depth frame
        296  9.75484  First perform rigid solver
          5  9.75484  The resource from geometry attributes
       7696  9.76254  warp_solver->solve
        170  9.76271  Something after warp_solver->solve
       4141  9.76685  integrate
         36  9.76689  unmapping
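For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a step timer. It is not the code I actually used and not part of surfelwarp: StepTimer, mark and print are hypothetical names. It records the duration since the previous mark and the time since the timer was created.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of surfelwarp): logs how long each step took.
class StepTimer {
public:
    using Clock = std::chrono::steady_clock;

    struct Record {
        std::string label;      // name of the step that just finished
        long long duration_us;  // microseconds since the previous mark
        double time_s;          // seconds since the timer was created
    };

    StepTimer() : start_(Clock::now()), last_(start_) {}

    // Call right after a step completes. For GPU work, synchronize the
    // stream first (e.g. cudaStreamSynchronize), otherwise only the kernel
    // launch overhead is measured.
    void mark(std::string label) {
        const Clock::time_point now = Clock::now();
        const long long us = static_cast<long long>(
            std::chrono::duration_cast<std::chrono::microseconds>(now - last_).count());
        const double s = std::chrono::duration<double>(now - start_).count();
        records_.push_back({std::move(label), us, s});
        last_ = now;
    }

    // Prints one "Duration/us  Time/s  Step" row per recorded mark.
    void print() const {
        std::printf("Duration/us  Time/s   Step\n");
        for (const Record& r : records_) {
            std::printf("%11lld  %.5f  %s\n", r.duration_us, r.time_s, r.label.c_str());
        }
    }

    const std::vector<Record>& records() const { return records_; }

private:
    Clock::time_point start_;
    Clock::time_point last_;
    std::vector<Record> records_;
};
```

Usage would be one mark call after each step, e.g. timer.mark("integrate"), followed by timer.print() at the end of the frame.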
ImageProcesser::ProcessFrameStreamed (the "Process the next depth frame" step above) takes about 30 ms, far more than expected.
Further investigation shows:
cv::imread takes about 5 ms to read ONE image from file, so three images consume 15 ms. If another thread fetched the images in advance, this time could be saved.
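The prefetching idea could look roughly like the sketch below. This is only an illustration under my own assumptions: FramePrefetcher is a hypothetical class, not something in surfelwarp, and the loader callback stands in for whatever reads one frame (for example, a function that calls cv::imread for each image of that frame). A background thread loads frames ahead of time into a small bounded queue, and the main loop only blocks if the next frame is not ready yet.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: loads frames 0..num_frames-1 on a background thread.
template <typename Frame>
class FramePrefetcher {
public:
    using Loader = std::function<Frame(std::size_t)>;

    FramePrefetcher(Loader loader, std::size_t num_frames, std::size_t capacity = 2)
        : loader_(std::move(loader)), num_frames_(num_frames), capacity_(capacity) {
        assert(capacity_ > 0);
        worker_ = std::thread([this] { run(); });
    }

    ~FramePrefetcher() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    FramePrefetcher(const FramePrefetcher&) = delete;
    FramePrefetcher& operator=(const FramePrefetcher&) = delete;

    // Blocks until the next frame is ready. Returns false once every frame
    // has been consumed.
    bool next(Frame& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty() || done_; });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        lock.unlock();
        cv_.notify_all();  // wake the worker: there is room in the queue again
        return true;
    }

private:
    void run() {
        for (std::size_t i = 0; i < num_frames_; ++i) {
            Frame frame = loader_(i);  // the slow read happens outside the lock
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return queue_.size() < capacity_ || stop_; });
            if (stop_) break;
            queue_.push(std::move(frame));
            lock.unlock();
            cv_.notify_all();
        }
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
    }

    Loader loader_;
    std::size_t num_frames_;
    std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Frame> queue_;
    bool stop_ = false;
    bool done_ = false;
    std::thread worker_;
};
```

The queue is bounded so the loader never runs far ahead of the solver and memory use stays small. Uploading the images to the GPU would still happen on the main thread in this sketch.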
FindCorrespondence takes another 15 ms; inside it, buildColliderKeyValueKernel accounts for 14 ms. On each call, the computation on the previous RGB image seems redundant. Could we just keep the results for the previous RGB image on the GPU?
It is feasible, but it needs some care because the image processing of different frames is coupled.
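A minimal host-side sketch of the caching idea follows. It does not reflect how FindCorrespondence is actually implemented: PreviousFrameCache and the extract callback are hypothetical, and on the GPU the two slots would be device buffers that are swapped rather than moved. The per-image result is computed once for the current frame and kept, so the next frame can reuse it as its "previous" result.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical sketch: computes the per-image result once per frame and
// reuses it as the "previous" result when the next frame arrives.
// Features must be default-constructible and movable.
template <typename Image, typename Features>
class PreviousFrameCache {
public:
    using Extractor = std::function<Features(const Image&)>;

    explicit PreviousFrameCache(Extractor extract) : extract_(std::move(extract)) {}

    // Processes the current image only. Returns true when a previous result
    // is available, i.e. from the second frame after construction or reset.
    bool update(const Image& current_rgb) {
        if (has_curr_) {
            prev_ = std::move(curr_);  // last frame's result becomes "previous"
            has_prev_ = true;
        }
        curr_ = extract_(current_rgb);
        has_curr_ = true;
        ++extract_calls_;
        return has_prev_;
    }

    // Assumption: the cache must be dropped whenever consecutive frames are
    // no longer processed in order (e.g. after a skipped frame).
    void reset() {
        has_prev_ = false;
        has_curr_ = false;
    }

    const Features& previous() const { return prev_; }
    const Features& current() const { return curr_; }
    std::size_t extract_calls() const { return extract_calls_; }

private:
    Extractor extract_;
    Features prev_{};
    Features curr_{};
    bool has_prev_ = false;
    bool has_curr_ = false;
    std::size_t extract_calls_ = 0;
};
```

With this layout each image goes through the extractor exactly once, instead of once as the current frame and again as the previous one. The reset call is where the coupling between frames shows up: any stale cached result has to be invalidated explicitly.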
Are there any other ways to increase the FPS?