weigao95 / surfelwarp

SurfelWarp: Efficient Non-Volumetric Dynamic Reconstruction
https://sites.google.com/view/surfelwarp/home
BSD 3-Clause "New" or "Revised" License

Ubuntu 50 ms per frame #39

Closed: xiaotaw closed this issue 4 years ago

xiaotaw commented 4 years ago
  1. Build in Release mode and disable offline rendering.
  2. Run surfelwarp_app on Ubuntu with a GPU.
  3. The average time is 55 ms per frame (45 ms after adding the nvcc flag --use_fast_math).

Are there any other ways to increase the FPS?

weigao95 commented 4 years ago
  1. Try Windows. I don't know why, but during my development the code ran much faster on Windows.
  2. What GPU are you using? Can you try a better one?
  3. If you want to invest time, try implementing thread-level parallelism for rendering, the solver and the geometric updater. The code was architected to support this, but it has not been implemented yet.
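A minimal C++17 sketch of the kind of thread-level pipelining suggested in point 3, assuming two stages connected by a bounded blocking queue so that one stage can work on frame i + 1 while the other finishes frame i. `BoundedQueue`, `runPipeline` and the integer "work" are hypothetical placeholders, not SurfelWarp classes. The real rendering, solver and geometric-updater stages likely have cross-frame data dependencies that limit which of them can actually overlap.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Bounded blocking queue that hands results from one pipeline stage to the next.
// close() lets the consumer drain the remaining items and then stop.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    // Blocks while the queue is full. Returns false if the queue was closed.
    bool push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [&] { return closed_ || queue_.size() < capacity_; });
        if (closed_) return false;
        queue_.push(std::move(value));
        not_empty_.notify_one();
        return true;
    }

    // Blocks while the queue is empty. Returns std::nullopt once the queue is
    // closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [&] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return std::nullopt;
        T value = std::move(queue_.front());
        queue_.pop();
        not_full_.notify_one();
        return value;
    }

    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        not_empty_.notify_all();
        not_full_.notify_all();
    }

private:
    std::size_t capacity_;
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable not_empty_, not_full_;
    bool closed_ = false;
};

// Hypothetical two-stage pipeline: stage 1 (e.g. a solver) runs on its own
// thread and feeds stage 2 (e.g. a geometry update) on the calling thread.
// The arithmetic below is placeholder work, not SurfelWarp code.
std::vector<int> runPipeline(int numFrames) {
    BoundedQueue<int> solved(2);
    std::thread stage1([&] {
        for (int frame = 0; frame < numFrames; ++frame) {
            solved.push(frame * 10);  // placeholder for per-frame stage-1 work
        }
        solved.close();
    });

    std::vector<int> updated;
    while (auto item = solved.pop()) {
        updated.push_back(*item + 1);  // placeholder for per-frame stage-2 work
    }
    stage1.join();
    return updated;
}
```

The queue capacity bounds how far stage 1 can run ahead of stage 2, which also bounds how many in-flight frames must be kept in memory at once.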
xiaotaw commented 4 years ago

I only have an Ubuntu PC with a single GTX 1080 Ti card.

Before implementing thread-level parallelism, I tried to find the most time-consuming steps.

Additional info: I recorded timestamps in the function ProcessNextFrameWithReinit. The durations of the key steps are listed below:

    Duration/us      Time/s      Step
/data/surfelwarp/apps/surfelwarp_app/main.cpp:37: The 210th Frame   
        11           9.68129   start to process next frame
        38           9.68132   Check the frame and draw
        26           9.68135   Map to solver maps
     30739           9.71209   Process the next depth frame
       296           9.71239   First perform rigid solver
         4           9.71239   The resource from geometry attributes
      7728           9.72012   warp_solver->solve
       139           9.72026   Something after warp_solver->solve
      4141            9.7244   integrate
        33           9.72443   unmapping
/data/surfelwarp/apps/surfelwarp_app/main.cpp:37: The 211th Frame
         6           9.72444   start to process next frame
        53           9.72449   Check the frame and draw
        31           9.72452   Map to solver maps
     30016           9.75454   Process the next depth frame
       296           9.75484   First perform rigid solver
         5           9.75484   The resource from geometry attributes
      7696           9.76254   warp_solver->solve
       170           9.76271   Something after warp_solver->solve
      4141           9.76685   integrate
        36           9.76689   unmapping
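Per-step timings like the log above can be collected with a small checkpoint helper that stores, for each labelled mark, the time since the previous mark and the time since start. This is a sketch assuming `std::chrono::steady_clock` is precise enough. `Timeline` is a hypothetical helper, not part of SurfelWarp. Because CUDA kernel launches are asynchronous with respect to the host, a `cudaDeviceSynchronize()` (or CUDA events) before each mark would be needed for GPU time to be attributed to the step that launched the work.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Records labelled checkpoints and, for each one, the time elapsed since the
// previous checkpoint, mirroring the "Duration/us  Time/s  Step" layout above.
class Timeline {
public:
    using Clock = std::chrono::steady_clock;

    struct Entry {
        long long duration_us;  // time since the previous mark
        double time_s;          // time since the Timeline was created
        std::string label;
    };

    Timeline() : start_(Clock::now()), last_(start_) {}

    // Stamps the end of a step.
    void mark(std::string label) {
        assert(!label.empty());
        const auto now = Clock::now();
        const long long duration =
            std::chrono::duration_cast<std::chrono::microseconds>(now - last_).count();
        const double elapsed = std::chrono::duration<double>(now - start_).count();
        entries_.push_back({duration, elapsed, std::move(label)});
        last_ = now;
    }

    // Prints all entries as a three-column table.
    void print() const {
        std::printf("%12s %12s   %s\n", "Duration/us", "Time/s", "Step");
        for (const auto& e : entries_)
            std::printf("%12lld %12.5f   %s\n", e.duration_us, e.time_s, e.label.c_str());
    }

    const std::vector<Entry>& entries() const { return entries_; }

private:
    Clock::time_point start_;
    Clock::time_point last_;
    std::vector<Entry> entries_;
};
```

Usage: create one `Timeline` per frame, call `mark("...")` after each step, and call `print()` at the end of the frame.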

ImageProcesser::ProcessFrameStreamed takes 30 ms, far more than expected.

Further investigation shows:

  1. cv::imread takes about 5 ms to read a single image from file, so reading three images takes 15 ms. Fetching the images on another thread would save this time.

  2. FindCorrespondence takes another 15 ms, of which buildColliderKeyValueKernel accounts for 14 ms. On each call, the computation on the previous RGB image seems redundant, because that image was already processed as the current image one frame earlier. Could we just keep the results for the previous RGB image on the GPU?
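The first idea, hiding the `cv::imread` latency behind another thread, can be sketched with `std::async`: while frame i is being processed, frame i + 1 is already being read. `FrameImages`, `loadFrame` and `processWithPrefetch` are hypothetical names. In practice `loadFrame` would wrap the three `cv::imread` calls; placeholder strings are used here so the sketch builds without OpenCV.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Stand-in for one frame's images; real code would hold cv::Mat objects.
struct FrameImages {
    int index;
    std::vector<std::string> images;
};

// Hypothetical loader; in practice this would wrap the three cv::imread calls.
FrameImages loadFrame(int index) {
    return FrameImages{index, std::vector<std::string>(3, "placeholder")};
}

// Processes frames [0, numFrames) while the next frame is loaded on another
// thread, so file I/O overlaps with processing instead of adding to it.
template <typename ProcessFn>
void processWithPrefetch(int numFrames, ProcessFn process) {
    if (numFrames <= 0) return;
    std::future<FrameImages> next = std::async(std::launch::async, loadFrame, 0);
    for (int i = 0; i < numFrames; ++i) {
        // Blocks only if loading is slower than processing the previous frame.
        FrameImages current = next.get();
        assert(current.index == i);
        if (i + 1 < numFrames)
            next = std::async(std::launch::async, loadFrame, i + 1);
        process(current);  // placeholder for upload + GPU processing of frame i
    }
}
```

With one frame of lookahead, the steady-state cost per frame becomes roughly max(load, process) instead of load + process.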

weigao95 commented 4 years ago

It is feasible, but you need to take some care, as the image processing for different frames is coupled.
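One way to reuse the previous frame's work, as proposed in point 2, is a ping-pong cache. The results computed for frame i's image become the "previous" results for frame i + 1. They are recomputed only when frames are not consecutive, for example on the first frame, after skipped frames, or after re-initialization. This is a hypothetical sketch: `ImageFeatures`, `computeFeatures` and `FeatureCache` are placeholders, and `std::vector` stands in for GPU buffers. It assumes the per-image part of `buildColliderKeyValueKernel` depends only on that image. That assumption is exactly the kind of cross-frame coupling the reply warns about, so it needs to be checked against the code.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical per-image result; in SurfelWarp this would live in GPU memory.
using ImageFeatures = std::vector<int>;

// Placeholder for the expensive per-image computation.
ImageFeatures computeFeatures(int frameIndex) { return {frameIndex, 2 * frameIndex}; }

// Keeps the previous frame's features so each image is processed only once
// while frames arrive consecutively.
class FeatureCache {
public:
    // Call once per frame before the correspondence search.
    void advance(int frameIndex) {
        if (hasCurrent_ && frameIndex == currentIndex_ + 1) {
            // Consecutive frame: last frame's "current" becomes this frame's
            // "previous". std::swap exchanges the buffers without copying.
            std::swap(previous_, current_);
        } else {
            // First frame, skipped frames or re-initialization: the cache is
            // stale (assumes the previous image is still available).
            previous_ = compute(frameIndex - 1);
        }
        current_ = compute(frameIndex);
        currentIndex_ = frameIndex;
        hasCurrent_ = true;
    }

    const ImageFeatures& previous() const { assert(hasCurrent_); return previous_; }
    const ImageFeatures& current() const { assert(hasCurrent_); return current_; }
    int computeCount() const { return computeCount_; }

private:
    ImageFeatures compute(int frameIndex) {
        ++computeCount_;
        return computeFeatures(frameIndex);
    }

    ImageFeatures previous_, current_;
    int currentIndex_ = 0;
    bool hasCurrent_ = false;
    int computeCount_ = 0;
};
```

In steady state, each frame costs one per-image computation instead of two, which is where the expected saving on buildColliderKeyValueKernel would come from if the assumption above holds.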