stevenjj / openpose_ros

A ROS wrapper for the CMU OpenPose library

Very slow running speed #7

Open erolley-parnell opened 6 years ago

erolley-parnell commented 6 years ago

Hi,

I am finding that the wrapper runs incredibly slowly. I am using a Titan X and get around 8 fps with the OpenPose code on its own, but the wrapper drops this to 4.2 fps with the 2D skeleton and maybe 1-2 fps with the 3D skeleton, without the visualisation.

Have you got any suggestions of any changes to make, to flags or otherwise, so that I can increase the frame rate?

stevenjj commented 6 years ago

Hi, I apologize for not replying to this immediately.

There are certainly ways to increase the frame rate, but they involve changing how we extract and process the 3D points from the image. I believe the biggest culprits are the unoptimized code and the use of the depth-registered point cloud message instead of processing the 2D depth image.

Currently, we extract the indices of a 5x5 pixel square from the RGB image and use those indices on the depth-registered point cloud message, then average the points (ideally we should be doing a mean shift) to figure out which points to keep and which to throw away. However, a faster method would use the indices of the 2D depth image instead, apply a faster index rejection scheme, and calculate the 3D points using the camera's intrinsic parameters.
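The faster method described above could look roughly like the following. This is a minimal sketch, not the code in this repository: it assumes a row-major depth image in meters where 0 or NaN marks a missing reading, a standard pinhole camera model, and a plain average over the valid pixels in the window. The names `Intrinsics`, `Point3`, and `keypointTo3D` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pinhole intrinsics: focal lengths and principal point, in pixels.
struct Intrinsics { double fx, fy, cx, cy; };
struct Point3 { double x, y, z; };

// Averages the valid depths in a (2*half+1) x (2*half+1) window around
// pixel (u, v), then back-projects the pixel into the camera frame.
// Returns false when the window contains no valid depth reading.
bool keypointTo3D(const std::vector<float>& depth, int width, int height,
                  int u, int v, const Intrinsics& K, Point3& out,
                  int half = 2) {
  double sum = 0.0;
  int count = 0;
  for (int dv = -half; dv <= half; ++dv) {
    for (int du = -half; du <= half; ++du) {
      const int uu = u + du;
      const int vv = v + dv;
      // Skip pixels that fall outside the image.
      if (uu < 0 || uu >= width || vv < 0 || vv >= height) continue;
      const float d = depth[static_cast<std::size_t>(vv) * width + uu];
      // Index rejection: drop missing or non-finite readings.
      if (!std::isfinite(d) || d <= 0.0f) continue;
      sum += d;
      ++count;
    }
  }
  if (count == 0) return false;
  const double z = sum / count;
  // Pinhole back-projection: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
  out = Point3{(u - K.cx) * z / K.fx, (v - K.cy) * z / K.fy, z};
  return true;
}
```

With the default `half = 2` this reads at most 25 depth values per keypoint, and the point cloud message is never touched. In a ROS node the intrinsics would normally come from the camera's `sensor_msgs/CameraInfo` message.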

This is certainly future work and is not too difficult to do. I believe that soon a better version of the 3D skeleton extractor module will be written either by me or someone else.

Ivy-Tang commented 6 years ago

Hi, same question. I am using a Titan Xp and I get around 9 fps when running OpenPose alone. However, when I use rosbag to record the topic "/openpose_ros/skeleton_3d/detected_poses_keypoints_3d" and look into the rosbag, I find that the 3D detections are published at around 3 fps.

[Screenshot 2018-05-24 17_43_07]

Is there anything that I am missing? I ask because the processing rate in your technical report is around 7 fps with a GTX 1080.

stevenjj commented 6 years ago

I believe neither of you is missing anything. The 9 fps with OpenPose alone is expected. In theory, the 3D extraction module performs a very simple operation: averaging the 3D locations of each key point over a small window extracted from the RGB image, so there should be no significant slowdowns. At most, the time complexity should be O(n^2).

However, I admit that the specific implementation of that node was not optimized very well. I wrote the initial algorithm with mean-shift clustering in mind, and @MiguelARD implemented it with an averaging approach and wrote a technical report on it. The averaging approach should have been slightly faster. A 3 fps output of the 3D detections is a clear indication that the 3D detection node is not performing as intended. Perhaps @MiguelARD can give a better explanation of the algorithmic slowdown.

If I find the time in the future, I will fix this slowdown with a re-implementation of the 3D detection pipeline, as the 3D detection component is not supposed to be computationally expensive. I hope this answers your question.

Also, if you are interested in tackling the re-implementation component, you are welcome to do so, and I will be here to point you in the right direction and help you with the debugging process.

Ivy-Tang commented 6 years ago

Hi, thank you for your quick response! When I look into the time each part takes, I find that it is the pcl::fromROSMsg and pcl::getMinMax3D functions that consume most of the time. It seems that PCL computation is quite slow in ROS. I have configured my PC to use Titan Xp with PCL, but it doesn't work. Is there anything I can do before re-implementing the 3D detection pipeline? Thank you in advance.
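For anyone who wants to repeat this kind of per-call measurement, a small wall-clock timer is enough. This is a sketch only; `timeMs` is a hypothetical helper and not something the thread or the repository provides.

```cpp
#include <chrono>
#include <utility>

// Runs fn once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double timeMs(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Inside the point cloud callback it could wrap a single call, for example `double ms = timeMs([&] { pcl::fromROSMsg(*msg, cloud); });`, where `msg` and `cloud` stand for whatever variables the callback already has.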

MiguelARD commented 6 years ago

I think you are not missing anything. In fact, you are getting better results than I did. Figure 44 of the tech report shows the frequency of the 3D detections I had, which was about 2 fps.

[Screenshot 2018-05-25 at 14 17 53]

My objective was just to implement a functional tool. I did not have a lot of coding experience, so I think there is a lot of room for optimization. As @stevenjj told you, feel free to optimize the algorithm! It would be fantastic.

I hope the code is useful to you. Thank you very much for using it!

stevenjj commented 6 years ago

@MiguelARD : Thanks for replying. :)

@Ivy-Tang : That's a good find. First, I believe we can do all of our processing on the existing ROS messages, without converting them to a PCL data format. Otherwise we perform multiple copy operations, and since the point cloud has hundreds of thousands of points, copying it more than once slows things down significantly.

See, for example, the point cloud iterators for the sensor_msgs format: http://docs.ros.org/api/sensor_msgs/html/annotated.html
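To illustrate reading points without a conversion step, here is a sketch that pulls x, y, z for one pixel straight out of a packed byte buffer. So that it compiles without ROS, it uses a hypothetical `PackedCloud` struct that mirrors the `width`, `height`, `point_step`, `row_step`, and `data` members of `sensor_msgs::PointCloud2`. The field offsets (float32 x, y, z at bytes 0, 4, 8) and a byte order matching the host are assumptions; in a real message they come from `fields` and `is_bigendian`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for the layout members of sensor_msgs::PointCloud2
// (hypothetical struct, used so the sketch builds without ROS).
struct PackedCloud {
  std::uint32_t width = 0;       // points per row
  std::uint32_t height = 0;      // rows (more than 1 for an organized cloud)
  std::uint32_t point_step = 0;  // bytes per point
  std::uint32_t row_step = 0;    // bytes per row
  std::uint32_t x_offset = 0;    // assumed float32 field offsets
  std::uint32_t y_offset = 4;
  std::uint32_t z_offset = 8;
  std::vector<std::uint8_t> data;
};

struct CloudPoint { float x, y, z; };

// Reads the point at pixel (u, v) of an organized cloud directly from the
// byte buffer. No bounds checking: the caller must keep u < width and
// v < height.
CloudPoint pointAt(const PackedCloud& c, std::uint32_t u, std::uint32_t v) {
  const std::uint8_t* p = c.data.data() +
                          static_cast<std::size_t>(v) * c.row_step +
                          static_cast<std::size_t>(u) * c.point_step;
  CloudPoint out;
  // memcpy avoids unaligned or type-punned reads from the byte buffer.
  std::memcpy(&out.x, p + c.x_offset, sizeof(float));
  std::memcpy(&out.y, p + c.y_offset, sizeof(float));
  std::memcpy(&out.z, p + c.z_offset, sizeof(float));
  return out;
}
```

Only the 25 points of a 5x5 window need to be read this way per keypoint, instead of copying the whole cloud. With ROS available, `sensor_msgs::PointCloud2ConstIterator<float>` from `sensor_msgs/point_cloud2_iterator.h` offers similar access and resolves the field offsets by name.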

Second, I believe finding the min and max of each coordinate is not necessary for the algorithm to work. As a quick try, hardcoding some min and max values and removing the min max function will immediately tell us if this is needed.

What did you mean when you said this?

> I have configured my PC to use Titan Xp with PCL, but it doesn't work.

angelbeibei commented 6 years ago

@erolley-parnell Can you compile successfully? Thanks!

erolley-parnell commented 6 years ago

Yeah, I compiled it successfully. I ended up making my own wrapper for 3D in the end, which means I'm really familiar with the OpenPose system now, especially the later versions, as some of the code had to be updated. If you are struggling to compile, open a new issue so it can be looked at directly :)