Hi @adamwhats, thanks for the comment. The easiest way to fix it may be to resize the points array to the image size. However, the use of the depth image is interesting. I have never used it; can it speed up the 3D bounding box creation?
Thanks for getting back to me. I don't think resizing the pointcloud will work for me unfortunately :confused: The Realsense seems to only publish valid points, meaning that the pointcloud message size is constantly changing, so any attempt to resize to the image size will error.
I've written a version of `convert_bb_to_3d` which works by first segmenting the depth image and then using the depth `camera_info` message to project it out to 3D. I could probably optimize it a little more, but it currently takes about 11-12 ms on my PC. Unfortunately I can't compare that with your original method due to this issue with the camera, but if you share a rosbag from your camera I'd be happy to do a bit more testing to see which approach is faster.
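To sketch the idea (the function name, arguments and units divisor below are just placeholders rather than my actual implementation):

```python
import numpy as np


def bbox_depth_to_3d_center(depth_image: np.ndarray, cam_k,
                            cx: float, cy: float, w: float, h: float,
                            units_divisor: float = 1000.0):
    """Project the depth pixels inside a 2D bounding box and return their 3D centroid.

    depth_image: HxW depth array (e.g. uint16 in millimetres)
    cam_k: flattened 3x3 intrinsic matrix from the depth CameraInfo (.k)
    (cx, cy, w, h): bounding box centre and size in pixels
    """
    height, width = depth_image.shape
    u_min, u_max = max(int(cx - w / 2), 0), min(int(cx + w / 2), width - 1)
    v_min, v_max = max(int(cy - h / 2), 0), min(int(cy + h / 2), height - 1)

    # Crop the depth image to the box and convert to metres
    z = depth_image[v_min:v_max + 1, u_min:u_max + 1].astype(np.float32) / units_divisor
    valid = z > 0                      # drop pixels with no depth reading
    if not np.any(valid):
        return None

    # Pinhole back-projection using the depth camera intrinsics
    fx, fy, px, py = cam_k[0], cam_k[4], cam_k[2], cam_k[5]
    vs, us = np.nonzero(valid)         # row (v) and column (u) indices inside the crop
    x = z[valid] * (us + u_min - px) / fx
    y = z[valid] * (vs + v_min - py) / fy
    return np.array([x.mean(), y.mean(), z[valid].mean()])
```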
Out of interest, which camera have you been using?
I am using an Asus Xtion, which is the camera inside the robots in our lab (TIAGo and RB1). I think I have run into the same problem with the Realsense before, but I don't remember how I fixed it; maybe by changing some parameters in the launch file. I'll try to record some bags this week. In the meantime you could use a simulated robot; I think the RGBD camera should behave like the Asus I am using.
Here you have a bag. It is the one I have used in the videos of the README.
Appreciate you sharing that, but unfortunately there isn't a pointcloud topic in that one so I can't do a side-by-side comparison of the methods :confused:
I just have a small question about the `max_detection_threshold` parameter to make sure I'm implementing things correctly. Is the expected behavior that any detections beyond the threshold do not have a 3D bounding box?
`maximum_detection_threshold` is the maximum distance, along the z-axis from the center of the 3D object, for a point to be included in the 3D bounding box.
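In other words, something along these lines (a simplified sketch of the idea, not the exact code in the node):

```python
import numpy as np


def filter_by_z(points_3d: np.ndarray, center_z: float,
                maximum_detection_threshold: float) -> np.ndarray:
    """Keep only the (N, 3) points whose z distance from the object's centre
    is within maximum_detection_threshold, before the 3D box is computed."""
    mask = np.abs(points_3d[:, 2] - center_z) <= maximum_detection_threshold
    return points_3d[mask]
```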
On the other hand, you can create the point cloud using `depth_image_proc` with the depth image. Here is the launch file I used:
import launch
import launch_ros.actions
import launch_ros.descriptions


def generate_launch_description():
    container = launch_ros.actions.ComposableNodeContainer(
        name='container',
        namespace='',  # the original snippet used an undefined `namespace` variable; set your own here
        package='rclcpp_components',
        executable='component_container',
        composable_node_descriptions=[
            # Create XYZRGB point cloud
            launch_ros.descriptions.ComposableNode(
                package='depth_image_proc',
                plugin='depth_image_proc::PointCloudXyzrgbNode',
                name='points_xyzrgb',
                parameters=[{'queue_size': 10}],
                remappings=[('rgb/image_rect_color', '/xtion/rgb/image_raw'),
                            ('rgb/camera_info', '/xtion/rgb/camera_info'),
                            ('depth_registered/image_rect', '/xtion/depth/image_raw'),
                            ('points', '/xtion/depth_registered/points')],
            ),
        ],
        output='screen',
    )

    return launch.LaunchDescription([container])
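If it helps, saving that as e.g. `pointcloud.launch.py` (the filename is arbitrary) and running `ros2 launch ./pointcloud.launch.py` should be enough; the combined XYZRGB cloud is then published on `/xtion/depth_registered/points` as per the remappings above.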
Thanks a lot for that. I've done a bit of testing and it looks like my current implementation is about 3x slower :sweat_smile: I've got a few ideas for improving it which I'll try to work on this evening.
Ok, I've had a chance to make some changes and it now runs at a comparable speed to your original approach. Here are my results:
It's not the most sophisticated benchmarking: I simply ran your rosbag three times for each approach while timing how long the `on_detect` callback in `detect_3d_node.py` took.
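For what it's worth, the timing was nothing fancier than wrapping the callback, roughly like this (a simplified sketch, removed again before any PR):

```python
import time
from functools import wraps


def timed(func):
    """Print how long each call to the wrapped callback takes, in milliseconds."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {(time.perf_counter() - start) * 1000.0:.2f} ms")
        return result
    return wrapper
```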
Would you like me to put together a PR? There's not much improvement in terms of performance, but I think using the depth image rather than the pointcloud is more robust to different camera models. No worries if not though.
Thanks for the work. I have a couple of questions: have you tried this with the normal yolov8m model? Would you happen to know why your version is slower with yolov8_pose?
Apologies for the slow reply. In answer to your questions:
`detect_3d_node` functions in the same way for both yolov8m and yolov8m-seg, which is why there's pretty much the same percentage improvement, as you can see below.
When there is keypoint data, `detect_3d_node` runs both the `convert_bb_to_3d` and `convert_keypoints_to_3d` functions. The new implementation of `convert_bb_to_3d` is faster than the original, but the new `convert_keypoints_to_3d` is slower. This is because in your original implementation you are able to simply look up the 3D point using the keypoint coordinates:
for p in detection.keypoints.data:
    if int(p.point.y) >= points.shape[0] or int(p.point.x) >= points.shape[1]:
        continue
    p3d = points[int(p.point.y)][int(p.point.x)]
whereas in mine I have to do the extra maths of projecting to 3D:
# Build an array of 2d keypoints
keypoints_2d = np.array([[p.point.x, p.point.y] for p in detection.keypoints.data], dtype=np.int16)
u = np.array(keypoints_2d[:, 1]).clip(0, depth_info.height - 1)
v = np.array(keypoints_2d[:, 0]).clip(0, depth_info.width - 1)
z = depth_image[u, v]
k = depth_info.k
px, py, fx, fy = k[2], k[5], k[0], k[4]
x = z * (v - px) / fx
y = z * (u - py) / fy
points_3d = np.dstack([x, y, z]).reshape(-1, 3) / self.depth_image_units_divisor  # convert to meters
All in all, the slowdown in `convert_keypoints_to_3d` outweighs the improvement in `convert_bb_to_3d`
Great work and thanks for the comparison. Can you make a PR with the changes? Remember to remove the logs and the timing code.
Hi there,
Firstly thanks for the great repo.
I'm having some issues while trying to use `detect_3d_node.py`, where the `convert_bb_to_3d` function always gives me an IndexError. Looking at the line `center_point = points[int(center_y)][int(center_x)]`, and also at line 83 where the `points` array is constructed, it seems like it expects `point_msg.height` and `point_msg.width` to correspond to the input image size used in the detection node. Unfortunately, the depth camera / ROS 2 driver I'm using (Realsense D435) constructs the PointCloud2 message with a height of 1, meaning the `points` array ends up with a shape of `[1, num_points, 3]`.
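To illustrate the failure mode (point count and box centre made up):

```python
import numpy as np

# With the Realsense driver the cloud arrives unorganised: height == 1 and
# width == number of valid points, so reshaping by the message dimensions
# gives an array like this:
points = np.zeros((1, 243587, 3), dtype=np.float32)   # [1, num_points, 3]

# Indexing it with image coordinates then falls outside the first axis:
center_x, center_y = 320, 240
try:
    center_point = points[int(center_y)][int(center_x)]
except IndexError as err:
    print(err)   # index 240 is out of bounds for axis 0 with size 1
```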
I'm happy to work on a PR to fix this if you would like. I feel that assuming the PointCloud2 message width and height match those of the colour image isn't the most robust solution. One approach I've used in the past has been to calculate the 3D bounding box by applying the segmentation mask to the depth image rather than the pointcloud, but I'm also happy to try something else if you have any suggestions.
Adam