martinlyra opened 1 year ago
Hi, the scale of the coordinate system is determined by COLMAP first. Then we rescale the coordinate system to ensure the `object_point_cloud` is inside a unit sphere at the origin. In this case, the object coordinate system used in Gen6D is not a metric coordinate system, and the scale won't match the real scale in metres.
Understood. We devised a solution that approximately maps the result into the metric coordinate system using our RGB-D camera's depth sensor.
We project the estimated object centre into the image with the camera intrinsic matrix `K` and divide the point by its last component to get its pixel coordinates, then read the metric depth at that pixel from the depth sensor and rescale the translation accordingly. We lack theoretical proof behind this solution, but from the tests we have done on this concept, it works fine for our needs from a purely engineering point of view. It should be noted that this plain method assumes there is no obstruction in front of the object's centre.
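For concreteness, here is a minimal sketch of that idea in Python. The function name and the pose layout (a 3x4 object-to-camera matrix `[R | t]`, with the object centre at the object-space origin) are our assumptions for illustration, not Gen6D's exact interface:

```python
import numpy as np

def rectify_scale_with_depth(pose, K, depth_image):
    """Rescale a Gen6D pose translation to metric units using an RGB-D depth map.

    pose:        3x4 object-to-camera matrix [R | t] (assumed Gen6D output layout)
    K:           3x3 camera intrinsic matrix
    depth_image: HxW depth map in metres, aligned with the RGB frame
    """
    R, t = pose[:, :3], pose[:, 3]

    # With the object normalised to a unit sphere at the origin, the object
    # centre in camera space is just the translation t (in Gen6D's arbitrary scale).
    center_cam = t

    # Project the centre into the image and divide by the last (homogeneous)
    # component to get pixel coordinates.
    uvw = K @ center_cam
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]

    # Read the metric depth at that pixel. This assumes nothing occludes
    # the object's centre in the depth image.
    z_measured = depth_image[int(round(v)), int(round(u))]

    # Rescale the translation so its Z matches the measured metric depth.
    scale = z_measured / center_cam[2]
    return np.concatenate([R, (scale * t)[:, None]], axis=1)
```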
Using a regular RGB camera (one without depth sensing), it may be necessary to "calibrate" first: detect the object and record the estimated Z-value, then measure the actual distance from the camera to the object, and scale the result by the constant factor `measured distance / detected Z-value`.
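As a toy example of that calibration (the numbers here are hypothetical):

```python
# One-time calibration with a plain RGB camera (hypothetical values).
measured_distance_m = 0.80  # tape-measured camera-to-object distance, in metres
detected_z = 6.4            # Z component of Gen6D's translation for that same shot

scale_factor = measured_distance_m / detected_z

# Apply the constant factor to every subsequent translation estimate.
t_metric = scale_factor * t  # t: Gen6D translation, e.g. a numpy array of shape (3,)
```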
Since the COLMAP object point cloud was in decimetres, we had to scale our result by `1 / (10 * object_scale)` to get the result in metres, where `object_scale` is the factor used to scale the object point cloud to the unit sphere. This is done after the object-space -> camera-space transformation mentioned in the original post. Unlike the RGB-D method, however, the reliability of the Z-values estimated this way could not be guaranteed.
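In code, that correction is a one-liner; a sketch, assuming `center_cam` is the camera-space centre from the transform in the original post and `object_scale` is the stored unit-sphere normalisation factor:

```python
# Undo the unit-sphere normalisation and convert decimetres to metres.
# object_scale: the factor that scaled the object point cloud into the
# unit sphere (taken from our database setup; the name is ours).
center_cam_m = center_cam / (10.0 * object_scale)
```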
We found that we did not need to do a basis change either. Using the rotation `R` as-is, converted to a quaternion, was enough to transfer Gen6D's estimate to our ROS system, assuming an identity matrix for the object's own orientation.
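A minimal sketch of that hand-off, assuming SciPy and ROS's geometry_msgs are available (the helper name is ours):

```python
from scipy.spatial.transform import Rotation
from geometry_msgs.msg import Pose

def to_ros_pose(R, t_metric):
    """Pack a Gen6D rotation matrix and a metric translation into a ROS Pose."""
    # SciPy returns quaternions in (x, y, z, w) order, matching geometry_msgs.
    qx, qy, qz, qw = Rotation.from_matrix(R).as_quat()
    pose = Pose()
    pose.position.x, pose.position.y, pose.position.z = t_metric
    pose.orientation.x, pose.orientation.y = qx, qy
    pose.orientation.z, pose.orientation.w = qz, qw
    return pose
```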
Your solution of using the depth value from the RGB-D sensor to rectify the scale used here is reasonable!
Whoops! I meant the internal scale of the object .ply models used in our work, not sure how I confused it with COLMAP. Thank you!
Hello, thank you for this fantastic work!
My group has been able to get a good model with satisfactory results. We want to apply this model in a ROS system to acquire the pose of the model in the world frame. We do this by running Gen6D on a separate remote computer, with the actual robot unit standing on its own. To run it under ROS, we used a copy of predict.py, adapted to run as an independent ROS node that subscribes to the robot's camera and publishes the pose on its own topic (a stripped-down sketch follows).
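For reference, a minimal sketch of such a node (ROS 1 / rospy; the topic names and the `predictor` callable wrapping Gen6D's estimation step from predict.py are placeholders, and `to_ros_pose` is the helper sketched earlier in the thread):

```python
#!/usr/bin/env python3
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from geometry_msgs.msg import PoseStamped

class Gen6DNode:
    def __init__(self, predictor):
        # predictor: a callable wrapping Gen6D's estimation step from
        # predict.py (assumed interface: BGR image -> 3x4 pose [R | t]).
        self.predictor = predictor
        self.bridge = CvBridge()
        self.pub = rospy.Publisher("/gen6d/pose", PoseStamped, queue_size=1)
        rospy.Subscriber("/camera/image_raw", Image, self.on_image, queue_size=1)

    def on_image(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        pose = self.predictor(frame)  # 3x4 [R | t] in camera space

        out = PoseStamped()
        out.header = msg.header  # keep the camera frame and timestamp
        out.pose = to_ros_pose(pose[:, :3], pose[:, 3])
        self.pub.publish(out)

if __name__ == "__main__":
    rospy.init_node("gen6d_pose_node")
    Gen6DNode(predictor=...)  # plug in the Gen6D predictor here
    rospy.spin()
```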
The position is inspired by the discussion in #14: the only point we apply it to is the centre of the bounding box calculated by Gen6D, which is good enough for our needs (a sketch follows below).
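That transformation is the standard rigid object-space -> camera-space mapping; a minimal sketch, assuming a 3x4 Gen6D pose and a bounding-box centre `center_obj` in object space:

```python
import numpy as np

# Standard object-space -> camera-space transform with the predicted pose.
# pose: 3x4 [R | t] from Gen6D; center_obj: bbox centre in object space.
R, t = pose[:, :3], pose[:, 3]
center_cam = R @ center_obj + t  # for a centre at the origin this is just t
```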
When testing this arrangement, however, we discovered a few issues:
We use the custom database, which seems to already rotate and scale the object, as suggested in #14 as well. The first issue, with the position, is a major roadblock to using Gen6D in our ROS project. Rotation is currently not a concern here, but we do welcome suggestions or tips on how to calculate it from Gen6D's predictions and data.