remaro-network / SubPipe-dataset

A Submarine Pipeline Inspection Dataset for Segmentation and Visual-inertial Localization
GNU General Public License v3.0

About groundtruth pose #2

Closed bachzz closed 9 months ago

bachzz commented 9 months ago

Hello, thanks for your great work!

This might be a naive question, since I'm new to visual SLAM and robotics.

I could find the ground truth (x, y, z, phi, theta, psi) in EstimatedState.csv from the dataset, but I'm not sure how to convert it into the pose format that is normally used, for example the pose format generated by COLMAP. I think (phi, theta, psi) are Euler angles and can be converted into quaternions, but I'm not sure about the translation part (x, y, z): can we simply compute translation = prev_xyz - next_xyz?

I would like to confirm whether those conversions are accurate, so that I can compare with other SLAM algorithms. Or, if you already have an existing pose dataset for SubPipe (suitable for SLAM algorithms), that would be great!

Additionally, could you elaborate on how the ground truth poses were estimated? In the paper, I only found "estimated from the navigation sensors". I wonder if this can be more robust than an estimation made by COLMAP.

Thank you very much.

olayasturias commented 9 months ago

Hello Bachzz, Thanks for your interest in SubPipe! :)

Euler angles and quaternions are two different representations of orientation. That is, they have nothing to do with the translation part. They are usually treated as separate vectors depending on what you're trying to implement, and in that case the mathematical operations you carry out on them must be treated carefully. The Lie group SE(3) is a more (mathematically) convenient representation that jointly considers translations and orientations. If you're new to SLAM and these concepts are not familiar to you, I have a paper that sums them up. The paper is about loss functions, but I introduce these concepts in sections II and III. I will leave the link here in case it helps :)
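
To make this concrete, here is a minimal sketch with numpy and scipy (the 'xyz' Euler convention and all numeric values are just examples; double-check the convention used by your data):

import numpy as np
from scipy.spatial.transform import Rotation as R

phi, theta, psi = 0.1, -0.2, 1.5   # example Euler angles (radians)
x, y, z = 1.0, 2.0, 3.0            # example translation

r = R.from_euler('xyz', [phi, theta, psi], degrees=False)
q = r.as_quat()                    # same orientation as a quaternion (scipy order: x, y, z, w)

# SE(3) stores orientation and translation jointly in a single 4x4 matrix
T = np.eye(4)
T[:3, :3] = r.as_matrix()
T[:3, 3] = [x, y, z]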

As you mentioned, both quaternions and translation vectors are indeed vectors and can be subtracted, keeping in mind that the quaternion vector must be unitary. I find the SE3 representation more convenient: in that case, computing a relative pose becomes a mere matrix multiplication that jointly operates on translation and orientation. For example, in the code snippet that I'm attaching below, I load all the poses from the CSV and convert them to SE3. Afterwards, I multiply all the poses by the inverse of the first one, so that the initial pose is at the origin (which, in SE3, corresponds to the identity matrix I_4x4).

import csv
import numpy as np
from scipy.spatial.transform import Rotation as R

def load_poses_from_txt_subpipe(file_name):
    """ Load absolute camera poses from csv file (IMC format)
    Each data row contains, among other columns, the fields
        x y z phi theta psi

    Args:
        file_name (str): csv file path

    Returns:
        poses (dict): dictionary of poses, each pose is a [4x4] SE3 array
    """
    poses = {}

    with open(file_name, 'r') as csvfile:
        datareader = csv.reader(csvfile)
        for i, line in enumerate(datareader):
            if i == 0: continue  # skip the header row
            P = np.eye(4)
            # x, y, z, phi, theta, psi live in columns 7:13 of the CSV
            x, y, z, phi, theta, psi = list(map(float, line[7:13]))
            # Euler angles -> rotation matrix
            r = R.from_euler('xyz', [phi, theta, psi], degrees=False)
            P[:3, :3] = r.as_matrix()
            P[:3, 3] = np.asarray([x, y, z])
            poses[i-1] = P

    # express all poses relative to the first one, so the trajectory starts at the origin
    pose_0 = poses[list(poses.keys())[0]]
    for timestamp in poses:
        poses[timestamp] = np.linalg.inv(pose_0) @ poses[timestamp]
    return poses
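
For instance, you could use it like this (a hypothetical usage, assuming the file is the EstimatedState.csv you mentioned):

poses = load_poses_from_txt_subpipe('EstimatedState.csv')
# the relative pose between two frames is again a single SE3 matrix
T_01 = np.linalg.inv(poses[0]) @ poses[1]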

Regarding the robustness of the estimation vs. that of COLMAP, we haven't tested COLMAP on it, so it's hard to say. I'd bet that it would not be more robust, because the imaging conditions are quite challenging. If you try it out, I would love to know about the results you get, though :) The estimated state is inferred from the inertial navigation system (INS) and the acoustic Doppler velocity log (DVL) using a probabilistic approach (Kalman filtering). To the best of my knowledge, these sensors provide better estimates than any vision-based algorithm under such challenging imaging conditions.
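
This is not the vehicle's actual navigation filter, but just to illustrate the predict/update idea behind Kalman filtering, here is a toy 1-D sketch (all numbers are invented for illustration):

# predict with a velocity (as a DVL would provide), then correct with a
# noisy position measurement; repeat for each time step
x, P = 0.0, 1.0          # state (position) and its variance
Q, R_meas = 0.01, 0.5    # process and measurement noise variances (made up)
for v, z in [(1.0, 0.9), (1.0, 2.1), (1.0, 2.9)]:  # (velocity, measurement)
    x, P = x + v, P + Q                   # predict (dt = 1)
    K = P / (P + R_meas)                  # Kalman gain
    x, P = x + K * (z - x), (1 - K) * P   # update with the measurement
print(x, P)  # fused position estimate and its remaining uncertainty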

Let me know if this helped or if you have any further questions :)

bachzz commented 9 months ago

Hello @olayasturias , Thank you so much for taking the time to write such a detailed explanation for me. It helped me understand SLAM fundamentals better :)

Regarding the ground-truth camera poses from the SubPipe CSV, I have obtained the correct poses using the SE3 representation based on your code. An example for chunk3 is shown in the figure below, which is similar to the result in your paper :)

[figure: recovered ground-truth trajectory for chunk3]

Additionally, I also tried to reproduce the TartanVO and finetuned TartanVO results (finetuned on chunk2, and on aqualoc-sequence7). Interestingly, the model finetuned on aqualoc-seq7 seems to outperform the one finetuned on chunk2 (or is it simply less overfit?), but I need to test more with different iterations.

Regarding trying out COLMAP on chunk3, you're right: I could only obtain (13 registered images, 601 points), even when setting a very low peak_threshold=0.00100 to extract more features and using the more relaxed registration settings abs_pose_max_error=30 and abs_pose_min_num_inliers=10.
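
For reference, this is roughly how those options map onto the COLMAP CLI (paths are placeholders; option names follow the standard COLMAP interface):

# feature extraction with a lowered SIFT peak threshold
colmap feature_extractor \
    --database_path database.db \
    --image_path images/ \
    --SiftExtraction.peak_threshold 0.001
# mapping with relaxed absolute pose registration settings
colmap mapper \
    --database_path database.db \
    --image_path images/ \
    --output_path sparse/ \
    --Mapper.abs_pose_max_error 30 \
    --Mapper.abs_pose_min_num_inliers 10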

[screenshot: COLMAP sparse reconstruction on chunk3]

Also, this is not related to SubPipe, but I figured out how to extract the correct camera poses from COLMAP's images.txt using your recommended SE3 representation. I initially guessed that the poses in images.txt are world2cam SE3 matrices, so I computed their inverses to obtain cam2world matrices. I seem to get the correct result for aqualoc-sequence7, but I'm still not entirely certain.

# poses holds world2cam SE3 matrices parsed from images.txt;
# inverting each one yields the cam2world pose
for i in poses:
    poses[i] = np.linalg.inv(poses[i])
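
In case it's useful, this is roughly how I build those matrices from images.txt (a sketch assuming the standard COLMAP text format, where each image occupies two lines and the first one holds QW QX QY QZ TX TY TZ):

import numpy as np
from scipy.spatial.transform import Rotation as R

def load_poses_from_colmap_images_txt(file_name):
    """Parse images.txt into a dict of [4x4] world2cam SE3 matrices."""
    poses = {}
    with open(file_name, 'r') as f:
        lines = [l.strip() for l in f if l.strip() and not l.startswith('#')]
    for line in lines[::2]:  # every image takes two lines; the first holds the pose
        elems = line.split()
        image_id = int(elems[0])
        qw, qx, qy, qz = map(float, elems[1:5])  # COLMAP stores w, x, y, z
        tx, ty, tz = map(float, elems[5:8])
        P = np.eye(4)
        P[:3, :3] = R.from_quat([qx, qy, qz, qw]).as_matrix()  # scipy expects x, y, z, w
        P[:3, 3] = [tx, ty, tz]
        poses[image_id] = P
    return poses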

[figure: resulting camera poses for aqualoc-sequence7]

Regarding the estimated state from filtering the fused sensors, I am also very interested in this topic, but due to my limited background in robotics I am starting with only the visual sensor and enjoying the learning process slowly, until I can work with multiple sensors (maybe in my PhD thesis related to lifelong SLAM) :)

Additionally, I'm curious to run more experiments on synthetic underwater datasets like MIMIR-UW and see how to bridge the performance gap between simulation and the real world for VO systems. So if you could release the dataset soon, it would be really great!

Thank you so much for your help :)