haomo-ai / MotionSeg3D

[IROS 2022] Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation
https://npucvr.github.io/MotionSeg3D/
GNU General Public License v3.0

Transform from history frame to current frame #22

Closed. Ianpengg closed this issue 1 year ago.

Ianpengg commented 1 year ago

Hello, I'm trying to apply your great work to my custom dataset, and I have a question about the transformation from a history frame to the current frame.

  1. I want to know the actual physical meaning of the new_poses list. Does it store the transformation between two consecutive frames (local), e.g.

new_poses[0] = transformation matrix from frame0 -> frame1, new_poses[1] = transformation matrix from frame1 -> frame2,

or does it store the transformation from the first frame to the i-th frame (global), e.g.

new_poses[0] = transformation matrix from frame0 -> frame1, new_poses[1] = transformation matrix from frame0 -> frame2?

new_poses = []
for pose in poses:
    new_poses.append(T_velo_cam.dot(inv_frame0).dot(pose).dot(T_cam_velo))
poses = np.array(new_poses)

Could you help me clarify this? Many thanks!

MaxChanger commented 1 year ago

Hi, this is consistent with https://github.com/PRBonn/LiDAR-MOS/blob/main/utils/gen_residual_images.py#L62-L66, and we follow the same data processing and generation pipeline.

First of all, the GT poses of KITTI Odometry are defined in the frame of camera 0 (the left grayscale camera), while what we need is the pose of the LiDAR. [That's why T_velo_cam and T_cam_velo are used.]

Secondly, we want the pose of the first frame to be the identity (rotation = identity matrix, translation = 0); that is, the first frame of each sequence is treated as the world frame. [That's why inv_frame0 is used.]
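In other words, new_poses[i] is the global pose of frame i with respect to the first frame, expressed in the LiDAR coordinate system. As a rough illustration (this helper is not from our code, the names are made up), such a pose can be used to warp a history scan into the current frame like this:

import numpy as np

def transform_history_to_current(points_hist, pose_hist, pose_curr):
    # points_hist: (N, 3) points in the history LiDAR frame
    # pose_hist, pose_curr: 4x4 LiDAR poses w.r.t. the first frame (world)
    T_curr_hist = np.linalg.inv(pose_curr).dot(pose_hist)  # history -> current
    points_h = np.hstack([points_hist, np.ones((points_hist.shape[0], 1))])
    return T_curr_hist.dot(points_h.T).T[:, :3]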

Ianpengg commented 1 year ago

Thanks for your explanation, I'll try it.

MaxChanger commented 1 year ago

By the way, as long as your data is consistent with KITTI Odometry's pose and intrinsic/extrinsic format, I think LiDAR-MOS/issues/52 should be helpful. Good luck~

Ianpengg commented 1 year ago

Thanks for your suggestion! After reviewing the issue, and assuming that both the VIO results and the GPS records with RTK enabled can provide ego-pose information,
my understanding (please correct me if I'm wrong) is that I can use the provided transform for each frame to compute the trajectory of the LiDAR, treat the first frame as the world frame, and thereby generate a "poses.txt" file similar to the one provided in the KITTI dataset. Do you think this is correct? (A rough sketch of what I mean is below.)
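For example, something like this (just a sketch; the helper name and variables are mine, assuming each pose is a 4x4 LiDAR-in-world matrix with the first one equal to identity):

import numpy as np

def write_kitti_style_poses(path, poses):
    # poses: list of 4x4 homogeneous matrices (pose of frame i w.r.t. frame 0)
    with open(path, "w") as f:
        for T in poses:
            row = T[:3, :4].reshape(-1)  # 12 row-major values, KITTI poses.txt layout
            f.write(" ".join(f"{v:.6e}" for v in row) + "\n")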

MaxChanger commented 1 year ago

I think it is feasible, as long as your T_world_to_cam or T_cam_to_world convention is consistent with KITTI odometry (sorry, I can't remember which one it is).

Ianpengg commented 1 year ago

Hello @MaxChanger! I just want to report some progress ~

Currently, I'm facing instability issues when generating the residual images using GPS/INS data for my poses.txt file. To address this, I'm considering using an open-source LiDAR odometry module to obtain more reliable poses for all frames. Since the residual image only relies on frames up to 8 frames before the current one, I believe the global drift commonly associated with LiDAR odometry will have minimal impact on the results. Could you kindly suggest any suitable open-source repositories that can generate poses from raw LiDAR input? I'm specifically using the Velodyne HDL-32E sensor from the Oxford Radar RobotCar Dataset.

I would also like to understand what level of pose accuracy is needed to effectively apply the MotionSeg3D method. Many thanks!

MaxChanger commented 1 year ago

I think you already know LiDAR-MOS; our data is consistent with it. In LiDAR-MOS, an ablation experiment was performed on pose noise, and obviously, the more accurate the pose, the better the result. In addition, 4DMOS is more sensitive to the pose; they did not use the same poses as ours and LiDAR-MOS, but chose a solution without bundle adjustment (you can check it carefully). As far as I remember, SuMa or SuMa++ is recommended in LiDAR-MOS and 4DMOS, so you could try that.

By the way, doesn't the Oxford Radar RobotCar Dataset provide ground-truth poses? You could use those to experiment first.

Ianpengg commented 1 year ago

Yes, I have checked LiDAR-MOS and 4DMOS before. However, in the Oxford Radar RobotCar Dataset, accurate ground-truth poses are only provided for radar odometry. Unfortunately, for LiDAR, only GPS/INS poses and visual odometry are available, and despite trying these alternatives, the results are still not good. Therefore, I keep looking for a way to obtain accurate ground-truth poses for the LiDAR data. I will keep updating my progress here~ Thanks for your help!!! It really helps me a lot.

MaxChanger commented 1 year ago

If you think the radar's trajectory is accurate enough, is there an extrinsic calibration between the radar and the LiDAR that you could use to convert it? (Rough sketch below.)
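Something like this (illustrative names only, assuming T_radar_lidar is the known extrinsic that maps LiDAR-frame points into the radar frame):

def radar_trajectory_to_lidar(poses_world_radar, T_radar_lidar):
    # poses_world_radar: list of 4x4 radar poses in the world frame
    # returns the corresponding 4x4 LiDAR poses in the same world frame
    return [T_w_radar.dot(T_radar_lidar) for T_w_radar in poses_world_radar]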

Ianpengg commented 1 year ago

Hello Max, I've successfully executed the inference code using the Oxford dataset!

I managed to obtain the poses.txt file by utilizing the 'Direct LiDAR Odometry: Fast Localization with Dense Point Clouds' method from this GitHub repository: https://github.com/vectr-ucla/direct_lidar_odometry.

However, I've noticed that the results are not as good as you previously suggested they might be. It seems that I may need to manually label some data and fine-tune the model to improve performance. I'm wondering if the poor predictions could be due to improper parameter settings during the data pre-processing phase. The Oxford dataset uses two Velodyne HDL-32E LiDARs (I've only used one of them for testing), while the KITTI dataset uses a Velodyne HDL-64E. I'm not sure whether the sensor mounting height is related to the range image parameters; if so, could you share how to adjust those parameters during the preprocessing stage? Many thanks!

# range image parameters
range_image:
  height: 64
  width: 2048
  fov_up: 3.0
  fov_down: -25.0
  max_range: 50.0
  min_range: 2.0

This is one scan of the test results: [screenshot of predictions omitted]

Besides, I found that the front of the ego car faces the y-axis, which I think is different from KITTI. Would this be a problem?

MaxChanger commented 1 year ago

Sorry, I missed the message reminder until now.

The height and width are the size of the range image; if your LiDAR has 32 beams, height = 32 is better. fov_up/fov_down are the vertical (pitch) field-of-view limits of the sensor, so you need to consult the LiDAR documentation. min_range/max_range set the nearest/farthest distances used to filter points.
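Roughly speaking, the parameters enter a spherical projection like the one below (just a sketch in the style of the common RangeNet++/LiDAR-MOS projection, not copied from our code; the HDL-32E vertical FOV of about +10.67 / -30.67 degrees is from memory, please check the datasheet):

import numpy as np

def project_to_range_image(points, height=32, width=2048,
                           fov_up_deg=10.67, fov_down_deg=-30.67,
                           min_range=2.0, max_range=50.0):
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = abs(fov_up) + abs(fov_down)

    # Filter points by range.
    depth = np.linalg.norm(points[:, :3], axis=1)
    keep = (depth > min_range) & (depth < max_range)
    points, depth = points[keep], depth[keep]

    # Horizontal (yaw) and vertical (pitch) angles of each point.
    yaw = -np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / depth)

    # Map angles to image coordinates: width covers 360 deg, height covers the vertical FOV.
    u = 0.5 * (yaw / np.pi + 1.0) * width
    v = (1.0 - (pitch + abs(fov_down)) / fov) * height
    u = np.clip(np.floor(u), 0, width - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, height - 1).astype(np.int32)

    range_image = np.full((height, width), -1.0, dtype=np.float32)
    range_image[v, u] = depth
    return range_image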

It is best to keep the coordinate axes of the point cloud consistent with KITTI (x forward, y left, z up), otherwise you may have to modify the range projection function.
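For example, if the forward direction of your ego car is +y, a simple workaround is to rotate every scan by -90 degrees about z before the projection, so that forward becomes +x as in KITTI (a sketch, assuming points is an N x 3 or N x 4 numpy array and z already points up):

import numpy as np

# Rotate by -90 deg about z: old +y (forward) becomes new +x,
# matching KITTI's LiDAR convention (x forward, y left, z up).
R = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
points[:, :3] = points[:, :3].dot(R.T)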

Ianpengg commented 1 year ago

OK, I will try it! You are extremely kind and always ready to help. Many thanks~

Ianpengg commented 1 year ago

Besides, if I modify the field of view (FOV) and the height, would I still be able to use the pre-trained model for inference?

MaxChanger commented 1 year ago

Besides, if I modify the field of view (FOV) and the height, would I still be able to use the pre-trained model for inference?

Hello, we have tested this and it should work, but the results may not be very good. The accuracy of the model's inference on LiDARs with different beam counts still needs to be improved.