PoseCNN: Replacing axisangle with Quaternion representation

apollospace commented 4 years ago

Hello,

I am building a custom dataset using the ZED2 camera. I would also like to replace the PoseCNN with the pose that the ZED API provides. The API provides a quaternion [X, Y, Z, W] for orientation and a vector [X, Y, Z] for translation/position with respect to the previous camera frame.

From the past discussions #17, #204 and #128, and after reading the codebase, I can see that the PoseCNN outputs an axisangle for rotation and a translation.

If I were to replace the PoseCNN, my questions are:

1) Will it be enough to convert from the quaternion representation to axisangle and use it as a drop-in replacement?

2) If I understand the inversion in #17 correctly, will I still have to perform the inversion for T_1 -> T_0?

Many thanks

mrharicot commented 4 years ago

Hi, if I understand correctly, you want to remove the PoseCNN altogether and use the poses from the ZED camera to perform the reprojection? If so, you can simply compute the proper rigid transformation matrices in the dataloader and treat them like inputs.

To your first question: it should be. You need to check which direction the transform is given in, and which quaternion convention they are using.
If you treat these transforms as inputs, you simply need to compute the correct one in the dataloader; there is no need to invert in the trainer.
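
As a rough illustration of computing the matrix in the dataloader, here is a minimal sketch. It assumes the quaternion is scalar-last [x, y, z, w] (SciPy's default order) and that the translation is already in the camera convention you need. The helper name `pose_to_matrix` and the sample values are hypothetical, not part of monodepth2 or the ZED API.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def pose_to_matrix(quat_xyzw, translation):
    """Build a 4x4 rigid transform from a quaternion and a translation.

    Assumption: quat_xyzw is scalar-last, i.e. [x, y, z, w].
    """
    T = np.eye(4, dtype=np.float32)
    T[:3, :3] = Rotation.from_quat(quat_xyzw).as_matrix()
    T[:3, 3] = translation
    return T


# Hypothetical sample: 90 degree rotation about z, 0.1 units along z.
q = [0.0, 0.0, np.sin(np.pi / 4), np.cos(np.pi / 4)]
T = pose_to_matrix(q, [0.0, 0.0, 0.1])
```

Whether this matrix or its inverse is the one to feed to the reprojection depends on the direction in which the pose is reported, so that still needs to be checked against the data.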

I hope this helps!

apollospace commented 4 years ago

Okay. Thanks for confirming that. If I may ask further, would this be the right way to convert from a quaternion to axisangle? https://www.euclideanspace.com/maths/geometry/rotations/conversions/quaternionToAngle/index.htm

Also, the API uses a right handed Y down coordinate system by default. Other available systems are:

Left handed, y-up (Unity)
Right handed, y-up (OpenGL)
Left handed, z-up (Unreal Engine)
Right handed, z-up (ROS)

My intuition says that I must instead use right handed Y-up. Would this be the right coordinate system to use in place of the PoseCNN?

Many thanks

mrharicot commented 4 years ago

We use the standard computer vision coordinate system for cameras:

Right handed, x-right, y-down, z-forward.
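
If poses were recorded in a different convention, they can be moved into this one with a change of basis. The sketch below assumes the source is right handed, x-right, y-up, z-backward (the usual OpenGL camera convention); the function name `y_up_to_cv` is hypothetical, and other source conventions would need a different matrix.

```python
import numpy as np

# Change of basis from (x-right, y-up, z-backward) to (x-right, y-down, z-forward):
# flip the y and z axes. This matrix is its own inverse.
C = np.diag([1.0, -1.0, -1.0, 1.0])


def y_up_to_cv(T_y_up):
    """Re-express a 4x4 rigid transform in the x-right, y-down, z-forward convention."""
    return C @ T_y_up @ C.T


# Hypothetical sample: pure translation of 0.5 along -z in the y-up convention
# (i.e. forward, since z points backward there).
T_y_up = np.eye(4)
T_y_up[:3, 3] = [0.0, 0.0, -0.5]
T_cv = y_up_to_cv(T_y_up)
```

In the converted transform the same motion appears as a translation along +z, i.e. forward in the computer vision convention.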

For handling rotations I would recommend using scipy (although I have never tried it myself): https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.transform.Rotation.html You can create a Rotation using from_quat and get the axis angle with as_rotvec.
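
A minimal sketch of that conversion, assuming a scalar-last quaternion [x, y, z, w] (the order SciPy's from_quat expects by default). The helper name `quat_to_axisangle` and the sample quaternion are hypothetical.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def quat_to_axisangle(quat_xyzw):
    """Convert a scalar-last quaternion to a rotation vector (axis * angle, in radians)."""
    return Rotation.from_quat(quat_xyzw).as_rotvec()


# Hypothetical sample: 45 degree rotation about the y axis.
quat_xyzw = [0.0, np.sin(np.pi / 8), 0.0, np.cos(np.pi / 8)]
rotvec = quat_to_axisangle(quat_xyzw)

# The norm of the rotation vector is the angle; its direction is the axis.
angle = np.linalg.norm(rotvec)
axis = rotvec / angle if angle > 0 else rotvec
```

The rotation vector packs axis and angle into three numbers, which is the same shape as a three-component axisangle output.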

I hope this helps!

apollospace commented 4 years ago

Thank you very much for the info and recommendations. I seem to be getting the information in the correct format. Cheers!

apollospace commented 4 years ago

Hello again @mrharicot

I tried training a network using the IMU information from the ZED, but am not getting the right depth predictions.

The IMU data reports translation in metric units, which are significantly larger than what the PoseCNN outputs. The color_pred outputs were severely scaled (zoomed out), with heavy padding in the outer regions. I scaled the IMU information by self.width*2 and that seems to have improved the image warping. Would this be the right approach? I wasn't sure what units the PoseCNN operates in.
After training for ~40 epochs, the predictions are not right for a moving scene. I think I'm making an incorrect inversion of the (axisangle, translation) at transformation_from_parameters. When the ZED moves forward in Z, the IMU reports a negative value for the Z translation between the current and previous frame; i.e., the current frame/position needs to be moved by -Z to be in the previous position. From the outputs of the PoseCNN, it seems to use the same format. With this knowledge, I'm inverting the T_-1 -> T_0 pose (same as the PoseCNN setting) for the IMU data. Do you think this is incorrect and that I should be inverting T_0 -> T_1 instead?
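
For reference, this is a minimal sketch of how a rigid transform and its inverse relate, which is what the sign question above comes down to. The helper name `invert_rigid` and the sample values are hypothetical; which of the two matrices the warping expects still depends on the convention in use.

```python
import numpy as np


def invert_rigid(T):
    """Invert a 4x4 rigid transform [R | t] as [R^T | -R^T t]."""
    R = T[:3, :3]
    t = T[:3, 3]
    T_inv = np.eye(4, dtype=T.dtype)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


# Hypothetical sample: the camera moved 0.5 forward along +z with no rotation.
# This matrix maps points from the current camera frame to the previous one.
T_cur_to_prev = np.eye(4)
T_cur_to_prev[:3, 3] = [0.0, 0.0, 0.5]

# The inverse maps points from the previous frame to the current one,
# and carries the negated translation.
T_prev_to_cur = invert_rigid(T_cur_to_prev)
```

With no rotation, inverting simply flips the sign of the translation, so a negative Z for forward motion is consistent with one of the two directions and not with the other.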

Thanks for your help

nianticlabs / monodepth2

PoseCNN: Replacing axisangle with Quaternion representation #209