Coordinate Systems Ambiguity

zhengthomastang / 2018AICity_TeamUW

The winning method in Track 1 and Track 3 at the 2nd AI City Challenge Workshop in CVPR 2018 - Official Implementation

http://openaccess.thecvf.com/content_cvpr_2018_workshops/w3/html/Tang_Single-Camera_and_Inter-Camera_CVPR_2018_paper.html


Coordinate Systems Ambiguity #36

Open fouad1995 opened 1 year ago

fouad1995 commented 1 year ago

Hello,

I'm a little confused about the coordinate systems (camera and world) and your assumptions:

Initially, the CCS is parallel to the WCS.
Translate upward by t.
Rotate by yaw (pan) degrees around the Y-axis.
Rotate by pitch (tilt) degrees around the X-axis.
Rotate by roll degrees around the Z-axis.

I chose the X-Y plane as the ground plane, with the Z-axis pointing upward (right-hand rule). When I then applied your assumptions, I ended up with the CCS Z-axis pointing upward, which means the camera is looking at the sky.
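The observation above can be reproduced numerically. This is a minimal sketch under two assumptions that the thread does not confirm: yaw, pitch and roll are intrinsic rotations applied in the listed order (so the camera-to-world rotation is Ry(yaw) · Rx(pitch) · Rz(roll)), and the world frame is the asker's (X-Y ground plane, Z up). The helper names are illustrative.

```python
import math


def rot_x(deg):
    """Right-handed rotation about the X axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(deg):
    """Right-handed rotation about the Y axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(deg):
    """Right-handed rotation about the Z axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def camera_axes_in_world(yaw, pitch, roll):
    """Return the camera x, y, z axes expressed in world coordinates.

    Assumption: the camera frame starts parallel to a Z-up world frame and is
    rotated intrinsically by yaw (Y), then pitch (new X), then roll (new Z).
    """
    R_wc = matmul(matmul(rot_y(yaw), rot_x(pitch)), rot_z(roll))
    # Column j of the camera-to-world rotation is camera axis j in world coordinates.
    return [[R_wc[i][j] for i in range(3)] for j in range(3)]


_, _, optical_axis = camera_axes_in_world(0, 0, 0)
print(optical_axis)  # world +Z: straight up in a Z-up world

_, _, optical_axis = camera_axes_in_world(30, -20, 0)
print(optical_axis)  # Z component still positive: the camera still looks upward
```

Note also that in a Z-up world a rotation about Y tilts the camera about a horizontal axis instead of panning it, which suggests the listed steps expect a frame in which Y is the vertical axis.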

The image below shows the steps I followed when applying your assumptions (rotations use the right-hand rule). [Image: Coordinate frames]

I think the Z-axis should point to the right (i.e., be swapped with the X-axis) for the result to be reasonable.

Could you advise whether my approach is correct? Changing the coordinate frames' axes would mess up the roll, pitch, and yaw.

Note :

X-Axis [Red]
Y-Axis [Green]
Z-Axis [Blue]

Thanks in advance.

zhengthomastang commented 1 year ago

Thank you for reaching out and providing a detailed explanation of your problem. Based on your description, it appears that there may be a misunderstanding regarding the conventions of the coordinate system.

In many computer vision systems and 3D graphics, a common convention is that the camera coordinate system (CCS) is a right-handed system where:

The X-axis points to the right of the camera.
The Y-axis points down.
The Z-axis points out of the camera, in the direction the camera is looking.

This may seem counter-intuitive compared with the world coordinate system (WCS), but it is common practice because of the way images are represented in computer memory: the top-left pixel is typically (0,0).
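A minimal pinhole-projection sketch under this convention shows why Y points down: a point below the optical axis has positive y and lands on a larger row index v, matching an image whose (0,0) pixel is the top-left corner. The intrinsics FX, FY, CX, CY are made-up values for illustration.

```python
def project_pinhole(point_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates (x right, y down, z forward)
    to pixel coordinates (u, v) whose origin is the top-left corner."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is not in front of the camera (z <= 0)")
    u = fx * x / z + cx  # u grows to the right, like camera x
    v = fy * y / z + cy  # v grows downward, like camera y
    return u, v


# Hypothetical intrinsics for a 1920x1080 image.
FX = FY = 1000.0
CX, CY = 960.0, 540.0

print(project_pinhole((0.0, 0.0, 10.0), FX, FY, CX, CY))   # (960.0, 540.0): image center
print(project_pinhole((0.0, 1.0, 10.0), FX, FY, CX, CY))   # (960.0, 640.0): below the center
print(project_pinhole((0.0, -1.0, 10.0), FX, FY, CX, CY))  # (960.0, 440.0): above the center
```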

Your transformation steps are correct, but you seem to be starting from an unconventional setup where the Z-axis is pointing upwards in the CCS. That could be the source of your confusion.

To align with the conventions mentioned above, you should start with the CCS where:

The X-axis points to the right.
The Y-axis points downward.
The Z-axis points forward (toward what the camera is looking at).

Once this is set up, your transformation steps should give you the correct results. However, if your application requires a different setup (for example, the camera looking upwards), then you might need to adjust the initial orientation or the order of transformations accordingly.
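One way to apply this to the question's Z-up world is a fixed base rotation before the pan/tilt steps, so the untilted camera is level and looks horizontally instead of at the sky. This sketch assumes the level camera looks along world +Y; the base matrix and the sign conventions for pan and tilt are choices made for this illustration, not taken from the repository.

```python
import math


def rot_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Assumed camera-to-world rotation of a level camera in a Z-up world:
# camera x -> world +X (right), camera y -> world -Z (down), camera z -> world +Y (forward).
# Columns are the camera axes expressed in world coordinates.
R_BASE = [[1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0],
          [0.0, -1.0, 0.0]]


def camera_axes_in_world(pan_deg, tilt_down_deg):
    """Camera x, y, z axes in world coordinates after panning, then tilting down.

    Pan is an intrinsic rotation about the camera y axis (vertical for a level
    camera); positive pan turns counter-clockwise seen from above. Tilt is an
    intrinsic rotation about the new camera x axis; positive tilt points down.
    """
    R_wc = matmul(matmul(R_BASE, rot_y(-pan_deg)), rot_x(-tilt_down_deg))
    return [[R_wc[i][j] for i in range(3)] for j in range(3)]


_, y_axis, z_axis = camera_axes_in_world(0, 0)
print(z_axis)  # level camera: looks along world +Y (horizontal)
print(y_axis)  # image "down" is world -Z
_, _, z_axis = camera_axes_in_world(0, 30)
print(z_axis)  # tilted 30 degrees down: negative world Z component
```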

Please note that changing the axes of the coordinate frames doesn't inherently mess up the roll, pitch, and yaw. What it does affect is the reference direction for those rotations. Always remember that these rotations are relative to the camera's current orientation, not an absolute direction in world space.
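The difference between rotating about the current axes and about fixed world axes can be checked numerically. Intrinsic rotations (about the already-rotated axes) compose by right-multiplication and extrinsic rotations (about fixed axes) by left-multiplication, so intrinsic Y, X', Z'' equals extrinsic Z, X, Y with the same angles, while extrinsic Y, X, Z gives a different result. The angles below are arbitrary.

```python
import math


def rot_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def compose_intrinsic(rotations):
    """Each step rotates about the frame's current axes: right-multiply."""
    R = IDENTITY
    for step in rotations:
        R = matmul(R, step)
    return R


def compose_extrinsic(rotations):
    """Each step rotates about the fixed world axes: left-multiply."""
    R = IDENTITY
    for step in rotations:
        R = matmul(step, R)
    return R


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


yaw, pitch, roll = 30.0, -20.0, 5.0
intrinsic = compose_intrinsic([rot_y(yaw), rot_x(pitch), rot_z(roll)])
extrinsic_reversed = compose_extrinsic([rot_z(roll), rot_x(pitch), rot_y(yaw)])
extrinsic_same_order = compose_extrinsic([rot_y(yaw), rot_x(pitch), rot_z(roll)])

print(max_abs_diff(intrinsic, extrinsic_reversed))    # ~0: the same rotation
print(max_abs_diff(intrinsic, extrinsic_same_order))  # clearly non-zero: a different rotation
```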

I hope this clears up the confusion. If you have any more questions, feel free to ask.

fouad1995 commented 1 year ago

Thanks for your explanation. I have a few more questions, if you don't mind:

Is the origin of the WCS the point where the ray leaving the camera intersects the ground plane?
Does the transformation matrix [rotation | translation] convert from WCS to CCS?
Does the projection matrix project only points on the ground plane (real 3D points), or any point in the WCS?

Thanks in advance

zhengthomastang commented 1 year ago

Thank you for your further questions. I'll address them one by one.

The center of WCS (World Coordinate System): It's a matter of definition, really. In a typical setup, the origin of the WCS might be at the camera's location, or it could be at some arbitrary point in the environment, such as the center of the scene or a specific object of interest. If you are referring to a ray that starts at the camera and intersects the ground plane, that would usually be the Z-axis (optical axis) of the camera coordinate system (CCS). The point where this ray meets the ground is not necessarily the origin of the WCS.
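The point where the optical axis meets the ground can be computed directly, and nothing forces the WCS origin to be placed there. This sketch assumes a Z-up world with the ground at Z = 0 and a hypothetical camera 10 m above the world origin that looks along world +Y, tilted 45 degrees down.

```python
import math


def ray_plane_z_intersection(origin, direction, plane_z=0.0):
    """Intersect the ray origin + s * direction (s > 0) with the plane Z = plane_z.

    Returns the 3D intersection point, or None if the ray never reaches the plane.
    """
    dz = direction[2]
    if abs(dz) < 1e-12:
        return None  # ray is parallel to the plane
    s = (plane_z - origin[2]) / dz
    if s <= 0:
        return None  # the plane lies behind the ray origin
    return [o + s * d for o, d in zip(origin, direction)]


# Hypothetical camera: 10 m above the world origin (Z-up world), looking along +Y,
# tilted 45 degrees toward the ground.
tilt = math.radians(45.0)
camera_center = [0.0, 0.0, 10.0]
optical_axis = [0.0, math.cos(tilt), -math.sin(tilt)]

print(ray_plane_z_intersection(camera_center, optical_axis))     # roughly [0, 10, 0]
print(ray_plane_z_intersection(camera_center, [0.0, 1.0, 0.5]))  # None: the ray points upward
```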

Transformation Matrix: A transformation matrix of the form [R|t] (where R is rotation and t is translation) is typically used to convert from one coordinate system to another. If you have a transformation matrix that converts from WCS to CCS, you would use it to express points that are in WCS in terms of the CCS. So, yes, if you have a transformation matrix set up this way, it will convert from WCS to CCS.
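With the common convention X_c = R X_w + t for a WCS-to-CCS transform, the inverse transform and the camera position in the world follow directly. The pose below is illustrative (a Z-up world and a level camera 10 m above the origin looking along world +Y), with integer values so the arithmetic is exact.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def world_to_camera(R, t, X_w):
    """X_c = R @ X_w + t (WCS -> CCS)."""
    return [a + b for a, b in zip(matvec(R, X_w), t)]


def camera_to_world(R, t, X_c):
    """Inverse transform X_w = R^T @ (X_c - t), valid because R is orthonormal."""
    return matvec(transpose(R), [a - b for a, b in zip(X_c, t)])


def camera_center(R, t):
    """Camera position in world coordinates: C = -R^T @ t."""
    return [-c for c in matvec(transpose(R), t)]


# Hypothetical pose: rows of R are the camera axes (x right, y down, z forward)
# expressed in a Z-up world; the camera sits at world (0, 0, 10).
R = [[1, 0, 0],
     [0, 0, -1],
     [0, 1, 0]]
t = [0, 10, 0]

X_c = world_to_camera(R, t, [0, 20, 0])  # a ground point 20 m ahead
print(X_c)                               # [0, 10, 20]: 10 m below, 20 m in front of the camera
print(camera_to_world(R, t, X_c))        # [0, 20, 0]: back to the world point
print(camera_center(R, t))               # [0, 0, 10]
```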

Projection Matrix: A projection matrix is typically used to project 3D points (expressed in the coordinate system the matrix expects, i.e. the WCS for P = K[R|t]) onto the 2D image plane of the camera. It doesn't matter whether the points are on the ground plane or elsewhere in the scene; as long as they lie in front of the camera and within its field of view, they can be projected onto the image plane. The projection operation itself doesn't discriminate between points based on their location in 3D space.
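A sketch of P = K [R|t] applied to both a ground point and a point above the ground; the only requirement checked is positive depth. K, R and t are hypothetical values (a Z-up world and a level camera 10 m above the origin looking along +Y).

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def projection_matrix(K, R, t):
    """P = K @ [R | t], a 3x4 matrix mapping homogeneous world points to pixels."""
    Rt = [list(row) + [ti] for row, ti in zip(R, t)]
    return matmul(K, Rt)


def project(P, X_w):
    """Project a 3D world point; return (u, v), or None if it is behind the camera.

    Because K's last row is [0, 0, 1], the homogeneous w equals the point's depth.
    """
    x, y, w = matvec(P, list(X_w) + [1])
    if w <= 0:
        return None
    return x / w, y / w


K = [[1000, 0, 960],
     [0, 1000, 540],
     [0, 0, 1]]
# Hypothetical pose: level camera at world (0, 0, 10) looking along +Y.
R = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]
t = [0, 10, 0]
P = projection_matrix(K, R, t)

print(project(P, [0, 20, 0]))   # ground point      -> (960.0, 1040.0)
print(project(P, [0, 20, 5]))   # 5 m above ground  -> (960.0, 790.0)
print(project(P, [0, -20, 0]))  # behind the camera -> None
```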

I hope this helps. If you have any other questions, or if something is still unclear, please don't hesitate to ask!

fouad1995 commented 1 year ago

Thank you so much for your explanation.

Regarding the first and third paragraphs, I'm a little confused about the role of the projection matrix. As far as I know, the projection matrix combines the intrinsic and extrinsic parameters, P = K [R|T], and converts 3D points from the WCS to the camera image plane. So my questions are:

Is the transformation matrix [R|T] an identity transformation (R = I, T = 0) if I assume that the WCS is the same as the CCS?
Where is the origin of the WCS under your assumptions?

Thanks for your help and support
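Since P = K [R|T] accepts any world point, the ground plane matters mainly for the reverse direction. If the ground is Z = 0 in the WCS, the third column of P drops out and the remaining 3x3 matrix H = [p1 p2 p4] is a homography, invertible as long as the camera center is not on the ground plane, which lets a pixel be mapped back to a ground point. This is a standard construction shown as a sketch, not a description of how the repository does it; the P below is a hypothetical example.

```python
def inv3(M):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def ground_homography(P):
    """Keep columns 1, 2 and 4 of the 3x4 projection matrix (drop the Z column)."""
    return [[row[0], row[1], row[3]] for row in P]


def pixel_to_ground(P, u, v):
    """Back-project pixel (u, v) onto the world plane Z = 0."""
    X, Y, W = matvec(inv3(ground_homography(P)), [u, v, 1])
    return X / W, Y / W


# Hypothetical P = K [R|t] with K = [[1000, 0, 960], [0, 1000, 540], [0, 0, 1]]
# for a level camera 10 m above a Z-up world origin, looking along +Y.
P = [[1000, 960, 0, 0],
     [0, 540, -1000, 10000],
     [0, 1, 0, 0]]

print(pixel_to_ground(P, 960, 1040))  # approximately (0.0, 20.0)
```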

zhengthomastang commented 1 year ago

If you're assuming that the World Coordinate System (WCS) is the same as the Camera Coordinate System (CCS), then yes, the [R|T] transformation would be an identity transformation. In this case, there is no rotation or translation needed to align the WCS with the CCS because they're already the same.
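A quick check of this case: with R = I and T = 0 the extrinsic matrix is [I|0], so P = K [I|0] projects a point exactly as K applied to its coordinates, followed by the perspective division. K and the point are illustrative values.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


K = [[1000, 0, 960],
     [0, 1000, 540],
     [0, 0, 1]]

# [R | T] with R = identity and T = 0: the WCS coincides with the CCS.
RT_IDENTITY = [[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 1, 0]]
P = matmul(K, RT_IDENTITY)

X = [2, 1, 10]               # same coordinates in the WCS and the CCS
via_P = matvec(P, X + [1])   # P applied to the homogeneous world point
via_K = matvec(K, X)         # K applied directly to the camera-frame point
print(via_P == via_K)        # True
print(via_P[0] / via_P[2], via_P[1] / via_P[2])  # 1160.0 640.0
```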

However, it's worth noting that this is a simplification and not typically how things work in practice. In real-world applications, the camera is usually moving through the world (like in a video game or a robotics application), so the WCS and CCS are not the same. The transformation [R|T] changes over time to represent the camera's changing position and orientation.

As for the center point of the WCS, it really depends on the context and the specific application. In some cases, it could be a meaningful point in your environment, like the center of a room or a landmark. It could also be based on an arbitrary point, such as the origin of a 3D model in a game, or it could be based on the initial position of the camera at the start of an application.

If the WCS is the same as the CCS (as in your previous question), then the center point of the WCS would also be the same as the center point of the CCS, which is usually defined to be the camera's position.

I hope this answers your questions!

fouad1995 commented 1 year ago

Thank you so much for your time and support.

I have a stationary camera that overlooks a road. I selected the vanishing points and everything works fine. My questions arose because the transformation matrix is not the identity, which means you assumed that the CCS is not the same as the WCS. If I have another sensor whose data I need to map onto the camera image, I need to do the following steps:

Map the readings from the other sensor to the WCS first (so the WCS needs to be known).
Multiply each reading by the projection matrix to get it onto the image plane: point in image = P * (transformation from the other sensor to the world) * point in the other sensor's frame.

Or, as a second approach, do the following steps:

Map the readings from the other sensor's coordinate system to the CCS.
Multiply each reading by the intrinsic matrix to get it onto the image plane: point in image = K * (transformation from the other sensor to the camera) * point in the other sensor's frame.

So the WCS needs to be defined if I use the first approach; that's why I'm asking about it.
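Both approaches describe the same chain of rigid transforms, so they should produce identical pixels (after dividing by the third homogeneous coordinate); the difference is only whether the sensor-to-camera transform is routed through the WCS. This sketch uses 4x4 homogeneous matrices and hypothetical values: the camera is a level camera 10 m above a Z-up world origin looking along +Y, and the other sensor is assumed to sit at world (5, 0, 2) with axes parallel to the world axes.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def rigid(R, t):
    """4x4 homogeneous matrix for the rigid transform x -> R @ x + t."""
    return [list(row) + [ti] for row, ti in zip(R, t)] + [[0, 0, 0, 1]]


def dehomogenize(p):
    return p[0] / p[2], p[1] / p[2]


K = [[1000, 0, 960], [0, 1000, 540], [0, 0, 1]]
T_cam_world = rigid([[1, 0, 0], [0, 0, -1], [0, 1, 0]], [0, 10, 0])   # WCS -> CCS (hypothetical)
T_world_sensor = rigid([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [5, 0, 2])  # sensor -> WCS (hypothetical)

point_sensor = [-5, 20, -2, 1]  # homogeneous point measured by the other sensor

# Approach 1: sensor -> WCS, then P = K [R|t] (the top three rows of T_cam_world).
P = matmul(K, T_cam_world[:3])
pixel_1 = dehomogenize(matvec(P, matvec(T_world_sensor, point_sensor)))

# Approach 2: sensor -> CCS in one transform, then K.
T_cam_sensor = matmul(T_cam_world, T_world_sensor)
point_cam = matvec(T_cam_sensor, point_sensor)[:3]
pixel_2 = dehomogenize(matvec(K, point_cam))

print(pixel_1, pixel_2)  # both (960.0, 1040.0)
```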

One more thing regarding the reprojection (the red points on the image): will it follow the direction the camera is facing? What I mean is: if the camera faces along the road, as in the description, the points should extend from the camera along the road; but if the camera is angled toward the side of the road, should the red points then cross the road?

Thank you again for your time.