sweeneychris / TheiaSfM

An open source library for multiview geometry and structure from motion

Image space coordinates vs world space coordinates #178

Closed · an-kumar closed this issue 7 years ago

an-kumar commented 7 years ago

I have been exploring the codebase & the documentation and was wondering about the coordinate systems used.

The image space used is the standard one, with (0,0) at the top-left, with Y increasing going down and X increasing going right. The world space coordinate system used, I think, is also a standard one, with Y increasing going UP, X increasing going right, and Z increasing going forward.

However, doesn't this create an inconsistency? In world space, Y increases going up, while in image space, Y increases going down. I checked some of the camera intrinsic models, and it doesn't seem like this inconsistency is addressed.

So, perhaps the world coordinate space used in Theia also has Y increasing going down, rather than going up? Or is this inconsistency addressed somewhere else?

sweeneychris commented 7 years ago

This coordinate system transformation follows directly from the camera intrinsics equations (see the various Camera class implementations), and is fairly standard for all 3d reconstruction pipelines.


an-kumar commented 7 years ago

So here, for example, in PinholeCameraModel:

  const T normalized_pixel[2] = { point[0] / depth,
                                  point[1] / depth };

  // Apply radial distortion.
  T distorted_pixel[2];
  PinholeCameraModel::DistortPoint(intrinsic_parameters,
                                   normalized_pixel,
                                   distorted_pixel);

  // Apply calibration parameters to transform normalized units into pixels.
  const T& focal_length =
      intrinsic_parameters[PinholeCameraModel::FOCAL_LENGTH];
  const T& skew = intrinsic_parameters[PinholeCameraModel::SKEW];
  const T& aspect_ratio =
      intrinsic_parameters[PinholeCameraModel::ASPECT_RATIO];
  const T& principal_point_x =
      intrinsic_parameters[PinholeCameraModel::PRINCIPAL_POINT_X];
  const T& principal_point_y =
      intrinsic_parameters[PinholeCameraModel::PRINCIPAL_POINT_Y];

  pixel[0] = focal_length * distorted_pixel[0] + skew * distorted_pixel[1] +
             principal_point_x;
  pixel[1] = focal_length * aspect_ratio * distorted_pixel[1] +
             principal_point_y;

As point[1] increases, so does distorted_pixel[1], and therefore so does pixel[1] -- does this mean that I am correct that world-space Y increases going down rather than up?
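To check this concretely, here is a stripped-down sketch of the projection quoted above, with distortion, skew, and the x channel dropped (the intrinsic values in the note below are made-up examples, not Theia defaults):

```cpp
// Simplified pinhole projection of a camera-space point's y coordinate,
// with distortion and skew omitted. Mirrors the pixel[1] line above.
double ProjectY(double focal_length, double aspect_ratio,
                double principal_point_y, const double point[3]) {
  const double normalized_y = point[1] / point[2];  // divide by depth
  return focal_length * aspect_ratio * normalized_y + principal_point_y;
}
```

With focal_length = 500, aspect_ratio = 1, and principal_point_y = 240, the camera-space point (0, 0.1, 1) projects to pixel y ≈ 290 and (0, 0.2, 1) to pixel y ≈ 340 — a larger camera-space y does give a larger (further down the screen) pixel y.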

Thanks for the help.

sweeneychris commented 7 years ago

Sorry I read this in a rush before.

There are three coordinate systems to keep in mind: the world coordinate system, the camera coordinate system, and the image coordinate system. 3D points live in the world coordinate system, camera poses define the world-to-camera transformation with a rotation and translation, and the camera intrinsics define the camera-to-image transformation. I believe this is all documented in the camera classes.

World space is really any arbitrary right-handed coordinate system -- the camera poses and intrinsics define the world -> camera -> image transformation. Make sense?
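As a sketch of that chain (hypothetical helpers with made-up intrinsic values, not Theia's Camera API):

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

// World -> camera: x_cam = R * (X - c), where R is the world-to-camera
// rotation and c is the camera position in world coordinates.
Vec3 WorldToCamera(const Mat3& R, const Vec3& c, const Vec3& X) {
  const Vec3 d{X[0] - c[0], X[1] - c[1], X[2] - c[2]};
  Vec3 out{0.0, 0.0, 0.0};
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j) out[i] += R[i][j] * d[j];
  return out;
}

// Camera -> image: ideal pinhole (no distortion, no skew).
std::array<double, 2> CameraToPixel(double f, double cx, double cy,
                                    const Vec3& p) {
  return {f * p[0] / p[2] + cx, f * p[1] / p[2] + cy};
}
```

With the identity rotation, a camera at the origin, f = 500, and principal point (320, 240), the world point (0, 0, 2) lands exactly on the principal point.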


an-kumar commented 7 years ago

Yeah, I understand the three coordinate systems. I don't think I worded my original question that well.

I'm not asking about how the code keeps the coordinate systems consistent -- I get that. Really I'm just asking for a sanity check. Say you have two features in an image: feature A at pixel (10, 10) and feature B at pixel (10, 100), both in image coordinates. In "human terms" we would say that A is above B, even though B's Y coordinate is larger than A's. That's because in image space Y coordinates increase as you go down rather than up.

The sanity check is this: the current code makes it seem that once you transform the features into camera space (not world space), feature A's Y coordinate will be lower than feature B's, just as in image space. This is slightly unintuitive in "human terms", since people (at least me) usually think of Y as increasing in the up direction. I'm just double-checking with you that camera space also has Y increasing going down, rather than up.
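One quick way to check this is to unproject the two example pixels with an ideal pinhole model (a hypothetical helper; the focal length and principal point below are made-up values):

```cpp
// Map a pixel back to camera-space coordinates at depth 1, inverting the
// ideal pinhole projection (no distortion, no skew).
struct CamPoint { double x, y; };

CamPoint PixelToCameraAtUnitDepth(double f, double cx, double cy,
                                  double px, double py) {
  return {(px - cx) / f, (py - cy) / f};
}
```

With f = 500 and principal point (320, 240), feature A at pixel (10, 10) unprojects to camera-space y ≈ -0.46 and feature B at pixel (10, 100) to y ≈ -0.28. A, which is above B in the image, does have the smaller camera-space y, consistent with a y-down camera frame.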

This is relevant when incorporating other pieces of information into the pipeline. For example, GPS provides lat, lon, altitude measurements. To convert those to x, y, z we need to know whether higher altitude means higher Y or lower Y. My current reading of the code implies that higher altitude actually means lower Y, not higher Y. Does that track with your understanding?
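If the world frame were chosen to be aligned with the y-down camera convention -- which is one possible choice, not something Theia prescribes -- the altitude mapping would look like this:

```cpp
// Convert altitude to a y coordinate in a y-down frame: points above the
// reference altitude get negative y. Purely illustrative; assumes a world
// frame deliberately aligned with the y-down camera convention.
double AltitudeToY(double altitude_m, double reference_altitude_m) {
  return -(altitude_m - reference_altitude_m);
}
```

So under that one assumption, a point 10 m above the reference sits at y = -10.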

Thanks again.

sweeneychris commented 7 years ago

I see what you're asking, but I think you may be thinking about this in too rigid terms. Any right-handed coordinate system can be transformed into another right-handed coordinate system with a similarity transformation. So as long as your world coordinate system is defined as some right-handed coordinate system, there exists an appropriate world -> camera -> image coordinate system transform. Both the image coordinate system you reference and your definition of the world coordinate system are right-handed. So while it may not be "human readable" to think of Y as down, there is nothing preventing the camera poses from applying a rotation that makes Y point upward in the traditional sense.
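For instance, a 180-degree rotation about the x axis has determinant +1 (so handedness is preserved, no reflection needed) and maps a y-up world frame into the y-down camera convention:

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

// Apply R = diag(1, -1, -1), i.e. a 180-degree rotation about x: y and z
// are negated, x is unchanged. det(R) = +1, so this is a pure rotation.
Vec3 RotateX180(const Vec3& v) {
  return {v[0], -v[1], -v[2]};
}
```

A point one unit "up" in a y-up world, (0, 1, 0), maps to (0, -1, 0) -- i.e. toward the top of the image in the y-down camera frame, as you would expect.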

an-kumar commented 7 years ago

Ah, yeah, I understand. That makes sense.

By the way, where in the code is the right-handed coordinate system enforced? In other words, where does the code assume that all coordinate systems are right-handed? My understanding is that the various solvers (e.g. relative pose, absolute pose) do not natively assume any handedness, since they are just solving numerical problems. Is the handedness only defined by the fact that in image space we take Y down, X right, and Z forward (which makes the coordinate system right-handed)? In other words, if we replaced all the image computation with images that had their origin at the bottom-left rather than the top-left, would everything else just fall into place and work (now with left-handed coordinate systems)?

Thanks for the explanations.

sweeneychris commented 7 years ago

There may be other assumptions I'm forgetting about, but the strongest is that the way rotations are formed assumes right-handed coordinate systems (by construction).

In other words, if we replaced all the image computation with images that had their origin at the bottom-left rather than top-left, would everything else just fall into place and work (now with left handed coordinate systems)?

This would only work if you ensured that the viewing direction points down the negative z-axis (keeping the coordinate system right-handed). You'd be better served by coming up with camera intrinsics formulas that map the camera coordinate system to your desired setup. The coordinate system does not have to be left-handed to have the origin at the bottom-left of the image.
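As a sketch of that suggestion -- a hypothetical intrinsics tweak, not an existing Theia camera model, with made-up image height and intrinsic values:

```cpp
// Project the normalized camera-space y (point.y / depth) to a pixel row
// measured from the BOTTOM of the image, keeping the usual right-handed
// y-down camera frame and simply flipping at the intrinsics stage.
double ProjectToBottomLeftOriginY(double focal_length,
                                  double principal_point_y,
                                  double image_height,
                                  double normalized_y) {
  const double top_left_y = focal_length * normalized_y + principal_point_y;
  return image_height - top_left_y;  // flip: row 0 is now the bottom edge
}
```

With f = 500, cy = 240, and a 480-pixel-tall image, normalized y = 0.1 maps to row ≈ 190 and y = 0.2 to row ≈ 140: larger camera-space y now means a lower row number in the bottom-left-origin image, with no change of handedness anywhere upstream.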