microsoft / RoomAliveToolkit


Mapping Kinect's 3D point to Projector's 2D pixel in Unity #42

Open Antzy opened 8 years ago

Antzy commented 8 years ago

Hey, I am trying to map a 3D camera-space point from the Kinect to a projector's 2D pixel in Unity. I know this has been asked here several times and I've read through all those threads, but I could not make it work. My calibration of one projector and one Kinect comes out pretty good, and switching to the projector view in CalibrateEnsemble lines up the corners well. The overlaid mapping also looks good enough in the sample.

I think there are two ways to go about this, and I tried both. Assume the 3D point from the Kinect is [kx, ky, kz], the tracked joint position of my right hand:

1) Using the Project() function: a. First convert from the homogeneous camera coordinate frame (i.e. a Vector4 with values [kx, ky, kz, 1]) to the projector coordinate frame, as described in this discussion: https://github.com/Kinect/RoomAliveToolkit/issues/13:

3D point x in projector coords is A * x, where A is the projector's 4x4 pose matrix and x is the (homogeneous) point in the depth camera.

So I used a manually filled Matrix4x4 (from the projector pose in the XML) and multiplied it with [kx, ky, kz, 1] to get [px, py, pz, 1].

b. Then convert the projector's 3D point to a 2D point:

x can then be projected to projector image coordinates using the projector's 3x3 intrinsics matrix. Look for the 'Project' function in the code.

I then called Project with the projectorCameraMatrix (manually filled from the XML), a zero matrix (as the projector lensDistortion is [0, 0]), px, py, pz, u and v. The u and v values I get in pixels do not map correctly to my hand after converting them from (-width/2, width/2) and (-height/2, height/2) to (0, width) and (0, height). There is a bit of distance between the two, which changes as I move around.
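
For reference, with a zero distortion vector the toolkit's Project reduces to a plain pinhole projection by the 3x3 intrinsics. A minimal sketch of that math (not the toolkit's exact code; fx, fy, cx, cy are the entries of the projector's cameraMatrix from the XML):

// Equivalent of CameraMath.Project when lensDistortion is zero:
// normalize by depth, then apply the 3x3 intrinsics.
static void ProjectPinhole(double fx, double fy, double cx, double cy,
                           double x, double y, double z,
                           out double u, out double v)
{
    u = fx * (x / z) + cx;
    v = fy * (y / z) + cy;
}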

2) Using GraphicsTransforms.ProjectionMatrixFromCameraMatrix: As discussed in https://github.com/Kinect/RoomAliveToolkit/issues/18, I calculated the 'view' and 'projection' matrices the same way as the CalibrateEnsemble > MainForm.cs > SetViewProjectionFromProjector function, then computed projectorWorldViewProjection = world * form.view * form.projection. The 'world' matrix is identity for one Kinect, so I removed it in code. The projectorWorldViewProjection I get is exactly the same as the one calculated by ProjectionMappingSample. Multiplying projectorWorldViewProjection with [kx, ky, kz, 1], I get [qx, qy, qz, qw]. Mapping x and y from (-1, 1) to (0, 1) and reversing the y axis, I still get quite a bit of displacement between my hand and the projected point.
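
A sketch of that second path in Unity C#, with the homogeneous divide by w included (the reply below points out it is required). It assumes projectorWorldViewProjection has already been brought into Unity's column-vector Matrix4x4 convention (transpose it if it was composed row-vector style, as DirectX does):

using UnityEngine;

public static class ViewProjectionMapping
{
    // world * view * projection as in SetViewProjectionFromProjector;
    // filled elsewhere from the calibration. Placeholder name.
    public static Matrix4x4 projectorWorldViewProjection;

    // Kinect camera-space point (metres) -> (0, 1) projector coordinates.
    public static Vector2 ToProjectorUV(Vector3 k)
    {
        Vector4 q = projectorWorldViewProjection * new Vector4(k.x, k.y, k.z, 1f);
        q /= q.w;                          // perspective divide: clip space -> NDC
        float u = 0.5f * (q.x + 1f);       // map (-1, 1) -> (0, 1)
        float v = 1f - 0.5f * (q.y + 1f);  // same, with the y axis reversed
        return new Vector2(u, v);
    }
}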

There are a few things I have doubts about, and I might be doing something wrong there:

  1. Do I need to convert the right-handed matrices used by RoomAlive to the left-handed convention used by Unity? I'd need the conversion if I were placing 3D objects in the scene, but converting a Kinect camera-space 3D point to a projector's 2D point only requires matrix multiplications, and those should be platform independent, right?
  2. In ProjectorCameraEnsemble, the Kinect point values [kx, ky, kz, 1] are modified before being used in the matrix multiplication:
    // depth camera coords
    var depth = depthImage[x, y] / 1000f; // m
    // convert to depth camera space
    var point = depthFrameToCameraSpaceTable[Kinect2Calibration.depthImageWidth * y + x];
    depthCamera[0] = point.X * depth;
    depthCamera[1] = point.Y * depth;
    depthCamera[2] = depth;
    depthCamera[3] = 1;

Do I also need to divide kz by 1000 and multiply kx and ky by kz/1000 before the matrix multiplication, whether I use Project() or GraphicsTransforms.ProjectionMatrixFromCameraMatrix? I tried that and it seemed to make the mapping even worse.

I've tried everything I can think of and have come to a dead end. I know it's a long rant, but I would be grateful for any help...

thundercarrot commented 8 years ago

Both approaches should work. Looking over your notes, a few things come to mind:

Regarding the use of Project: you should not have to remap the output to (-width/2, width/2) and (-height/2, height/2). The principal point calculation in Project already handles this. The output of Project is in image coordinates, but with x pointing left and y pointing up, so you may need to flip the axes depending on what you want to do next.

Regarding the second approach: be sure to divide by w after multiplying by the projection matrix.
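
So the only post-processing on Project's output is an axis flip. A one-line sketch, assuming a top-left-origin target (Antzy reports below that only the y flip was needed in practice):

// u, v arrive from Project already in pixels; no (-width/2, width/2) remap.
static UnityEngine.Vector2 ToScreen(double u, double v, int projectorHeight)
{
    return new UnityEngine.Vector2((float)u, (float)(projectorHeight - v));
}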

Regarding the "few things":

1. You are correct; you should not need to worry about this here.
2. The conversion you cite there turns a value in the depth image into a world-coordinate point. The points reported by the skeleton tracker are already world-coordinate points; you don't need this conversion.
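
In code terms, a sketch assuming the Kinect v2 Unity plugin (Windows.Kinect namespace): joint positions arrive as CameraSpacePoints, already in metres in the depth camera's 3D frame, so they can be fed straight into the matrix multiplication:

using Windows.Kinect;
using UnityEngine;

public static class JointHelper
{
    // The /1000 and depthFrameToCameraSpaceTable step is only for raw
    // depth-image pixels; skeleton joints are already camera-space metres.
    public static Vector3 RightHandCameraSpace(Body body)
    {
        CameraSpacePoint p = body.Joints[JointType.HandRight].Position;
        return new Vector3(p.X, p.Y, p.Z);
    }
}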

Antzy commented 8 years ago

Thanks for the quick reply! You were right: I was messing up the output range correction for the Project function. The output already comes in the correct range, i.e. (0, width) and (0, height); I just had to flip the y axis. I got it to work quite well, though there is still some distance between the real and projected positions (which might be due to imperfect calibration). There was another problem: I was overlooking this code (because it looks kind of scary, but isn't):

// T_WPk is inverse of T_PkW, projector pose
var T_WPk = new Matrix(4, 4);
T_WPk.Inverse(projector.pose);

foreach (var camera in projector.calibrationPointSets.Keys)
{
    var cameraPoints = projector.calibrationPointSets[camera].worldPointInliers;
    var projectorPoints = projector.calibrationPointSets[camera].imagePointInliers;

    // transforms camera to projector coordinates
    var T_CjW = camera.pose;
    var T_CjPk = new Matrix(4, 4);
    T_CjPk.Mult(T_WPk, T_CjW);

    var cameraInProjector4 = new Matrix(4, 1);
    cameraInProjector4[3] = 1;

    var cameraPoint4 = new Matrix(4, 1);
    cameraPoint4[3] = 1;

    for (int i = 0; i < cameraPoints.Count; i++)
    {
        var cameraPoint = cameraPoints[i];

        cameraPoint4[0] = cameraPoint[0];
        cameraPoint4[1] = cameraPoint[1];
        cameraPoint4[2] = cameraPoint[2];

        cameraInProjector4.Mult(T_CjPk, cameraPoint4);

        cameraInProjector4.Scale(1.0 / cameraInProjector4[3]);

        // fvec_i = y_i - p_k( T_CjPk x_i );
        double u, v;
        CameraMath.Project(projector.cameraMatrix, projector.lensDistortion, cameraInProjector4[0], cameraInProjector4[1], cameraInProjector4[2], out u, out v);

        var projectorPoint = projectorPoints[i];
        fvec[fveci++] = projectorPoint.X - u;
        fvec[fveci++] = projectorPoint.Y - v;
    }
}

So basically, for a single camera and projector, before calling Project we multiply the Kinect's point [kx, ky, kz, 1] with the inverse of the projector's pose, divide it by its w, and send that to the Project function. That fixed it. :)
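
Putting the single-Kinect case together, here is a minimal Unity C# sketch of the whole pipeline (placeholder names, not toolkit API; zero lens distortion assumed, and the camera pose taken as identity so T_CjPk is just the inverse of the projector pose):

using UnityEngine;

public static class KinectToProjector
{
    // inverse(projector.pose) and the 3x3 intrinsics, filled from the XML.
    public static Matrix4x4 projectorPoseInverse;
    public static float fx, fy, cx, cy;

    // Kinect camera-space point (metres) -> projector pixel.
    public static Vector2 MapToPixel(Vector3 k)
    {
        // Kinect camera frame -> projector frame (T_CjPk with identity camera pose).
        Vector4 p = projectorPoseInverse * new Vector4(k.x, k.y, k.z, 1f);
        p /= p.w;                        // homogeneous normalization, as in the loop above

        // Project with zero distortion: plain pinhole intrinsics.
        float u = fx * (p.x / p.z) + cx;
        float v = fy * (p.y / p.z) + cy;
        return new Vector2(u, v);        // flip the y axis afterwards if needed
    }
}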

For GraphicsTransforms.ProjectionMatrixFromCameraMatrix, I'll try dividing by w after multiplying by the projectorWorldViewProjection matrix. That might be the reason I was getting scaled values. Will try it and let you know. I'll also post my Unity code so others in the future won't have to do all that digging... Thanks for all the help!

Antzy commented 8 years ago

I couldn't get either of the methods to map 100% correctly; I only got something like 80% of the way there. It might be because Unity's Vector4 and Matrix4x4 use floats, so the very precise double values calculated by the RoomAlive Toolkit are degraded to float precision. That should still be good enough, causing a difference of only 2-3 pixels rather than the 50-100 or more I'm sometimes getting, but I might be wrong.

So I'm switching over to a Visual Studio project. I don't know, or need, a lot of the more complex things implemented, so I will be using a simplified version of the toolkit with WinForms. If anyone needs the simplified code for single Kinect-projector pair mapping in Unity, I've attached what I've achieved so far.

Thanks guys for the awesome toolkit and for releasing the source code...

KinectToProjectorMapping.zip

thundercarrot commented 8 years ago

I don't think your problem is due to floating point precision. I'd almost bet my life on that.

Do you see more accurate results when the 3D point you give happens to lie on or near the (calibrated) projection surface?

Antzy commented 8 years ago

I see more accuracy near the background surface used for calibration, and also near the approximate center of projection. As I move the point towards the edges, it seems to slide off the actual position, as if there were a slight scaling.

I've been rebuilding the same thing using the RoomAliveToolkit. I'll compare the output of the same code from Unity and .NET (aside from the difference in matrix classes used) and see what the problem might be. I suspect it might have something to do with resolution.