microsoft / HoloLens2ForCV

Sample code and documentation for using the Microsoft HoloLens 2 for Computer Vision research.
MIT License

A question about getting the 3D coordinates in the world coordinate system from a PV image point #164

Open Tolerm opened 1 year ago

Tolerm commented 1 year ago

Hello guys! I want to get the 3D coordinates of a real-world object with the HoloLens 2 cameras, but I'm disappointed: the result is far from the true value. My method is:

  1. Get the (u,v) coordinates of the object center in PV camera image space through object detection.

  2. Transform the (u,v) to (U,V,W,1) in the Unity world coordinate system:

     (1) transform the (u,v) to (X,Y,Z) with Z = 1 through the PV camera's UnprojectAtUnitDepth() method
     (2) transform (X,Y,-Z,1) to the homogeneous coordinates (U,V,W,1) in the Unity world coordinate system by multiplying by the extrinsics; the extrinsics are obtained from the pv_frame.CoordinateSystem().TryGetTransformTo(m_worldCoordSystem) method, and m_worldCoordSystem is set to the Unity coordinate system

  3. Transform the (U,V,W,1) to (u,v) in depth camera image coordinates; the depth camera runs in Long Throw mode:

     (1) transform the (U,V,W,1) to (X,Y,Z) in depth camera space by multiplying by Inv(extrinsics_depth), where extrinsics_depth refers to the Unity coordinate system
     (2) transform (X/Z, Y/Z, Z/Z)_depth (divide by Z to make Z = 1) to (u,v)_depth through the MapCameraSpaceToImagePoint() method

  4. At this point I have mapped (u,v)_pv to (u,v)_depth; what remains is to calculate the 3D coordinates from (u,v)_depth (the back-projection is written out after this list):

     (1) get the depth D at this uv point, and get the camera space coordinates through MapImagePointToCameraUnitPlane(uv_depth, xy)
     (2) set an intermediate variable, tempPoint = D/1000 * 1/sqrt(x^2+y^2+1) * (x, y, z=1)
     (3) get the final result by pointInWorld = tempPoint * extrinsics_depth, with X = pointInWorld.X, Y = pointInWorld.Y, Z = -pointInWorld.Z
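
To summarize step 4 as a formula (D is the raw depth in millimetres, and (x, y) comes from MapImagePointToCameraUnitPlane):

    p_{\mathrm{cam}} = \frac{D}{1000}\,\frac{(x,\;y,\;1)^{\top}}{\sqrt{x^{2}+y^{2}+1}},\qquad
    p_{\mathrm{world}} = T_{\mathrm{depth}\to\mathrm{world}}\begin{bmatrix} p_{\mathrm{cam}} \\ 1 \end{bmatrix}

with the final Unity coordinates taken as (X, Y, -Z) of p_world.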

Can anyone tell me what is wrong with the steps above, or whether my code has an error? Any help would be appreciated! My code references Samples/StreamRecorder/StreamRecorderApp/VideoFrameProcessor.cpp and petergu684's repo https://github.com/petergu684/HoloLens2-ResearchMode-Unity/ , thanks for the work! I will show my code in the comment area.

Tolerm commented 1 year ago

My code below is a little messy; thank you very much if you have the patience to read it. Here is the "uv_pv to (U,V,W,1) in Unity world coordinate system" method:

float* VideoFrameProcessor::PixelPointToWorldPoint(float(&uv_pv)[2])
{
    // **frameMap is a Concurrent_unordered_map<int,frame's type> which stores the pv camera frame**
    if (frameMap.empty())
    {
        float* empty = new float[4];
        empty[0] = 0.0f;
        empty[1] = 0.0f;
        empty[2] = 0.0f;
        empty[3] = 0.0f;
        return empty;
    }

    // 1.Get current pv frame
    auto m_latestFrame = frameMap.at(frameMap.size());
    // **I store the frame with key 1->2->3->.... , so frameMap.size() means the new received frame**

    // 2.Convert input "uv_pv" to "XY_pv" in camera coordinate system which z = 1  
    winrt::Windows::Foundation::Point XY_pv;
    winrt::Windows::Foundation::Point uv_pv_point;  uv_pv_point.X = uv_pv[0];  uv_pv_point.Y = uv_pv[1];
    XY_pv = m_latestFrame.VideoMediaFrame().CameraIntrinsics().UnprojectAtUnitDepth(uv_pv_point); // Z = 1
    auto PVToWorld = m_latestFrame.CoordinateSystem().TryGetTransformTo(m_worldCoordSystem).Value();
    // **m_worldCoordSystem is set to Unity World Coordinate System**

    XMMATRIX PVtoWorld_Transform; 
    PVtoWorld_Transform.r[0] = XMVectorSet(PVToWorld.m11, PVToWorld.m12, PVToWorld.m13, PVToWorld.m14);
    PVtoWorld_Transform.r[1] = XMVectorSet(PVToWorld.m21, PVToWorld.m22, PVToWorld.m23, PVToWorld.m24);
    PVtoWorld_Transform.r[2] = XMVectorSet(PVToWorld.m31, PVToWorld.m32, PVToWorld.m33, PVToWorld.m34);
    PVtoWorld_Transform.r[3] = XMVectorSet(PVToWorld.m41, PVToWorld.m42, PVToWorld.m43, PVToWorld.m44);

    // 3.Get the Point in Ref World
    XMVECTOR CameraCoordVector = XMVectorSet(XY_pv.X, XY_pv.Y, -1, 1);  // the camera looks down -Z, so the unit plane's Z must be multiplied by a factor of -1
    auto UVW_World = XMVector4Transform(CameraCoordVector, PVtoWorld_Transform); // XMVector4Transform treats the vector as a row vector: result = vector * matrix (i.e. transpose(matrix) * column vector)

    float* PointWorldCoord = new float[4];
    PointWorldCoord[0] = XMVectorGetX(UVW_World);
    PointWorldCoord[1] = XMVectorGetY(UVW_World);
    PointWorldCoord[2] = XMVectorGetZ(UVW_World);
    PointWorldCoord[3] = XMVectorGetW(UVW_World);

    return PointWorldCoord;
}
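
Since the world transform in the code above hinges on DirectXMath's multiplication convention, here is a tiny self-contained check of what the comment in the code relies on: XMVector4Transform treats the vector as a row vector and multiplies it on the left of the matrix, so a matrix whose fourth row holds the translation moves a point as expected.

// Minimal DirectXMath convention check (standalone, not part of the app code):
#include <DirectXMath.h>
#include <cstdio>
using namespace DirectX;

int main()
{
    XMMATRIX T = XMMatrixTranslation(1.0f, 2.0f, 3.0f); // translation sits in row r[3]
    XMVECTOR p = XMVectorSet(0.0f, 0.0f, 0.0f, 1.0f);   // homogeneous point at the origin
    XMVECTOR q = XMVector4Transform(p, T);              // q = p * T (row-vector convention)
    printf("%f %f %f\n", XMVectorGetX(q), XMVectorGetY(q), XMVectorGetZ(q)); // prints 1 2 3
    return 0;
}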

and the method that takes (U,V,W,1) and outputs the final real position XYZ is:

input: pHL2ResearchMode, UVWOne_pv[4]; output: XYZ[3]
void HL2ResearchMode::WorldPointTransform_PVtoDepth(HL2ResearchMode* pHL2ResearchMode, float(&UVWOne_pv)[4] , float(&XYZ)[3])
    {
        // ----- 1. GET BUFFER ----- (elided here; this section provides timestamp, resolution, pDepth and pSigma used below)

        // ----- 2. GET (U,V) IN DEPTH IMAGE COORDINATES -----
        //get CameraToRefWorld Transform matrix
        Windows::Perception::Spatial::SpatialLocation transToWorld = nullptr;
        auto ts = PerceptionTimestampHelper::FromSystemRelativeTargetTime(HundredsOfNanoseconds(checkAndConvertUnsigned(timestamp.HostTicks)));
        transToWorld = pHL2ResearchMode->m_locator.TryLocateAtTimestamp(ts, pHL2ResearchMode->m_refFrame); // NOTE: can return nullptr if the pose is unavailable at this timestamp
        // **m_refFrame is set to the Unity World Coordinate System**
        XMMATRIX depthToWorld = XMMatrixIdentity();
        depthToWorld = pHL2ResearchMode->m_longDepthCameraPoseInvMatrix * SpatialLocationToDxMatrix(transToWorld); // inv(RigToCamera) * RigToWorld = CameraToWorld

        //get uv for pixel's depth
        float uv_depth[2];
        XMMATRIX WorldToDepthCamera;
        WorldToDepthCamera = XMMatrixInverse(nullptr, depthToWorld);
        XMVECTOR UVW_World_vtr = XMVectorSet(UVWOne_pv[0], UVWOne_pv[1], UVWOne_pv[2], UVWOne_pv[3]);
        auto XYZ_depth = XMVector4Transform(UVW_World_vtr, WorldToDepthCamera);
        float XYZ_depth_value[4] = { XMVectorGetX(XYZ_depth), XMVectorGetY(XYZ_depth), XMVectorGetZ(XYZ_depth), XMVectorGetW(XYZ_depth) };

        float xy_depth_norm[2] = { XYZ_depth_value[0] / XYZ_depth_value[2], XYZ_depth_value[1] / XYZ_depth_value[2] }; // to plane z = 1

        // value validity check  
        if ((xy_depth_norm[0] > 1 || xy_depth_norm[0] < -1) || (xy_depth_norm[1] > 1 || xy_depth_norm[1] < -1)) // outside [-1,1] means the point falls outside the depth camera's usable field of view
        {
            XYZ[0] = 0;  XYZ[1] = 0;
            XYZ[2] = 0;
            return;
        }

        pHL2ResearchMode->m_pLongDepthCameraSensor->MapCameraSpaceToImagePoint(xy_depth_norm, uv_depth);
        // round the uv coordinates to integer (with type float )
        uv_depth[0] = round(uv_depth[0]);
        uv_depth[1] = round(uv_depth[1]);
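        // [suggested addition, not in the original code] even after rounding, uv_depth can
        // fall outside the sensor image, so guard before indexing pDepth below
        if (uv_depth[0] < 0.0f || uv_depth[1] < 0.0f ||
            uv_depth[0] >= (float)resolution.Width || uv_depth[1] >= (float)resolution.Height)
        {
            XYZ[0] = 0;  XYZ[1] = 0;  XYZ[2] = 0;
            return;
        }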

        // ----- 3. GET THE DEPTH AND THE COORDINATES IN WORLD SPACE -----
        //coordinate transformation
        auto idx = int(uv_depth[0]) + int(uv_depth[1]) * resolution.Width;
        UINT16 depth = pDepth[idx]; // **pDepth is the depth buffer**
        depth = (pSigma[idx] & 0x80) ? 0 : depth - pHL2ResearchMode->m_depthOffset; // the sigma buffer's 0x80 bit flags an invalid depth pixel

        float xy[2] = { 0, 0 };
        pHL2ResearchMode->m_pLongDepthCameraSensor->MapImagePointToCameraUnitPlane(uv_depth, xy);

        auto pointOnUnitPlane = XMFLOAT3(xy[0], xy[1], 1);
        auto tempPoint = depth / 1000.0f * XMVector3Normalize(XMLoadFloat3(&pointOnUnitPlane)); // 1000.0f, not 1000: integer division would truncate the depth to whole metres

        //get the target point coordinate
        auto pointInWorld = XMVector3Transform(tempPoint, depthToWorld);
        XYZ[0] = XMVectorGetX(pointInWorld);
        XYZ[1] = XMVectorGetY(pointInWorld);
        XYZ[2] = -XMVectorGetZ(pointInWorld); // negate Z to convert to Unity's left-handed coordinate system

        // ----- 4. Release and Close -----

    }
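
For completeness, a rough end-to-end sketch of how the two methods chain together; videoProcessor and researchMode are hypothetical instance pointers, and the pixel values are made up:

// Hypothetical glue code chaining the two methods above:
float uv_pv[2] = { 640.0f, 360.0f };                  // object center from detection
float* UVW = videoProcessor->PixelPointToWorldPoint(uv_pv);
float UVWOne[4] = { UVW[0], UVW[1], UVW[2], UVW[3] };
delete[] UVW;                                         // PixelPointToWorldPoint allocates with new[]

float XYZ[3];
researchMode->WorldPointTransform_PVtoDepth(researchMode, UVWOne, XYZ);
// XYZ now holds the estimated object position in Unity world coordinates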