marek-simonik / record3d_offline_unity_demo

iPhone/iPad -> Unity VFX Graph demo (pre-recorded 3D Video playback)
https://record3d.app

Getting the world-space camera poses from an .r3d file? #3

Open vl4dimir opened 1 year ago

vl4dimir commented 1 year ago

Hi @marek-simonik thanks for this great little app! Easy purchase on my end. πŸ˜„

Is there a way to get a camera pose per-frame when replaying .r3d videos? I see that when I replay the video inside of the Record3D app, the video moves in world space just as I was while recording, but in Unity it remains fixed. I'm looking through Record3DVideo.cs and there doesn't seem to be a way to access pose data. How would you suggest I do that?

You mentioned here that exporting to EXR+JPG creates the metadata.json file which has the poses. Are poses embedded in the .r3d file as well?

marek-simonik commented 1 year ago

Hi @vl4dimir, thanks for your support :)!

You can definitely get access to the per-frame LiDAR camera poses when replaying .r3d videos. An .r3d file is a ZIP archive, and the metadata.json file exists inside it under the name metadata. Please read https://github.com/marek-simonik/record3d/issues/27 and https://github.com/marek-simonik/record3d/issues/33 to learn how the camera poses are stored in the metadata file.
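For illustration, here is a minimal standalone sketch (no Unity dependency; it assumes the Newtonsoft.Json package and a pose layout of [qx, qy, qz, qw, px, py, pz] per frame, as described in the linked issues) of reading the poses from the metadata entry:

using System;
using System.IO;
using System.IO.Compression;
using Newtonsoft.Json;

// Sketch: dump the per-frame camera poses stored in the "metadata" entry of an .r3d file.
public static class R3dPoseDump
{
    private class R3dMetadata
    {
        public int w;
        public int h;
        public int fps;
        public float[] K;
        public float[][] poses;   // one [qx, qy, qz, qw, px, py, pz] array per frame
    }

    public static void PrintPoses(string r3dPath)
    {
        using (var zip = ZipFile.Open(r3dPath, ZipArchiveMode.Read))
        using (var reader = new StreamReader(zip.GetEntry("metadata").Open()))
        {
            var metadata = JsonConvert.DeserializeObject<R3dMetadata>(reader.ReadToEnd());
            for (int i = 0; i < metadata.poses.Length; i++)
            {
                var p = metadata.poses[i];
                Console.WriteLine($"frame {i}: rotation ({p[0]}, {p[1]}, {p[2]}, {p[3]}), position ({p[4]}, {p[5]}, {p[6]})");
            }
        }
    }
}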

As you pointed out, the Unity demo currently does not use the camera pose data, but you could probably implement this functionality by reading the camera pose data from the metadata JSON file in this section of Record3DVideo. I think the Record3DMetadata struct would need to be changed to include the array of camera poses from the metadata file.

Once loaded, the camera poses can be used in other parts of the pipeline to update the pose of the VFX graph object.
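As a rough illustration of that idea (the component below is hypothetical and not part of this repo), a small MonoBehaviour attached to the VFX graph object could move it to the camera pose of the frame currently being displayed:

using System.Collections.Generic;
using UnityEngine;

// Hypothetical sketch: keep the point cloud in camera space and move the object
// hosting the VFX Graph to the camera pose of the currently displayed frame.
public class CameraPoseFollower : MonoBehaviour
{
    // Per-frame camera poses parsed from the metadata entry of the .r3d file.
    public List<Vector3> positions = new List<Vector3>();
    public List<Quaternion> rotations = new List<Quaternion>();

    // Call this from the playback code whenever it advances to a new frame.
    public void ApplyPose(int frameIdx)
    {
        if (frameIdx < 0 || frameIdx >= positions.Count || frameIdx >= rotations.Count)
        {
            return;
        }

        transform.SetPositionAndRotation(positions[frameIdx], rotations[frameIdx]);
    }
}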

vl4dimir commented 1 year ago

@marek-simonik Awesome, thanks for the quick reply! Feel free to close the issue, I opened an issue instead of emailing you so that others might find an answer here if they get stuck. Cheers!

PockPocket commented 1 year ago

Hey, I'm quite interested in getting this camera pose feature implemented in Unity!

@vl4dimir have you had any luck with it?

Cheers, Mathieu

vl4dimir commented 1 year ago

@PockPocket I haven't implemented it yet, so I have no code to show, but I did confirm that it's possible to read that data from the metadata file, as Marek described.

marek-simonik commented 1 year ago

@PockPocket @vl4dimir

I implemented the feature for you; replace the contents of Record3DVideo.cs with the code listed below.

You'll need to install the Newtonsoft JSON package (Window > Package Manager > + > Install package by name… > com.unity.nuget.newtonsoft-json).
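If you prefer editing files over the Package Manager UI, the same dependency can also be declared in Packages/manifest.json (the version below is just an example of a recent release of the package):

{
  "dependencies": {
    "com.unity.nuget.newtonsoft-json": "3.2.1"
  }
}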

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Record3D;
using System;
using System.IO;
using Unity.Collections;
using UnityEngine.VFX;
using System.IO.Compression;
using System.Linq;
using System.Runtime.InteropServices;
using System.Timers;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public class Record3DVideo
{
    private int numFrames_;
    public int numFrames { get { return numFrames_; } }

    private int fps_;
    public int fps { get { return fps_; } }

    private int width_;
    public int width { get { return width_; } }

    private int height_;
    public int height { get { return height_; } }

    /// <summary>
    /// The intrinsic matrix coefficients.
    /// </summary>
    private float fx_, fy_, tx_, ty_;

    private List<R3DPose> framePoses_ = new List<R3DPose>(); 

    private ZipArchive underlyingZip_;

    public byte[] rgbBuffer;

    public float[] positionsBuffer;

    public struct R3DPose
    {
        public Quaternion rotation;
        public Vector3 position;

        public Matrix4x4 matrix => Matrix4x4.TRS(position, rotation, Vector3.one);

        // Each pose entry in the metadata file is stored as [qx, qy, qz, qw, px, py, pz].
        public R3DPose(float[] quaternionAndPosition)
        {
            if (quaternionAndPosition.Length != 7)
            {
                throw new Exception("Expected 7 values per pose (quaternion x, y, z, w followed by position x, y, z).");
            }

            rotation = new Quaternion(
                quaternionAndPosition[0],
                quaternionAndPosition[1],
                quaternionAndPosition[2],
                quaternionAndPosition[3]);

            position = new Vector3(quaternionAndPosition[4], quaternionAndPosition[5], quaternionAndPosition[6]);
        }
    }

    [System.Serializable]
    public struct Record3DMetadata
    {
        public int w;
        public int h;
        public List<float> K;
        public int fps;
        public float[][] poses;
    }

#if UNITY_STANDALONE_OSX || UNITY_EDITOR_OSX
    private const string LIBRARY_NAME = "librecord3d_unity_playback.dylib";

#elif UNITY_STANDALONE_WIN || UNITY_EDITOR_WIN
        private const string LIBRARY_NAME = "record3d_unity_playback.dll";

#else
#error "Unsupported platform!"
#endif

    [DllImport(LIBRARY_NAME)]
    private static extern void DecompressFrame(byte[] jpgBytes, UInt32 jpgBytesSize, byte[] lzfseDepthBytes, UInt32 lzfseBytesSize, byte[] rgbBuffer, float[] poseBuffer, Int32 width, Int32 height, float fx, float fy, float tx, float ty);

    public Record3DVideo(string filepath)
    {
        underlyingZip_ = ZipFile.Open(filepath, ZipArchiveMode.Read);

        // Load metadata (FPS, the intrinsic matrix, dimensions)
        using (var metadataStream = new StreamReader(underlyingZip_.GetEntry("metadata").Open()))
        {
            string jsonContents = metadataStream.ReadToEnd();
            var parsedMetadata = JsonConvert.DeserializeObject<Record3DMetadata>(jsonContents);

            // Initialize properties
            this.fps_ = parsedMetadata.fps;
            this.width_ = parsedMetadata.w;
            this.height_ = parsedMetadata.h;

            // Init the intrinsic matrix coeffs
            this.fx_ = parsedMetadata.K[0];
            this.fy_ = parsedMetadata.K[4];
            this.tx_ = parsedMetadata.K[6];
            this.ty_ = parsedMetadata.K[7];

            // Parse the per-frame camera poses
            foreach (var r3dRawPoseArray in parsedMetadata.poses)
            {
                framePoses_.Add(new R3DPose(r3dRawPoseArray));
            }
        }

        this.numFrames_ = underlyingZip_.Entries.Count(x => x.FullName.Contains(".depth"));

        rgbBuffer = new byte[width * height * 3];
        positionsBuffer = new float[width * height * 4];
    }

    public void LoadFrameData(int frameIdx)
    {
        if (frameIdx >= numFrames_)
        {
            return;
        }

        // Decompress the LZFSE depth data into a byte buffer
        byte[] lzfseDepthBuffer;
        using (var lzfseDepthStream = underlyingZip_.GetEntry(String.Format("rgbd/{0}.depth", frameIdx)).Open())
        {
            using (var memoryStream = new MemoryStream())
            {
                lzfseDepthStream.CopyTo(memoryStream);
                lzfseDepthBuffer = memoryStream.ToArray();
            }
        }

        // Decompress the JPG image into a byte buffer
        byte[] jpgBuffer;
        using (var jpgStream = underlyingZip_.GetEntry(String.Format("rgbd/{0}.jpg", frameIdx)).Open())
        {
            using (var memoryStream = new MemoryStream())
            {
                jpgStream.CopyTo(memoryStream);
                jpgBuffer = memoryStream.ToArray();
            }
        }

        // Decompress the LZFSE depth map archive, create point cloud and load the JPEG image
        DecompressFrame(jpgBuffer,
            (uint)jpgBuffer.Length,
            lzfseDepthBuffer,
            (uint)lzfseDepthBuffer.Length,
            this.rgbBuffer,
            this.positionsBuffer,
            this.width_, this.height_,
            this.fx_, this.fy_, this.tx_, this.ty_);

        // Transform the points
        if (frameIdx >= framePoses_.Count)
        {
            return;
        }

        var transformMat = framePoses_[frameIdx].matrix;

        for (int i = 0; i < positionsBuffer.Length; i += 4)
        {
            Vector3 pt;
            pt.x = positionsBuffer[i];
            pt.y = positionsBuffer[i + 1];
            pt.z = positionsBuffer[i + 2];

            Vector3 transformedPt = transformMat.MultiplyPoint3x4(pt);

            positionsBuffer[i] = transformedPt.x;
            positionsBuffer[i + 1] = transformedPt.y;
            positionsBuffer[i + 2] = transformedPt.z;
        }
    }
}