Any documentation on the depth frame format?

andybak commented 3 years ago

I've exported a recording in the native r3d format and I'm attempting to read the depth data

>>> import liblzfse
>>> pth = 'winhome/Documents/3D Scans/2020-10-28--15-01-03/rgbd/1.depth'
>>> fh = open(pth, "rb")
>>> compressed = fh.read()
>>> decompressed = liblzfse.decompress(compressed)

But then I'm not sure what to do with the decompressed data. Is it just a case of reading each 4 bytes, unpacking them to a single precision float? The jpgs are 192x256 and doing the maths on that seems to add up: 192 x 256 x 4 = 196608 and len(decompressed) gives me 196608.

So this looks right:

>>> import struct
>>> f = [struct.unpack('f', decompressed[x:x+4])[0] for x in range(0, len(decompressed), 4)]

Then I guess I can just write f into any image format that supports floating point (.hdr or .exr maybe)

Am I on the right lines? Are the values linear distances from the camera?

If so - it would be nice to add this to the docs.

andybak commented 3 years ago

Follow up question - what are the .conf files for? Are there some docs on this I've overlooked?

marek-simonik commented 3 years ago

Hello Andy, yes, you are right. See this simple example of how to load a .depth file:

import numpy as np
import cv2
import liblzfse  # https://pypi.org/project/pyliblzfse/

def load_depth(filepath):
    with open(filepath, 'rb') as depth_fh:
        raw_bytes = depth_fh.read()
        decompressed_bytes = liblzfse.decompress(raw_bytes)
        depth_img = np.frombuffer(decompressed_bytes, dtype=np.float32)

    depth_img = depth_img.reshape((640, 480))  # For a FaceID camera 3D Video
    # depth_img = depth_img.reshape((256, 192))  # For a LiDAR 3D Video

    return depth_img

if __name__ == '__main__':
    depth_filepath = '/tmp/depth_0.lzfse'
    depth_img = load_depth(depth_filepath)

    cv2.imshow('Depth', depth_img)
    cv2.waitKey(0)

As you wrote, the decompressed .depth file is just a buffer of raw float32 depth bytes (each float32 value is a depth value in meters). There are 49 152 (i.e. 192×256) values for a LiDAR frame and 307 200 (i.e. 480×640) values for a FaceID frame.
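The value counts above also allow a cheap sanity check before reshaping. The following is a minimal sketch, assuming only the two resolutions quoted in this thread (other app versions or devices may export different sizes). `KNOWN_SHAPES` and `depth_from_bytes` are hypothetical names, not part of Record3D, and the input is expected to be the already-decompressed buffer.

```python
import numpy as np

# (rows, cols) keyed by the number of float32 values in the buffer.
# Assumption: only the two sizes mentioned in this thread are listed.
KNOWN_SHAPES = {
    256 * 192: (256, 192),  # LiDAR frame, 49 152 values
    640 * 480: (640, 480),  # FaceID frame, 307 200 values
}

def depth_from_bytes(decompressed_bytes):
    """Turn an already-decompressed .depth buffer into a 2D float32 array (meters)."""
    values = np.frombuffer(decompressed_bytes, dtype=np.float32)
    try:
        shape = KNOWN_SHAPES[values.size]
    except KeyError:
        raise ValueError(f"unexpected number of depth values: {values.size}")
    return values.reshape(shape)
```

Raising on an unknown size is a deliberate choice here: a silent reshape to the wrong resolution would produce a scrambled image rather than an error.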

The .conf files contain a confidence map for each frame. It has the same size as the depth map and, for each pixel of the depth map, it contains a uint8 number in the range 0-2, which indicates the confidence that the sensed LiDAR depth is "correct". In other words, it is a measure of depth data quality.
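One way to use the confidence map is to blank out unreliable depth pixels. This is a minimal sketch, not Record3D code: `confidence_from_bytes` and `mask_low_confidence` are hypothetical helpers, the input is assumed to be an already-decompressed .conf buffer (the thread does not state how .conf files are compressed), and treating 2 as the most reliable value is an assumption as well.

```python
import numpy as np

def confidence_from_bytes(decompressed_bytes, shape):
    """Interpret an already-decompressed .conf buffer as a uint8 map of the given shape."""
    return np.frombuffer(decompressed_bytes, dtype=np.uint8).reshape(shape)

def mask_low_confidence(depth_img, conf_img, min_conf=2):
    """Return a copy of depth_img with pixels whose confidence is below min_conf set to NaN."""
    if depth_img.shape != conf_img.shape:
        raise ValueError("depth and confidence maps must have the same shape")
    masked = depth_img.astype(np.float32, copy=True)  # leave the input untouched
    masked[conf_img < min_conf] = np.nan
    return masked
```

For a LiDAR frame this would be called with `shape=(256, 192)`, matching the reshape used for the depth map.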

I think this answers your question, so I am closing this issue, but feel free to ask follow-up questions.

andybak commented 3 years ago

(thanks! the above was really helpful for me. However - will anyone else find it easily as it is in a closed github issue? Part of my reason for opening this was to suggest that something like the above would be a great addition to the docs)

marek-simonik commented 3 years ago

You are right, thanks for reminding me of that. I added a link to this issue to the Wiki.

wolterlw commented 3 years ago

Had the same confusion and found this issue before going to the Wiki. It would be very helpful to add some mention of it into the main Readme

Also on a related note - is it possible to get distance in meters from an exported RGBD video?

marek-simonik commented 3 years ago

OK, I will mention the Wiki in the Readme the next time I push an update.

As for getting the distance in meters from exported RGBD videos: yes, it is possible. I described how to do it in the Readme of this demo.

wolterlw commented 3 years ago

got it, thank you for a great app and library! please do add landscape mode for the iPad someday though

marek-simonik commented 3 years ago

Thank you for the suggestion, noted! I will include landscape mode in a future update.

zehuiz2 commented 3 years ago

Two follow-up questions:

1. In 'How to use?', you wrote 'JSON config file (containing the intrinsic matrix, FPS, and width/height of the RGBD frames)'. Where could I find this info?
2. Does 2 mean high confidence or 0?

marek-simonik commented 3 years ago

To answer your questions:

1. After you unzip an exported .r3d file, you will see a metadata file. This is the JSON config file.
2. In my understanding, 2 is high confidence, 1 is "lower" confidence and 0 is the lowest confidence.
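Since an .r3d file can be unzipped, the metadata can also be read without extracting the archive. A minimal sketch, assuming the archive member is named exactly `metadata` and sits at the archive root; `read_r3d_metadata` is a hypothetical helper, and no particular JSON key names are assumed.

```python
import json
import zipfile

def read_r3d_metadata(r3d_path):
    """Return the parsed JSON stored in the 'metadata' member of an .r3d archive."""
    with zipfile.ZipFile(r3d_path) as archive:
        # Assumption: the member name is 'metadata', as described in the thread.
        with archive.open("metadata") as fh:
            return json.load(fh)
```

Printing `sorted(read_r3d_metadata(path).keys())` is a quick way to see which fields (intrinsic matrix, FPS, frame size) a given export actually contains.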

zehuiz2 commented 2 years ago

Hi, I wonder if you've updated both LiDAR & FaceID depth resolution?

depth_img = depth_img.reshape((1280, 960))  # For a FaceID camera 3D Video
depth_img = depth_img.reshape((512, 384))  # For a LiDAR 3D Video

Is the above correct? Three more questions:

1. Is it possible you could update the FaceID RGB resolution? It is still 640*480.
2. It seems the LiDAR confidence resolution is not updated?
3. What are the units of the depth measurements? I assume it is mm?

Thank you very much!

knsjoon commented 2 years ago

An addition to this issue regarding the Apple ARKit depth confidence map:

According to https://developer.apple.com/documentation/arkit/arconfidencelevel, there are only three confidence levels:

case low: Depth-value accuracy in which the framework is less confident.
case medium: Depth-value accuracy in which the framework is moderately confident.
case high: Depth-value accuracy in which the framework is fairly confident.

Hope it helps future users understand why the confidence map consists only of 0, 1 and 2!
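To make the three levels explicit in code, the raw values can be wrapped in an enum. A minimal sketch, assuming the bytes in a decompressed .conf buffer are the raw ARConfidenceLevel values (low = 0, medium = 1, high = 2); `ConfidenceLevel` and `count_levels` are hypothetical names.

```python
from collections import Counter
from enum import IntEnum

class ConfidenceLevel(IntEnum):
    """Mirror of the three ARKit confidence levels (assumed raw values)."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2

def count_levels(conf_bytes):
    """Count how many pixels of a decompressed confidence buffer fall into each level."""
    counts = Counter(conf_bytes)  # iterating over bytes yields ints
    return {level.name: counts.get(level.value, 0) for level in ConfidenceLevel}
```

A frame where most pixels land in `LOW` is a hint that its depth values should be treated with caution.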

marek-simonik / record3d

Any documentation on the depth frame format? #7