marek-simonik / record3d

Accompanying library for the Record3D iOS app (https://record3d.app/). Allows you to receive RGBD stream from iOS devices with TrueDepth camera(s).
https://record3d.app/
GNU Lesser General Public License v2.1
379 stars, 55 forks

RGBD Image issue #86

Closed vlasu19 closed 1 month ago

vlasu19 commented 1 month ago

Hi! When I read the depth and confidence images from the r3d file, I get the result shown in the screenshot below. What is the reason for this, and what is the solution? Am I doing something wrong in my app's settings?

[Screenshot: QQ20240708-130950]

The code that reads the images is below:

    def load_depth(self, filepath):
        with self._path.open(filepath, "r") as depth_fh:
            raw_bytes = depth_fh.read()
            decompressed_bytes = liblzfse.decompress(raw_bytes)
            depth_img: np.ndarray = np.frombuffer(decompressed_bytes, dtype=np.float32)
        if depth_img.shape[0] == 960 * 720:
            depth_img = depth_img.reshape((960, 720))  # For a FaceID camera 3D Video
        else:
            depth_img = depth_img.reshape((256, 192))  # For a LiDAR 3D Video
        return depth_img

    def load_conf(self, filepath):
        with self._path.open(filepath, "r") as depth_fh:
            raw_bytes = depth_fh.read()
            decompressed_bytes = liblzfse.decompress(raw_bytes)
            depth_img = np.frombuffer(decompressed_bytes, dtype=np.uint8)
        if depth_img.shape[0] == 960 * 720:
            depth_img = depth_img.reshape((960, 720))  # For a FaceID camera 3D Video
        else:
            depth_img = depth_img.reshape((256, 192))  # For a LiDAR 3D Video
        return depth_img

marek-simonik commented 1 month ago

Hi,

the issue is that you assume the depth and the confidence images have a certain hardcoded resolution (the RGB image is landscape, whereas the depth and conf. images are incorrectly portrait in the screenshot you shared).

Since Record3D 1.10, landscape video recording is possible, so the code you use to read the depth and confidence images is incorrect: it does not enumerate all possible combinations of resolutions, and it even assumes an incorrect FaceID resolution. Moreover, confidence files are only available for LiDAR videos.

Instead of hardcoding the resolutions, I highly recommend parsing the metadata JSON file in your script; that file is stored in every r3d file. The width and height of the depth and confidence images (both have identical resolution) are stored in the JSON file under the keys dw and dh. The key cameraType contains 0 if the video was recorded with the selfie FaceID camera, or 1 for LiDAR videos.
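
Reading those keys could look roughly like the sketch below. It assumes the r3d file can be opened as a ZIP archive and that the JSON entry is named "metadata"; both the entry name and the helper read_depth_metadata are assumptions made for illustration, so adjust them to match your file.

```python
import json
import zipfile


def read_depth_metadata(r3d_path, metadata_name="metadata"):
    """Return depth/confidence resolution and camera type from an r3d file.

    Assumptions (not confirmed by the thread): the r3d file is a ZIP archive
    and its JSON metadata entry is called "metadata".
    """
    with zipfile.ZipFile(r3d_path) as archive:
        metadata = json.loads(archive.read(metadata_name))

    return {
        "depth_width": int(metadata["dw"]),    # width of depth and confidence images
        "depth_height": int(metadata["dh"]),   # height of depth and confidence images
        "is_lidar": metadata.get("cameraType") == 1,  # 0 = selfie FaceID, 1 = LiDAR
    }
```

The is_lidar flag can then be used to skip confidence files for FaceID recordings, since those are only available for LiDAR videos.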

vlasu19 commented 1 month ago

I've replaced the hardcoded resolutions with values parsed from the metadata, but the issue still exists.

    self.depth_width = metadata_dict['dw']
    self.depth_height = metadata_dict['dh']
    depth_img = depth_img.reshape((self.depth_width, self.depth_height))

To align the depth image with the RGB image, I use the code below. It works on an older r3d file (from 08/2022):

    def _reshape_all_depth_and_conf(self):
        for index in tqdm.trange(len(self.poses), desc="Upscaling depth and conf"):
            depth_image = self._depth_images[index]
            # Upscale depth image.
            pil_img = Image.fromarray(depth_image)
            reshaped_img = pil_img.resize((self.rgb_width, self.rgb_height))
            reshaped_img = np.asarray(reshaped_img)
            self._reshaped_depth.append(reshaped_img)

            # Upscale confidence as well
            confidence = self._confidences[index]
            conf_img = Image.fromarray(confidence)
            reshaped_conf = conf_img.resize((self.rgb_width, self.rgb_height))
            reshaped_conf = np.asarray(reshaped_conf)
            self._reshaped_conf.append(reshaped_conf)

vlasu19 commented 1 month ago

I solved the problem. The depth and confidence images were oriented differently from the RGB image because the width and height were passed to reshape in the wrong order. This seemed strange to me, since I expected the original order to be correct, but the problem went away after I swapped them. Thanks for your help.

    depth_img = depth_img.reshape((self.depth_height, self.depth_width))
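
A likely explanation for why the swapped order is the right one: NumPy's reshape takes the shape as (rows, columns), i.e. (height, width), whereas image sizes are usually quoted as width x height (PIL's resize, for example, takes (width, height)). Below is a minimal sketch on a tiny synthetic frame, assuming the decompressed buffer is stored row by row, which is consistent with the fix above. reshape_depth is a hypothetical helper, not part of the record3d library.

```python
import numpy as np


def reshape_depth(flat_depth, depth_width, depth_height):
    """Turn a flat, row-major buffer into a (height, width) image array."""
    if flat_depth.size != depth_width * depth_height:
        raise ValueError(
            f"expected {depth_width * depth_height} values, got {flat_depth.size}"
        )
    # NumPy shapes are (rows, columns), so the height comes first.
    return flat_depth.reshape((depth_height, depth_width))


# Tiny synthetic "landscape" frame: 4 pixels wide, 2 pixels tall.
flat = np.arange(8, dtype=np.float32)
image = reshape_depth(flat, depth_width=4, depth_height=2)
print(image.shape)  # (2, 4): 2 rows (height), 4 columns (width)
```

The size check also catches metadata that does not match the buffer, instead of silently producing a scrambled image.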
