muskie82 / MonoGS

[CVPR'24 Highlight & Best Demo Award] Gaussian Splatting SLAM
https://rmurai.co.uk/projects/GaussianSplattingSLAM/

Apply to RGB monocular movie #56

Open RickMaruyama opened 3 months ago

RickMaruyama commented 3 months ago

@muskie82 Thank you very much for your great contribution to visual SLAM!!

I have a question: how can I apply your code to monocular RGB movie data? I have recorded a monocular RGB movie and want to render the result in .ply format.

Details: I have reproduced the code on AWS EC2 (Ubuntu 20.04 / A10 GPU) and can successfully open the GUI with the demo dataset, using the following command: python slam.py --config configs/mono/tum/fr3_office.yaml. I then tried the same with my own movie data, which is monocular and RGB only. I provided the RGB images and color info as rgb.txt and ran the same command, but the terminal then asks me for depth.txt.

python slam.py --config configs/mono/tum/my_data.yaml
File "/home/ubuntu/MonoGS/utils/dataset.py", line 55, in parse_list
data = np.loadtxt(filepath, delimiter=" ", dtype=np.unicode_, skiprows=skiprows)
File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1067, in loadtxt
fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/numpy/lib/_datasource.py", line 193, in open
return ds.open(path, mode, encoding=encoding, newline=newline)
File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/numpy/lib/_datasource.py", line 533, in open
raise IOError("%s not found." % path)
OSError: datasets/tum/rgb_dataset_chassy_crash_test/depth.txt not found.

I confirmed that depth.txt was added in the right folder and ran slam.py again; it then still needs "pose_list", which is probably related to accelerometer.txt and groundtruth.txt. How can I get these with just an RGB camera?

python slam.py --config configs/mono/tum/my_data.yaml
MonoGS: saving results in 
results/tum_my_data/2024-04-03-06-50-39
Traceback (most recent call last):
  File "slam.py", line 251, in <module>
    slam = SLAM(config, save_dir=save_dir)
  File "slam.py", line 56, in __init__
    model_params, model_params.source_path, config=config
  File "/home/ubuntu/MonoGS/utils/dataset.py", line 490, in load_dataset
    return TUMDataset(args, path, config)
  File "/home/ubuntu/MonoGS/utils/dataset.py", line 398, in __init__
    parser = TUMParser(dataset_path)
  File "/home/ubuntu/MonoGS/utils/dataset.py", line 51, in __init__
    self.load_poses(self.input_folder, frame_rate=32)
  File "/home/ubuntu/MonoGS/utils/dataset.py", line 88, in load_poses
    pose_data = self.parse_list(pose_list, skiprows=1)
UnboundLocalError: local variable 'pose_list' referenced before assignment

So my question, again, is: what is the procedure to test on real data created by a monocular RGB camera? Thank you in advance for your advice.

rmurai0610 commented 3 months ago

To run our code with your custom data, you'll need to implement a dataset class https://github.com/muskie82/MonoGS/blob/main/utils/dataset.py#L394-L402 which sets the following attributes (pseudocode):

self.has_depth = False
self.num_imgs = 100
self.color_paths = ["path/1.png", "path/2.png", ...]
self.depth_paths = None
self.poses = [np.eye(4) for _ in range(self.num_imgs)]

If you have ground-truth poses, you can set self.poses to them instead of identities.
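For illustration, here is a minimal sketch of such a class, following the MonocularDataset subclasses in utils/dataset.py. The class name CustomDataset and the rgb/*.png glob pattern are placeholders, not part of the repository:

import glob

import numpy as np

from utils.dataset import MonocularDataset  # not needed if you add the class inside utils/dataset.py itself


class CustomDataset(MonocularDataset):
    """RGB-only input: no depth maps and no ground-truth poses."""

    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        dataset_path = config["Dataset"]["dataset_path"]
        # Plain RGB frames on disk; adjust the pattern to your layout.
        self.color_paths = sorted(glob.glob(f"{dataset_path}/rgb/*.png"))
        self.num_imgs = len(self.color_paths)
        self.has_depth = False
        self.depth_paths = None
        # No ground-truth trajectory from a plain monocular camera: identity poses.
        self.poses = [np.eye(4) for _ in range(self.num_imgs)]

To use it, load_dataset in utils/dataset.py would also need a branch that returns this class for a new Dataset "type" value in the yaml config.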

RickMaruyama commented 3 months ago

Thank you very much for your prompt reply, and sorry for my late response. I could successfully see the rendered scene in the GUI; however, I could not find the .ply file output from the Gaussian splatting result. I checked and got the info below. I probably unset the depth value in the wrong way. Could you advise me how to get the .ply? Thank you.


MonoGS/results/tum_rgb_dataset_sample_data/2024-04-04-03-19-38/plot/trj_final.json

{
    "trj_id": [80],
    "trj_est": [
        [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    ],
    "trj_gt": [
        [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    ]
}
muskie82 commented 3 months ago

Hi, the save_gaussians function exports the .ply file. To call this function, can you try one of the following options: setting eval_rendering = True in the config, or running with the --eval flag?

Best,

RickMaruyama commented 3 months ago

Thank you for your reply!

I have set eval_rendering = True and run with --eval. To exclude any risk coming from my dataset, I used the RGB data from freiburg1_desk together with its rgb.txt as input (without groundtruth.txt and accelerometer.txt). After running slam.py I was asked to register with wandb and did so; I was then directed to view the result in wandb, and no GUI (Gaussian splatting) opened on my Ubuntu server. (I guess this is caused by the behaviour of --eval.)

I got an error that umeyama_alignment doesn't work, and therefore no .ply was produced in the save folder. It seems to me the cause is the ground-truth setting. Could you tell me how to handle the ground-truth values (as well as accelerometer.txt) when I don't have them from a monocular camera (for example, an iPhone movie)?

I hope this terminal output helps you to understand my problem.

(MonoGS) ubuntu@ip-XX:~/MonoGS$ python slam.py --config configs/mono/tum/dummy_data.yaml --eval
MonoGS: Running MonoGS in Evaluation Mode
MonoGS: Following config will be overriden
MonoGS:         save_results=True
MonoGS:         use_gui=False
MonoGS:         eval_rendering=True
MonoGS:         use_wandb=True
MonoGS: saving results in results/tum_rgbd_dataset_freiburg1_desk copy/2024-04-16-03-31-58
wandb: Currently logged in as: rmaruyama7180 (use `wandb login --relogin` to force relogin)
wandb: wandb version 0.16.6 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.12.2
wandb: Syncing run configs/mono/tum/dummy_data_2024-04-16-03-31-58
wandb: ⭐️ View project at https://wandb.ai/rmaruyama7180/MonoGS
wandb:  View run at https://wandb.ai/rmaruyama7180/MonoGS/runs/eukpnf09
wandb: Run data is saved locally in /home/ubuntu/MonoGS/wandb/run-20240416_033159-eukpnf09
wandb: Run `wandb offline` to turn off syncing.

MonoGS: Resetting the system
MonoGS: Initialized map
MonoGS: Resetting the opacity of non-visible Gaussians
MonoGS: Performing initial BA for initialization
MonoGS: Initialized SLAM
MonoGS: Evaluating ATE at frame:  59

Traceback (most recent call last):
  File "slam.py", line 252, in <module>
    slam = SLAM(config, save_dir=save_dir)
  File "slam.py", line 110, in __init__
    self.frontend.run()
  File "/home/ubuntu/MonoGS/utils/slam_frontend.py", line 473, in run
    monocular=self.monocular,
  File "/home/ubuntu/MonoGS/utils/eval_utils.py", line 120, in eval_ate
    monocular=monocular,
  File "/home/ubuntu/MonoGS/utils/eval_utils.py", line 32, in evaluate_evo
    traj_est, traj_ref, correct_scale=monocular
  File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/evo/core/trajectory.py", line 397, in align_trajectory
    with_scale)
  File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/evo/core/geometry.py", line 72, in umeyama_alignment
    raise GeometryException("Degenerate covariance rank, "
evo.core.geometry.GeometryException: Degenerate covariance rank, Umeyama alignment is not possible

wandb: Waiting for W&B process to finish, PID 13615
wandb: Program failed with code 1.  Press ctrl-c to abort syncing.                                                                    
wandb: Find user logs for this run at: /home/ubuntu/MonoGS/wandb/run-20240416_033159-eukpnf09/logs/debug.log
wandb: Find internal logs for this run at: /home/ubuntu/MonoGS/wandb/run-20240416_033159-eukpnf09/logs/debug-internal.log
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Synced configs/mono/tum/dummy_data_2024-04-16-03-31-58: https://wandb.ai/****/MonoGS/runs/eukpnf09

FYI, here is the output from print statements for the error-related variables, to check the cause. It looks like pose_gt is the problem: it makes y and mean_y zero, which leads to the umeyama_alignment error.

pose_gt: [[1. 0. 0. 0.] [0. 1. 0. 0.] [0. 0. 1. 0.] [0. 0. 0. 1.]]
trj_gt_np: [array([[1., 0., 0., 0.],[0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]]), 
array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]]), 
 ...
array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]])]
poses_gt: [array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]]), 
 ....
array([[1., 0., 0., 0.],       [0., 1., 0., 0.],       [0., 0., 1., 0.],       [0., 0., 0., 1.]])]
 traj_ref = PosePath3D: 10 poses, 0.000m path length
traj_ref: 10 poses, 0.000m path length
y: [[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
mean_y: [0. 0. 0.]
mean_x: [-0.00533411 -0.04845559 -0.25690131]
n: 10
outer_sum [[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]]
cov_xy: [[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]]
m-1 rank: 2
SVDv - d: [0. 0. 0.]
threshold - np.finfo(d.dtype).eps: 2.220446049250313e-16
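For reference, the degeneracy can be reproduced with plain numpy, independent of MonoGS and evo (the variable names mirror the debug output above): when every ground-truth pose is the identity, all reference positions sit at the origin, so the cross-covariance of the two point sets is zero and its singular values never exceed the eps threshold checked before the alignment.

import numpy as np

x = np.random.rand(3, 10)   # estimated camera positions (any non-trivial trajectory)
y = np.zeros((3, 10))       # ground-truth positions when every pose is np.eye(4)

mean_x = x.mean(axis=1)
mean_y = y.mean(axis=1)     # [0. 0. 0.]
n = x.shape[1]

# Cross-covariance of the centred point sets, as in Umeyama's method.
cov_xy = (y - mean_y[:, None]) @ (x - mean_x[:, None]).T / n   # all zeros

d = np.linalg.svd(cov_xy, compute_uv=False)
print(d)  # [0. 0. 0.]
# evo raises GeometryException when fewer than m - 1 = 2 singular values
# exceed np.finfo(d.dtype).eps, which is exactly the case here.
print(np.count_nonzero(d > np.finfo(d.dtype).eps))  # 0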
muskie82 commented 2 months ago

The easiest fix should be just to comment out the eval_ate call.

https://github.com/muskie82/MonoGS/blob/37cebb4b8df24777980eac00276c07059b94762c/utils/slam_frontend.py#L349-L356

RickMaruyama commented 2 months ago

Thanks for the reply. After commenting out eval_ate in slam_frontend.py, the same umeyama error still persists.

muskie82 commented 2 months ago

Comment it out here as well, of course. The idea is just to skip all of the trajectory evaluation, which requires ground-truth poses.

https://github.com/muskie82/MonoGS/blob/37cebb4b8df24777980eac00276c07059b94762c/utils/slam_frontend.py#L468-L474

RickMaruyama commented 2 months ago

@muskie82 Yes, I've commented that out too. Sorry for not mentioning it above.

muskie82 commented 2 months ago

OK, then where is the umeyama alignment being called from? It is used only for trajectory evaluation, so just comment out all the eval_ate calls in the code.

RickMaruyama commented 2 months ago

It's here: /MonoGS/lib/python3.7/site-packages/evo/core/geometry.py, line 64, in umeyama_alignment.

muskie82 commented 2 months ago

I mean, where in the MonoGS code is it called?

langsungn commented 2 months ago

@RickMaruyama Hi, can I ask how you succeeded in viewing the rendering GUI and implementing dataset.py? Thanks :)

RickMaruyama commented 2 months ago

@langsungn Hello! I'm sorry for the late feedback. I could get a .ply file, but the point clouds seem to be broken. There may be an issue in my modification.

langsungn commented 2 months ago

@RickMaruyama Thank you for replying. I'm okay with a few issues, but would you share your modified 'dataset.py' file with me? Thanks a lot!

leichaoyang commented 1 month ago

@RickMaruyama Can you share your modified 'dataset.py' file with me? Thanks!

RickMaruyama commented 1 month ago

@langsungn @leichaoyang I am sorry for the delay; I have been very busy over the last few days. I attach my modified dataset.py. Thank you for your support!


import csv
import glob
import os

import cv2
import numpy as np
import torch
import trimesh
from PIL import Image

from gaussian_splatting.utils.graphics_utils import focal2fov

try:
    import pyrealsense2 as rs
except Exception:
    pass

class ReplicaParser:
    def __init__(self, input_folder):
        self.input_folder = input_folder
        self.color_paths = sorted(glob.glob(f"{self.input_folder}/results/frame*.jpg"))
        #self.depth_paths = sorted(glob.glob(f"{self.input_folder}/results/depth*.png")) # commented out: using an RGB-only camera
        self.depth_paths = None # added: using an RGB-only camera
        self.n_img = len(self.color_paths)
        self.load_poses(f"{self.input_folder}/traj.txt")

    def load_poses(self, path):
        # self.poses = [] # commented out: using an RGB-only camera
        self.poses = [np.eye(4) for _ in range(self.n_img)]  # added: identity poses for an RGB-only camera

        with open(path, "r") as f:
            lines = f.readlines()

        frames = []
        for i in range(self.n_img):
            line = lines[i]
            pose = np.array(list(map(float, line.split()))).reshape(4, 4)
            pose = np.linalg.inv(pose)
            self.poses[i] = pose  # overwrite the identity pose when traj.txt provides one
            frame = {
                "file_path": self.color_paths[i],
                "depth_path": self.depth_paths[i],
                "transform_matrix": pose.tolist(),
            }

            frames.append(frame)
        self.frames = frames

class TUMParser:
    def __init__(self, input_folder):
        self.input_folder = input_folder
        self.load_poses(self.input_folder, frame_rate=32)
        self.n_img = len(self.color_paths)

    def parse_list(self, filepath, skiprows=0):
        data = np.loadtxt(filepath, delimiter=" ", dtype=np.unicode_, skiprows=skiprows)
        return data

    # def associate_frames(self, tstamp_image, tstamp_depth, tstamp_pose, max_dt=0.08):
    #     associations = []
    #     for i, t in enumerate(tstamp_image):
    #         if tstamp_pose is None:
    #             j = np.argmin(np.abs(tstamp_depth - t))
    #             if np.abs(tstamp_depth[j] - t) < max_dt:
    #                 associations.append((i, j))
    #         else:
    #             j = np.argmin(np.abs(tstamp_depth - t))
    #             k = np.argmin(np.abs(tstamp_pose - t))
    #             if (np.abs(tstamp_depth[j] - t) < max_dt) and (
    #                 np.abs(tstamp_pose[k] - t) < max_dt
    #             ):
    #                 associations.append((i, j, k))
    #     return associations

    def associate_frames(self, tstamp_image, max_dt=0.08):
        associations = [(i, i, i) for i in range(len(tstamp_image))]  # build tuples from the image indices only (no depth/pose association)
        #print("associations:", associations)
        return associations

    def load_poses(self, datapath, frame_rate=-1):
        if os.path.isfile(os.path.join(datapath, "groundtruth.txt")):
            pose_list = os.path.join(datapath, "groundtruth.txt")
        # elif os.path.isfile(os.path.join(datapath, "pose.txt")): # commented out
        #     pose_list = os.path.join(datapath, "pose.txt") # commented out

        image_list = os.path.join(datapath, "rgb.txt")
        # depth_list = os.path.join(datapath, "depth.txt") # commented out

        image_data = self.parse_list(image_list)
        # depth_data = self.parse_list(depth_list) # commented out
        # pose_data = self.parse_list(pose_list, skiprows=1) # commented out
        # pose_vecs = pose_data[:, 0:].astype(np.float64) # commented out

        tstamp_image = image_data[:, 0].astype(np.float64)
        # tstamp_depth = depth_data[:, 0].astype(np.float64) # commented out
        # tstamp_pose = pose_data[:, 0].astype(np.float64) # commented out
        associations = self.associate_frames(tstamp_image) # changed: associate using image timestamps only

        indicies = [0]
        for i in range(1, len(associations)):
            t0 = tstamp_image[associations[indicies[-1]][0]]
            t1 = tstamp_image[associations[i][0]]
            if t1 - t0 > 1.0 / frame_rate:
                indicies += [i]

        self.color_paths, self.poses, self.depth_paths, self.frames = [], [], [], []

        for ix in indicies:
            (i, j, k) = associations[ix]
            self.color_paths += [os.path.join(datapath, image_data[i, 1])]
            # self.depth_paths += [os.path.join(datapath, depth_data[j, 1])] # commented out

            # quat = pose_vecs[k][4:] # commented out
            # trans = pose_vecs[k][1:4] # commented out
            # T = trimesh.transformations.quaternion_matrix(np.roll(quat, 1)) # commented out
            # T[:3, 3] = trans # commented out
            # self.poses += [np.linalg.inv(T)] # commented out

            self.poses += [np.eye(4)] # append an identity matrix instead

            frame = {
                "file_path": str(os.path.join(datapath, image_data[i, 1])),
                # "depth_path": str(os.path.join(datapath, depth_data[j, 1])), # commented out
                # "transform_matrix": (np.linalg.inv(T)).tolist(), # commented out
            }

            self.frames.append(frame)

class EuRoCParser:
    def __init__(self, input_folder, start_idx=0):
        self.input_folder = input_folder
        self.start_idx = start_idx
        self.color_paths = sorted(
            glob.glob(f"{self.input_folder}/mav0/cam0/data/*.png")
        )
        self.color_paths_r = sorted(
            glob.glob(f"{self.input_folder}/mav0/cam1/data/*.png")
        )
        assert len(self.color_paths) == len(self.color_paths_r)
        self.color_paths = self.color_paths[start_idx:]
        self.color_paths_r = self.color_paths_r[start_idx:]
        self.n_img = len(self.color_paths)
        self.load_poses(
            f"{self.input_folder}/mav0/state_groundtruth_estimate0/data.csv"
        )

    def associate(self, ts_pose):
        pose_indices = []
        for i in range(self.n_img):
            color_ts = float((self.color_paths[i].split("/")[-1]).split(".")[0])
            k = np.argmin(np.abs(ts_pose - color_ts))
            pose_indices.append(k)

        return pose_indices

    def load_poses(self, path):
        self.poses = []
        with open(path) as f:
            reader = csv.reader(f)
            header = next(reader)
            data = [list(map(float, row)) for row in reader]
        data = np.array(data)
        T_i_c0 = np.array(
            [
                [0.0148655429818, -0.999880929698, 0.00414029679422, -0.0216401454975],
                [0.999557249008, 0.0149672133247, 0.025715529948, -0.064676986768],
                [-0.0257744366974, 0.00375618835797, 0.999660727178, 0.00981073058949],
                [0.0, 0.0, 0.0, 1.0],
            ]
        )

        pose_ts = data[:, 0]
        pose_indices = self.associate(pose_ts)

        frames = []
        for i in range(self.n_img):
            trans = data[pose_indices[i], 1:4]
            quat = data[pose_indices[i], 4:8]

            T_w_i = trimesh.transformations.quaternion_matrix(np.roll(quat, 1))
            T_w_i[:3, 3] = trans
            T_w_c = np.dot(T_w_i, T_i_c0)

            self.poses += [np.linalg.inv(T_w_c)]

            frame = {
                "file_path": self.color_paths[i],
                "transform_matrix": (np.linalg.inv(T_w_c)).tolist(),
            }

            frames.append(frame)
        self.frames = frames

class BaseDataset(torch.utils.data.Dataset):
    def __init__(self, args, path, config):
        self.args = args
        self.path = path
        self.config = config
        self.device = "cuda:0"
        self.dtype = torch.float32
        self.num_imgs = 999999

    def __len__(self):
        return self.num_imgs

    def __getitem__(self, idx):
        pass

class MonocularDataset(BaseDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        calibration = config["Dataset"]["Calibration"]
        # Camera parameters
        self.fx = calibration["fx"]
        self.fy = calibration["fy"]
        self.cx = calibration["cx"]
        self.cy = calibration["cy"]
        self.width = calibration["width"]
        self.height = calibration["height"]
        self.fovx = focal2fov(self.fx, self.width)
        self.fovy = focal2fov(self.fy, self.height)
        self.K = np.array(
            [[self.fx, 0.0, self.cx], [0.0, self.fy, self.cy], [0.0, 0.0, 1.0]]
        )
        # distortion parameters
        self.disorted = calibration["distorted"]
        self.dist_coeffs = np.array(
            [
                calibration["k1"],
                calibration["k2"],
                calibration["p1"],
                calibration["p2"],
                calibration["k3"],
            ]
        )
        self.map1x, self.map1y = cv2.initUndistortRectifyMap(
            self.K,
            self.dist_coeffs,
            np.eye(3),
            self.K,
            (self.width, self.height),
            cv2.CV_32FC1,
        )
        # depth parameters
        # self.has_depth = True if "depth_scale" in calibration.keys() else False # commented out: using an RGB-only camera
        self.has_depth = False # added: using an RGB-only camera
        self.depth_scale = calibration["depth_scale"] if self.has_depth else None

        # Default scene scale
        nerf_normalization_radius = 5
        self.scene_info = {
            "nerf_normalization": {
                "radius": nerf_normalization_radius,
                "translation": np.zeros(3),
            },
        }

    def __getitem__(self, idx):
        color_path = self.color_paths[idx]
        pose = self.poses[idx]

        image = np.array(Image.open(color_path))
        depth = None

        if self.disorted:
            image = cv2.remap(image, self.map1x, self.map1y, cv2.INTER_LINEAR)

        if self.has_depth:
            depth_path = self.depth_paths[idx]
            depth = np.array(Image.open(depth_path)) / self.depth_scale

        image = (
            torch.from_numpy(image / 255.0)
            .clamp(0.0, 1.0)
            .permute(2, 0, 1)
            .to(device=self.device, dtype=self.dtype)
        )
        pose = torch.from_numpy(pose).to(device=self.device)
        return image, depth, pose

class StereoDataset(BaseDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        calibration = config["Dataset"]["Calibration"]
        self.width = calibration["width"]
        self.height = calibration["height"]

        cam0raw = calibration["cam0"]["raw"]
        cam0opt = calibration["cam0"]["opt"]
        cam1raw = calibration["cam1"]["raw"]
        cam1opt = calibration["cam1"]["opt"]
        # Camera parameters
        self.fx_raw = cam0raw["fx"]
        self.fy_raw = cam0raw["fy"]
        self.cx_raw = cam0raw["cx"]
        self.cy_raw = cam0raw["cy"]
        self.fx = cam0opt["fx"]
        self.fy = cam0opt["fy"]
        self.cx = cam0opt["cx"]
        self.cy = cam0opt["cy"]

        self.fx_raw_r = cam1raw["fx"]
        self.fy_raw_r = cam1raw["fy"]
        self.cx_raw_r = cam1raw["cx"]
        self.cy_raw_r = cam1raw["cy"]
        self.fx_r = cam1opt["fx"]
        self.fy_r = cam1opt["fy"]
        self.cx_r = cam1opt["cx"]
        self.cy_r = cam1opt["cy"]

        self.fovx = focal2fov(self.fx, self.width)
        self.fovy = focal2fov(self.fy, self.height)
        self.K_raw = np.array(
            [
                [self.fx_raw, 0.0, self.cx_raw],
                [0.0, self.fy_raw, self.cy_raw],
                [0.0, 0.0, 1.0],
            ]
        )

        self.K = np.array(
            [[self.fx, 0.0, self.cx], [0.0, self.fy, self.cy], [0.0, 0.0, 1.0]]
        )

        self.Rmat = np.array(calibration["cam0"]["R"]["data"]).reshape(3, 3)
        self.K_raw_r = np.array(
            [
                [self.fx_raw_r, 0.0, self.cx_raw_r],
                [0.0, self.fy_raw_r, self.cy_raw_r],
                [0.0, 0.0, 1.0],
            ]
        )

        self.K_r = np.array(
            [[self.fx_r, 0.0, self.cx_r], [0.0, self.fy_r, self.cy_r], [0.0, 0.0, 1.0]]
        )
        self.Rmat_r = np.array(calibration["cam1"]["R"]["data"]).reshape(3, 3)

        # distortion parameters
        self.disorted = calibration["distorted"]
        self.dist_coeffs = np.array(
            [cam0raw["k1"], cam0raw["k2"], cam0raw["p1"], cam0raw["p2"], cam0raw["k3"]]
        )
        self.map1x, self.map1y = cv2.initUndistortRectifyMap(
            self.K_raw,
            self.dist_coeffs,
            self.Rmat,
            self.K,
            (self.width, self.height),
            cv2.CV_32FC1,
        )

        self.dist_coeffs_r = np.array(
            [cam1raw["k1"], cam1raw["k2"], cam1raw["p1"], cam1raw["p2"], cam1raw["k3"]]
        )
        self.map1x_r, self.map1y_r = cv2.initUndistortRectifyMap(
            self.K_raw_r,
            self.dist_coeffs_r,
            self.Rmat_r,
            self.K_r,
            (self.width, self.height),
            cv2.CV_32FC1,
        )

    def __getitem__(self, idx):
        color_path = self.color_paths[idx]
        color_path_r = self.color_paths_r[idx]

        pose = self.poses[idx]
        image = cv2.imread(color_path, 0)
        image_r = cv2.imread(color_path_r, 0)
        depth = None
        if self.disorted:
            image = cv2.remap(image, self.map1x, self.map1y, cv2.INTER_LINEAR)
            image_r = cv2.remap(image_r, self.map1x_r, self.map1y_r, cv2.INTER_LINEAR)
        stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=20)
        stereo.setUniquenessRatio(40)
        disparity = stereo.compute(image, image_r) / 16.0
        disparity[disparity == 0] = 1e10
        depth = 47.90639384423901 / (
            disparity
        )  ## Following ORB-SLAM2 config, baseline*fx
        depth[depth < 0] = 0
        image = cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)
        image = (
            torch.from_numpy(image / 255.0)
            .clamp(0.0, 1.0)
            .permute(2, 0, 1)
            .to(device=self.device, dtype=self.dtype)
        )
        pose = torch.from_numpy(pose).to(device=self.device)

        return image, depth, pose

class TUMDataset(MonocularDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        dataset_path = config["Dataset"]["dataset_path"]
        parser = TUMParser(dataset_path)
        self.num_imgs = parser.n_img
        self.color_paths = parser.color_paths
        self.depth_paths = parser.depth_paths
        self.poses = parser.poses

class ReplicaDataset(MonocularDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        dataset_path = config["Dataset"]["dataset_path"]
        parser = ReplicaParser(dataset_path)
        self.num_imgs = parser.n_img
        self.color_paths = parser.color_paths
        self.depth_paths = parser.depth_paths
        self.poses = parser.poses

class EurocDataset(StereoDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        dataset_path = config["Dataset"]["dataset_path"]
        parser = EuRoCParser(dataset_path, start_idx=config["Dataset"]["start_idx"])
        self.num_imgs = parser.n_img
        self.color_paths = parser.color_paths
        self.color_paths_r = parser.color_paths_r
        self.poses = parser.poses

class RealsenseDataset(BaseDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        self.pipeline = rs.pipeline()
        self.h, self.w = 720, 1280
        self.config = rs.config()
        self.config.enable_stream(rs.stream.color, self.w, self.h, rs.format.bgr8, 30)
        self.profile = self.pipeline.start(self.config)

        self.rgb_sensor = self.profile.get_device().query_sensors()[1]
        self.rgb_sensor.set_option(rs.option.enable_auto_exposure, False)
        # rgb_sensor.set_option(rs.option.enable_auto_white_balance, True)
        self.rgb_sensor.set_option(rs.option.enable_auto_white_balance, False)
        self.rgb_sensor.set_option(rs.option.exposure, 200)
        self.rgb_profile = rs.video_stream_profile(
            self.profile.get_stream(rs.stream.color)
        )

        self.rgb_intrinsics = self.rgb_profile.get_intrinsics()

        self.fx = self.rgb_intrinsics.fx
        self.fy = self.rgb_intrinsics.fy
        self.cx = self.rgb_intrinsics.ppx
        self.cy = self.rgb_intrinsics.ppy
        self.width = self.rgb_intrinsics.width
        self.height = self.rgb_intrinsics.height
        self.fovx = focal2fov(self.fx, self.width)
        self.fovy = focal2fov(self.fy, self.height)
        self.K = np.array(
            [[self.fx, 0.0, self.cx], [0.0, self.fy, self.cy], [0.0, 0.0, 1.0]]
        )

        self.disorted = True
        self.dist_coeffs = np.asarray(self.rgb_intrinsics.coeffs)
        self.map1x, self.map1y = cv2.initUndistortRectifyMap(
            self.K, self.dist_coeffs, np.eye(3), self.K, (self.w, self.h), cv2.CV_32FC1
        )

        # depth parameters
        self.has_depth = False
        self.depth_scale = None

    def __getitem__(self, idx):
        pose = torch.eye(4, device=self.device, dtype=self.dtype)

        frameset = self.pipeline.wait_for_frames()
        rgb_frame = frameset.get_color_frame()
        image = np.asanyarray(rgb_frame.get_data())
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        if self.disorted:
            image = cv2.remap(image, self.map1x, self.map1y, cv2.INTER_LINEAR)

        image = (
            torch.from_numpy(image / 255.0)
            .clamp(0.0, 1.0)
            .permute(2, 0, 1)
            .to(device=self.device, dtype=self.dtype)
        )
        return image, None, pose

def load_dataset(args, path, config):
    if config["Dataset"]["type"] == "tum":
        return TUMDataset(args, path, config)
    elif config["Dataset"]["type"] == "replica":
        return ReplicaDataset(args, path, config)
    elif config["Dataset"]["type"] == "euroc":
        return EurocDataset(args, path, config)
    elif config["Dataset"]["type"] == "realsense":
        return RealsenseDataset(args, path, config)
    else:
        raise ValueError("Unknown dataset type")
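As an aside, since the modified TUMParser above only needs rgb.txt, a TUM-style rgb.txt can be generated from frames extracted out of a monocular movie. The snippet below is only a sketch: the folder layout, the ffmpeg command in the comment, and the 30 fps value are illustrative assumptions, not part of MonoGS.

import glob
import os

# Frames extracted beforehand, e.g. with:
#   ffmpeg -i movie.mp4 -vf fps=30 rgb/%06d.png
dataset_dir = "datasets/tum/my_data"   # hypothetical dataset folder
fps = 30.0

frames = sorted(glob.glob(os.path.join(dataset_dir, "rgb", "*.png")))
with open(os.path.join(dataset_dir, "rgb.txt"), "w") as f:
    f.write("# timestamp filename\n")   # comment lines are skipped by np.loadtxt
    for i, frame in enumerate(frames):
        # TUM format: "<timestamp> <path relative to the dataset folder>"
        f.write(f"{i / fps:.6f} {os.path.relpath(frame, dataset_dir)}\n")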
Harshitavardhani commented 1 month ago

> @muskie82 Yes, I've commented that out too. Sorry for not mentioning it above.

Hi @muskie82

Did you overcome the error regarding "Umeyama alignment is not possible"? I am facing the same issue while working with a custom dataset.

If so, could you please tell me how you resolved it? Thank you!

RickMaruyama commented 1 month ago

@langsungn Hello! I'd like to solve this issue. Do you have a dataset.py for the monocular-camera case that successfully produced a .ply? If so, I'd like to see it for reference.

jdeng21 commented 2 weeks ago

@muskie82 @RickMaruyama @Harshitavardhani Hello! I am facing the same issue regarding "Degenerate covariance rank, Umeyama alignment is not possible". Did you solve the problem? If so, can you give me some advice? Thank you very much!