mbortolon97 / 6dgs

Code of the paper: 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model

What directory does the "--exp_path" parameter refer to? #3

Closed GottenZZP closed 3 months ago

GottenZZP commented 3 months ago

Thank you very much for open-sourcing this project, but I have run into an issue and would like to ask: what directory does the "--exp_path" parameter refer to? Currently, I can train Gaussian Splatting normally. However, when I try to run the pose estimation with the python3 pretrain_eval_attention.py --exp_path ../pose-splatting/output/ --out_path results.json --data_type tankstemple line from your README, it fails with FileNotFoundError: [Errno 2] No such file or directory: '../pose-splatting/output/'. So I pointed this parameter at the output directory of my finished Gaussian Splatting training, which contains the point cloud files for 7000 and 30000 iterations. With that, the command produces the 'results' file, but the file is empty. So, what directory should "--exp_path" point to?

mbortolon97 commented 3 months ago

Thanks for pointing out the mistake; I have updated the README. After training, you should have a directory ./output/ containing all your trained models (its subdirectories should be tt_Caterpillar, tt_Family, tt_Ignatius, etc.). --exp_path refers to this directory. Based on --data_type, the script automatically trains on each object of that data type found inside the directory (see the sketch below).
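
For reference, a rough sketch of the expected layout and invocation (the scene names are illustrative, matching the Tanks and Temples examples above):

output/
    tt_Caterpillar/point_cloud/iteration_30000/...
    tt_Family/...
    tt_Ignatius/...

python3 pretrain_eval_attention.py --exp_path ./output/ --out_path results.json --data_type tankstemple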

GottenZZP commented 3 months ago

Okay, thank you, the code now runs successfully! But I'm a bit confused about how to apply this code to my own dataset. For example, how should I input an image and its corresponding Gaussian point cloud file, and have the code output the camera pose of that image with respect to the Gaussian point cloud?

mbortolon97 commented 3 months ago

I'm glad to hear that the code ran successfully. Converting your data to the Mip-360 format is the most straightforward way to train the model on new data. This format is identical to the COLMAP output, so running COLMAP on your images already gives you the data in the Mip-360 format. You can then train the model on the COLMAP output in the usual way with train.py -s [colmap output]; this trains the 6DGS model. Once 6DGS is trained, you can predict the camera pose of a new image by calling the test_pose_estimation function from pose_estimation/test.py (see the command sketch below). I would love to prepare a demo for you, but unfortunately I won't be able to do anything more than answer the issues until December due to time constraints.
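
A minimal command sketch, assuming your images are in ./my_scene/images (all paths are illustrative) and using the standard COLMAP CLI; depending on the camera model you may also need colmap image_undistorter before training:

mkdir -p ./my_scene/sparse
colmap feature_extractor --database_path ./my_scene/database.db --image_path ./my_scene/images
colmap exhaustive_matcher --database_path ./my_scene/database.db
colmap mapper --database_path ./my_scene/database.db --image_path ./my_scene/images --output_path ./my_scene/sparse
python3 train.py -s ./my_scene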

GottenZZP commented 3 months ago

Thank you very much for your response. I have carefully read through your code, and I see that the function test_pose_estimation requires the camera information cameras_info for the photos to be predicted, from which it computes the w2c transformation matrix for each photo. However, building the w2c matrix requires the rotation matrix R and translation vector T of the photo, which come from that photo's camera extrinsics, and those extrinsics are in turn produced by COLMAP for each photo used in the 3D reconstruction. As I am just starting out in computer vision, I am unclear how, if I input a photo from a new viewpoint that did not take part in the COLMAP reconstruction and therefore has no camera extrinsics, I should obtain its R and T in order to estimate the camera pose for the new viewpoint. Thank you once again for your response!

mbortolon97 commented 3 months ago

R and T need to be provided only if you want to evaluate how much the estimated pose differs from the ground truth (GT). I have not tested this code, but it should be a good example of how to use test_pose_estimation without any R / T from the GT:

import math
import numpy as np
from pose_estimation.file_utils import get_checkpoint_arguments
from pose_estimation.identification_module import IdentificationModule
from pose_estimation.test import test_pose_estimation
from PIL import Image
from scene.scene_structure import CameraInfo
import torch
from pretrain_eval_attention import explore_model, load_model
import os.path as op

from utils.graphics_utils import focal2fov
import argparse

@torch.no_grad()
def main(exp_dir_filepath, image_path, device='cuda', white_background=False):
    checkpoint_args = get_checkpoint_arguments(exp_dir_filepath)
    checkpoint_filepath = op.join(exp_dir_filepath, "point_cloud", "iteration_30000", "id_module.th")

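    # Load the trained Gaussian Splatting model from the experiment checkpoint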
    gs_model = load_model(
        checkpoint_filepath, device, sh_degrees=checkpoint_args.sh_degree
    )

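    # Build the identification module with a DINO feature backbone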
    backbone_type = "dino"
    id_module = (
        IdentificationModule(backbone_type=backbone_type)
        .eval()
        .to(device, non_blocking=True)
    )

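    # Load pre-trained identification module weights (id_module.th) if they are available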
    start_iterations = 0
    id_module_ckpt_path = op.join(exp_dir_filepath, "id_module.th")
    if op.exists(id_module_ckpt_path):
        print("Checkpoint already exist, skip training phase")
        ckpt_dict = torch.load(id_module_ckpt_path, map_location=device)
        id_module.load_state_dict(ckpt_dict["model_state_dict"])
        start_iterations = ckpt_dict["epoch"]

    test_cameras = []
    image = Image.open(image_path)

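    # Composite the RGBA image onto the chosen background color (white or black)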
    im_data = np.array(image.convert("RGBA"))

    bg = np.array([1, 1, 1]) if white_background else np.array([0, 0, 0])

    norm_data = im_data / 255.0
    arr = norm_data[:, :, :3] * norm_data[:, :, 3:4] + bg * (1 - norm_data[:, :, 3:4])
    image = Image.fromarray(np.array(arr * 255.0, dtype=np.byte), "RGB")

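    # Dummy extrinsics (identity R, zero T): the pose of the new image is unknown and is what we estimate.
    # FoV is hardcoded to 90 degrees here; focal2fov could be used if the camera intrinsics are known.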
    test_cameras.append(
        CameraInfo(
            uid=0,
            R=np.eye(3),
            T=np.zeros((3,), dtype=np.float32),
            FovY=math.pi/2,
            FovX=math.pi/2,
            image=image,
            image_path=image_path,
            image_name=op.splitext(op.basename(image_path))[0],
            width=image.size[0],
            height=image.size[1],
        )
    )

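    # Extract ray origins, directions, and colors from the Gaussian Splatting model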
    rays_ori, rays_dirs, rays_rgb = explore_model(gs_model)

    print("Loading complete starting the test...")

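    # Run the pose estimation; only the first returned value (results) is used here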
    (
        results,
        _,
        _,
        _,
        _,
    ) = test_pose_estimation(
        test_cameras,
        id_module,
        rays_ori,
        rays_dirs,
        rays_rgb,
        torch.tensor([0., 0., 0.], device=device, dtype=torch.float32),
        sequence_id='',
        category_id='',
        loss_fn=None,
    )

    print(results)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Pose Estimation CLI')
    parser.add_argument('--exp_dir_filepath', type=str, help='Experiment directory filepath')
    parser.add_argument('--image_path', type=str, help='Path to the image')
    parser.add_argument('--device', type=str, default='cuda', help='Device to run the code on')
    parser.add_argument('--white_background', action='store_true', help='Use white background')

    args = parser.parse_args()

    main(args.exp_dir_filepath, args.image_path, args.device, args.white_background)
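
If you save the snippet above as, for example, estimate_pose.py (the filename is just illustrative), it can be invoked like:

python3 estimate_pose.py --exp_dir_filepath ./output/tt_Family --image_path ./new_view.png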

GottenZZP commented 3 months ago

I am truly grateful. I made a mistake: the R and T matrices are what the code needs to estimate, not inputs that must be provided beforehand. I was confused before, but now I understand.