Thanks for pointing out the mistake; I updated the README. After training, you should have a directory `./output/` containing all your trained models (its subdirectories should be `tt_Caterpillar`, `tt_Family`, `tt_Ignatius`, etc.). `--exp_path` refers to this directory. Based on `data_type`, the script will automatically train for each object of that data type inside the directory.
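In other words, the expected layout looks roughly like this (a sketch for the Tanks and Temples data type; the exact scene names depend on what you trained):

```
output/                <- pass this directory as --exp_path
├── tt_Caterpillar/
├── tt_Family/
├── tt_Ignatius/
└── ...
```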
Okay, thank you, the code now runs successfully! But I'm a bit confused: how can I apply this code to my own dataset? For example, what should I do to input an image and its corresponding Gaussian point cloud file, and have the code output the camera pose of that image with respect to the Gaussian point cloud?
I’m glad to hear that the code ran successfully.
Converting your data to the Mip-360 format is the most straightforward approach to training the model on new data. This format is identical to the COLMAP output, so by running COLMAP you should already generate the data in the Mip-360 format. You can then train the model on the COLMAP output in the canonical way, `train.py -s [colmap output]`. This will allow us to train the 6DGS model. Once you have trained 6DGS, you can predict the camera pose of a new image by calling the `test_pose_estimation` function from `pose_estimation/test.py`.
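For reference, a rough sketch of that pipeline on a custom capture could look like the following. The COLMAP commands are the standard feature extraction / matching / mapping steps, and `my_scene` is just a placeholder folder name; adjust everything to however you produce the COLMAP/Mip-360 layout.

```bash
# Sketch only: build a COLMAP (Mip-360-style) scene from your own images, then train 6DGS on it.
mkdir -p my_scene/sparse
colmap feature_extractor --database_path my_scene/database.db --image_path my_scene/images
colmap exhaustive_matcher --database_path my_scene/database.db
colmap mapper --database_path my_scene/database.db --image_path my_scene/images --output_path my_scene/sparse

# Train on the COLMAP output as described above
python3 train.py -s my_scene

# After training, camera poses for new images can be predicted with
# the test_pose_estimation function in pose_estimation/test.py
```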
I would love to prepare a demo for you, but unfortunately, I won’t be able
to do anything more than answer the issues until December due to time
constraints.
Thank you very much for your response. I have carefully read through your code, and I see that the `test_pose_estimation` function requires the camera information `cameras_info` for the photos to be predicted as input. It then calculates the `w2c` transformation matrix for each photo. However, I noticed that the `w2c` transformation matrix is composed of the rotation matrix R and translation vector T of the photo, which are derived from that photo's camera extrinsics. These extrinsics, in turn, are obtained from COLMAP for each photo used in the 3D reconstruction. As I am just starting out in computer vision, I am somewhat unclear on this point: if I input a photo from a new viewpoint that did not participate in the COLMAP 3D reconstruction and thus lacks camera extrinsics, how should I obtain the R and T matrices for this photo in order to estimate the camera pose for the new viewpoint? Thank you once again for your response!
The R and T need to be provided only if you want to evaluate how much the estimated pose differs from the GT. I did not test this code, but it should be a good example of how to use `test_pose_estimation` without any R / T from GT:
```python
import math
import numpy as np
from pose_estimation.file_utils import get_checkpoint_arguments
from pose_estimation.identification_module import IdentificationModule
from pose_estimation.test import test_pose_estimation
from PIL import Image  # PIL's Image provides the open/convert/fromarray used below
from scene.scene_structure import CameraInfo
import torch
from pretrain_eval_attention import explore_model, load_model
import os.path as op
from utils.graphics_utils import focal2fov  # useful to derive FovX/FovY from a known focal length
import argparse


@torch.no_grad()
def main(exp_dir_filepath, image_path, device='cuda', white_background=False):
    # Load the trained Gaussian Splatting model of the scene
    checkpoint_args = get_checkpoint_arguments(exp_dir_filepath)
    checkpoint_filepath = op.join(
        exp_dir_filepath, "point_cloud", "iteration_30000", "id_module.th"
    )
    gs_model = load_model(
        checkpoint_filepath, device, sh_degrees=checkpoint_args.sh_degree
    )

    # Build the identification module and load its trained weights if present
    backbone_type = "dino"
    id_module = (
        IdentificationModule(backbone_type=backbone_type)
        .eval()
        .to(device, non_blocking=True)
    )
    start_iterations = 0
    id_module_ckpt_path = op.join(exp_dir_filepath, "id_module.th")
    if op.exists(id_module_ckpt_path):
        print("Checkpoint already exists, skipping training phase")
        ckpt_dict = torch.load(id_module_ckpt_path, map_location=device)
        id_module.load_state_dict(ckpt_dict["model_state_dict"])
        start_iterations = ckpt_dict["epoch"]

    # Load the query image and composite any alpha channel onto a solid
    # black or white background
    test_cameras = []
    image = Image.open(image_path)
    im_data = np.array(image.convert("RGBA"))
    bg = np.array([1, 1, 1]) if white_background else np.array([0, 0, 0])
    norm_data = im_data / 255.0
    arr = norm_data[:, :, :3] * norm_data[:, :, 3:4] + bg * (1 - norm_data[:, :, 3:4])
    image = Image.fromarray(np.array(arr * 255.0, dtype=np.byte), "RGB")

    # R and T are only placeholders (identity rotation, zero translation), since the
    # GT pose of the query image is unknown; they are used only for evaluation against GT.
    # FovX/FovY are set to 90 degrees here; replace them with the real camera FoV if the
    # intrinsics are known (e.g. via focal2fov).
    test_cameras.append(
        CameraInfo(
            uid=0,
            R=np.eye(3),
            T=np.zeros((3,), dtype=np.float32),
            FovY=math.pi / 2,
            FovX=math.pi / 2,
            image=image,
            image_path=image_path,
            image_name=op.splitext(op.basename(image_path))[0],
            width=image.size[0],
            height=image.size[1],
        )
    )

    # Extract ray origins, directions, and colors from the Gaussian model
    rays_ori, rays_dirs, rays_rgb = explore_model(gs_model)

    print("Loading complete, starting the test...")

    # Estimate the camera pose of the query image
    (
        results,
        _,
        _,
        _,
        _,
    ) = test_pose_estimation(
        test_cameras,
        id_module,
        rays_ori,
        rays_dirs,
        rays_rgb,
        torch.tensor([0., 0., 0.], device=device, dtype=torch.float32),
        sequence_id='',
        category_id='',
        loss_fn=None,
    )

    print(results)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Pose Estimation CLI')
    parser.add_argument('--exp_dir_filepath', type=str, help='Experiment directory filepath')
    parser.add_argument('--image_path', type=str, help='Path to the image')
    parser.add_argument('--device', type=str, default='cuda', help='Device to run the code on')
    parser.add_argument('--white_background', action='store_true', help='Use white background')
    args = parser.parse_args()
    main(args.exp_dir_filepath, args.image_path, args.device, args.white_background)
```
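Assuming you save the sketch above as, say, `estimate_pose.py` (a hypothetical filename) in the repository root, it could be invoked roughly like this, pointing `--exp_dir_filepath` at a trained scene directory and `--image_path` at the query image:

```bash
# Hypothetical invocation; adjust the paths to your trained scene and query image
python3 estimate_pose.py \
    --exp_dir_filepath ./output/tt_Caterpillar \
    --image_path ./query_image.png
```

Add `--white_background` if the model was trained with a white background.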
I am truly grateful. I made a mistake: the R and T matrices are what we need to calculate, not what needs to be provided beforehand. I was confused before, but now I understand.
Thank you very much for open-sourcing this project, but I have encountered an issue. I would like to ask: what directory does the `--exp_path` parameter refer to? Currently, I can train Gaussian Splatting normally. However, when I try to execute the pose estimation according to the

`python3 pretrain_eval_attention.py --exp_path ../pose-splatting/output/ --out_path results.json --data_type tankstemple`

line of your README, it prompts me with the error `FileNotFoundError: [Errno 2] No such file or directory: '../pose-splatting/output/'`. So I modified this parameter to point to the output directory of my completed Gaussian Splatting training, which includes the point cloud files for the 7000 and 30000 steps. After that, when I execute the above command, it outputs the 'results' file normally, but the file is empty. Therefore, I would like to ask: what directory does this `--exp_path` parameter refer to?