ingra14m / Deformable-3D-Gaussians

[CVPR 2024] Official implementation of "Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction"
https://ingra14m.github.io/Deformable-Gaussians/
MIT License

Change another dataset #76

Open Wang-Wenqing opened 1 month ago

Wang-Wenqing commented 1 month ago

Thanks for your great work! I want to use another dataset and am trying to build the camera_info like this:

def readNeumanCameras(path):

    cameras = read_cameras(f'{path}/sparse/cameras.txt')
    images_meta = read_images_meta(f'{path}/sparse/images.txt', f'{path}/images')

    keys = []
    frames = []
    for k, v in images_meta.items():
        keys.append(k)
        frames.append(os.path.basename(v.image_path))
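    # Order keys by frame file name, then re-sort numerically by key;
    # the second sort determines the final order.
    keys = [x for _, x in sorted(zip(frames, keys))]
    keys = sorted(keys, key=int)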

    all_time = keys
    max_time = max(all_time)
    all_time = [i / max_time for i in all_time]

    train_num = len(frames)

    cam_infos = []

    for i, key in enumerate(keys):
        cur_cam_id = images_meta[key].camera_id
        cur_cam = cameras[cur_cam_id]
        cur_camera_pose = images_meta[key].camera_pose
        image_path = images_meta[key].image_path 
        cap = RGBPinholeCapture(image_path, cur_cam, cur_camera_pose)

        cap.frame_id = {'frame_id': i, 'total_frames': len(images_meta)}
        idx = i
        image = cap.image
        image = Image.fromarray((image).astype(np.uint8))

        image_name = cap.image_path.split('/')[-1]
        width = cap.shape[1]
        height = cap.shape[0]
        FovY = float(2 * np.arctan(cap.shape[0] / (2 * cap.intrinsic_matrix[1, 1]))) # 0.6565035439079898
        FovX = float(2 * np.arctan(cap.shape[1] / (2 * cap.intrinsic_matrix[0, 0]))) # 1.0895537198941696
        R = cap.cam_pose.rotation_matrix[:3, :3] # 3x3 block of the 4x4 matrix
        T = cap.cam_pose.translation_vector
        fid = all_time[i]

        cam_info = CameraInfo(uid=idx, R=R, T=T, FovY=FovY, FovX=FovX, image=image,
                              image_path=image_path, image_name=image_name, width=width, height=height,
                              fid=fid)
        cam_infos.append(cam_info)

    sys.stdout.write('\n')
    return cam_infos, train_num

After training, the rendered training images look like this:

[Images: 00000, 00019]

and the test images look like this:

[Images: 00007, 00004]

It would be very helpful if you could share any advice on what might cause this problem. Could the camera information be wrong?
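One quick check on the intrinsics uses only the two FoV values recorded in the code comments above. For a pinhole camera, tan(FovX/2) / tan(FovY/2) equals (width / fx) / (height / fy), which reduces to the image aspect ratio when fx and fy are equal. The sketch below is an illustration, not part of the original post: the helper names `focal2fov` and `fov2focal` are chosen here for the standard pinhole formulas, and equal focal lengths are an assumption that the thread does not confirm.

```python
import math


def focal2fov(focal_px: float, size_px: float) -> float:
    """Field of view (radians) of a pinhole camera along one image axis."""
    return 2.0 * math.atan(size_px / (2.0 * focal_px))


def fov2focal(fov_rad: float, size_px: float) -> float:
    """Inverse of focal2fov: focal length in pixels along one image axis."""
    return size_px / (2.0 * math.tan(fov_rad / 2.0))


# FoV values copied from the comments in the snippet above.
fov_y = 0.6565035439079898
fov_x = 1.0895537198941696

# Assumption: fx == fy. Under that assumption this ratio equals width / height.
implied_aspect = math.tan(fov_x / 2.0) / math.tan(fov_y / 2.0)
print(f"implied aspect ratio: {implied_aspect:.3f}")
```

If the printed ratio matches the actual width / height of the images, the FoV computation is probably fine and the extrinsics become the more likely suspect.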

ingra14m commented 1 month ago

Hi, thanks for your interest.

It looks like the dataset in your setting is not in strict COLMAP format. Ideally, a pinhole camera model would not lead to this degree of blur.
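For reference, the COLMAP reader in the original 3D Gaussian Splatting code, on which this repository is based, stores `R` as the transpose of the world-to-camera rotation and `T` as the world-to-camera translation, and later rebuilds the world-to-view matrix from that pair. A minimal sketch of that convention follows, assuming the custom loader can supply a 4x4 world-to-camera matrix. The names `to_camera_info_pose` and `world_to_view` are hypothetical, and whether `cap.cam_pose` in the snippet above is world-to-camera or camera-to-world is not stated in this thread.

```python
import numpy as np


def to_camera_info_pose(w2c: np.ndarray):
    """Split a 4x4 world-to-camera matrix into the (R, T) pair described above.

    R is the transposed world-to-camera rotation; T is the world-to-camera
    translation. (Hypothetical helper, named here for illustration.)
    """
    R = w2c[:3, :3].T.copy()
    T = w2c[:3, 3].copy()
    return R, T


def world_to_view(R: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Rebuild the 4x4 world-to-camera matrix from (R, T); R is transposed back."""
    Rt = np.eye(4)
    Rt[:3, :3] = R.T
    Rt[:3, 3] = T
    return Rt


# Example pose: 90-degree rotation about the z axis plus a translation.
w2c = np.eye(4)
w2c[:3, :3] = np.array([[0.0, -1.0, 0.0],
                        [1.0,  0.0, 0.0],
                        [0.0,  0.0, 1.0]])
w2c[:3, 3] = np.array([0.5, -2.0, 3.0])

R, T = to_camera_info_pose(w2c)
assert np.allclose(world_to_view(R, T), w2c)  # round trip recovers the input

# If a loader provides camera-to-world poses instead, invert them first.
c2w = np.linalg.inv(w2c)
R2, T2 = to_camera_info_pose(np.linalg.inv(c2w))
assert np.allclose(R2, R) and np.allclose(T2, T)
```

A loader that passes the untransposed rotation, or a camera-to-world pose, would place every camera incorrectly, which is one possible (unconfirmed) explanation for blurry renders.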

So, I suggest that: