nerfstudio-project / nerfstudio

A collaboration friendly studio for NeRFs
https://docs.nerf.studio
Apache License 2.0

Render only a specific frame #2811

Open · briancantwe opened this issue 7 months ago

briancantwe commented 7 months ago

I looked through the code a bit, but didn't seem to find what I was looking for. Is there a way to render only a specific frame, or a specific frame range of a camera move using ns-render?

Thanks so much!

kerrj commented 7 months ago

There currently isn't great support for single-image rendering; you can either:

  • save the viewer canvas as a PNG (in viser, click the control knobs icon -> export canvas)
  • run ns-render images to save output frames in a folder as PNGs instead of an .mp4, then choose the one you want
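
For reference, one possible invocation of the second option, e.g. via the interpolate subcommand (the config path here is a placeholder, not from this thread):

ns-render interpolate --load-config outputs/scene/splatfacto/2024-01-01_000000/config.yml --output-format images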

briancantwe commented 7 months ago

Got it. Yeah, I'm currently rendering the entire range and then picking the frames I need. I guess it's a feature request, then. Thanks for considering it!


paolovic commented 4 months ago

There currently isn't great support for single-image rendering; you can either:

  • save the viewer canvas as a PNG (in viser, click the control knobs icon -> export canvas)
  • run ns-render images to save output frames in a folder as PNGs instead of an .mp4, then choose the one you want

Can you give me a hint, please, on how I can make this possible for splatfacto? I've been digging through the codebase for two days and simply cannot find the function that renders the scene. How is the view that I see in the viewer created? There must be some underlying function that does something like that, mustn't it? I want to give this function my pose (location + orientation) and get a single image in return...

brentyi commented 4 months ago

For rendering the scene, maybe you can work backward from here? https://github.com/nerfstudio-project/nerfstudio/blob/dd811f58bce4dd56a75147c63218eb190fc8143c/nerfstudio/scripts/render.py#L195-L198

paolovic commented 4 months ago

For rendering the scene, maybe you can work backward from here?

https://github.com/nerfstudio-project/nerfstudio/blob/dd811f58bce4dd56a75147c63218eb190fc8143c/nerfstudio/scripts/render.py#L195-L198

I did, or rather, I tried. In the meantime, I concluded that render.py is not the right place to investigate, since render follows a trajectory between my training images and just renders it as a video/images. That's why I moved on to viewer.py, since the viewer lets me move to arbitrary places in my 3D model and renders the view accordingly. But I might be missing the forest for the trees, so I'll investigate get_outputs_for_camera further. Thank you very much!

I think I found the most important part, I'll update this thread later! Thank you!
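
For later readers, a minimal sketch of that idea (untested; the intrinsics are placeholders, and the config path is the one from the command further below): load the pipeline with eval_setup, build a one-element Cameras from a camera-to-world pose, and call get_outputs_for_camera, which is essentially what the viewer does for each view.

# Single-image render from an arbitrary pose (sketch; untested).
from pathlib import Path

import torch

from nerfstudio.cameras.cameras import Cameras, CameraType
from nerfstudio.utils.eval_utils import eval_setup

_, pipeline, _, _ = eval_setup(
    Path("outputs/room10/splatfacto/2024-04-20_203234/config.yml"),
    test_mode="inference",
)

# 3x4 camera-to-world matrix (rotation | translation); identity here as a
# stand-in for a real pose.
c2w = torch.eye(4)[:3].unsqueeze(0)

camera = Cameras(
    camera_to_worlds=c2w,
    fx=1000.0,  # placeholder intrinsics
    fy=1000.0,
    cx=640.0,
    cy=360.0,
    width=1280,
    height=720,
    camera_type=CameraType.PERSPECTIVE,
).to(pipeline.device)

with torch.no_grad():
    outputs = pipeline.model.get_outputs_for_camera(camera)
rgb = outputs["rgb"]  # (H, W, 3) image tensor in [0, 1]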

paolovic commented 4 months ago

So, I extended my nerfstudio render.py with the following feature.

It allows me to specify an eye and a target to create an initial view, as well as a start and an end angle that define an arc to render frames from. Furthermore, it's possible to set the granularity, i.e., the number of intermediate steps, to determine how many pictures you'd like.

A previously trained model is required. I've tested this solution with a splatfacto model, so it might be tailored toward splatfacto and might not work out of the box for other models. It can be invoked like the following:

ns-render pose-view --output-format images --load-config outputs/room10/splatfacto/2024-04-20_203234/config.yml --load-dataparser-transforms outputs/room10/splatfacto/2024-04-20_203234/dataparser_transforms.json --eye " 0.1 0.9 -0.2" --target "0 0.15 0"

def calculate_camera_to_world_matrix(eye, target, up):
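    """Build one camera-to-world matrix per view via a batched look-at, sharing a single eye position."""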
    forward = (target - eye) / torch.norm(target - eye, dim=1, keepdim=True)
    right = torch.cross(up.expand_as(forward), forward, dim=1)
    right /= torch.norm(right, dim=1, keepdim=True)
    up = torch.cross(forward, right, dim=1)
    # Create rotation matrices
    rotation_matrices = torch.stack([right, up, -forward], dim=-1)
    # Flip the axes to align with gsplat conventions
    r_edit = torch.diag(torch.tensor([-1., -1., -1.], device=rotation_matrices.device))
    r_edit = r_edit.unsqueeze(0).repeat(rotation_matrices.size(0), 1, 1)
    rotation_matrices = torch.einsum('bij,bjk->bik', rotation_matrices, r_edit)

    # Create translation matrices
    translation_matrices = torch.eye(4, device=rotation_matrices.device).repeat(rotation_matrices.size(0), 1, 1)
    translation_matrices[:, :3, 3] = -eye

    # Combine rotation and translation matrices
    camera_to_world_matrices = torch.eye(4, device=rotation_matrices.device).repeat(rotation_matrices.size(0), 1, 1)  # Initialize with identity matrices
    camera_to_world_matrices[:, :3, :3] = rotation_matrices
    camera_to_world_matrices = torch.einsum('bij,bjk->bik', camera_to_world_matrices, translation_matrices)

    return camera_to_world_matrices

# Function to create a rotation matrix around the y-axis
def rotation_matrix_y(angles, device):
    cos_angles = torch.cos(angles)
    sin_angles = torch.sin(angles)
    rotation_matrices = torch.zeros((angles.shape[0], 3, 3), device=device)
    rotation_matrices[:, 0, 0] = cos_angles
    rotation_matrices[:, 0, 2] = sin_angles
    rotation_matrices[:, 1, 1] = 1
    rotation_matrices[:, 2, 0] = -sin_angles
    rotation_matrices[:, 2, 2] = cos_angles
    return rotation_matrices

def get_pose_view_cameras(transformation_matrix, scale_factor, eye, target, start_angle, end_angle, intermediate_steps, device, camera):
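    """Rotate the target around the eye in the original coordinate system, map both into the scaled dataparser frame, and build a batch of Cameras reusing the intrinsics of `camera`."""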
    transformation_matrix = torch.tensor(transformation_matrix, device=device)
    transformation_matrix = torch.cat((transformation_matrix, torch.tensor([[0, 0, 0, 1]], device=device)), dim=0)   # Homogeneous coordinate
    # Apply scale to transformation matrix
    scaled_transformation_matrix = transformation_matrix * scale_factor

    eye = torch.tensor(eye, device=device)
    target = torch.tensor(target, device=device)

    # Generate views by rotating around the y-axis in the world coordinate system and transform these rotated targets into the NeRF coordinate system afterwards
    start_angle = np.deg2rad(start_angle)
    end_angle = np.deg2rad(end_angle)
    angles = torch.linspace(start_angle, end_angle, intermediate_steps, device=device)
    # Create rotation matrices for all angles in parallel
    rotation_matrices = rotation_matrix_y(angles, device)
    # Rotate the target vector around the eye position
    rotated_targets = torch.einsum('ijk,k->ij', rotation_matrices, target - eye) + eye

    # Transform the eye into NeRF coordinate system
    eye_nerf = torch.cat((eye, torch.tensor([1.0], device=device))) # Homogeneous coordinate
    eye_nerf = scaled_transformation_matrix @ eye_nerf
    eye_nerf = eye_nerf[:-1] # Remove homogeneous component

    # Transform the rotated_targets into the NeRF coordinate system
    rotated_targets_nerf = torch.cat((rotated_targets, torch.ones_like(rotated_targets[:, :1], device=device)), dim=1) # Homogeneous coordinate
    # scaled_transformation_matrix.unsqueeze(0) adds a leading dimension, making it a 1x4x4 tensor;
    # .expand(...) then matches the number of targets, giving an (intermediate_steps, 4, 4) tensor.
    expanded_scaled_transformation_matrix = scaled_transformation_matrix.unsqueeze(0).expand(rotated_targets_nerf.size(0), -1, -1)
    # unsqueeze(-1) makes the targets (intermediate_steps, 4, 1) for batched matrix multiplication.
    rotated_targets_nerf = expanded_scaled_transformation_matrix @ rotated_targets_nerf.unsqueeze(-1)
    # squeeze(-1) drops the singleton dimension; [:, :-1] removes the homogeneous component.
    rotated_targets_nerf = rotated_targets_nerf.squeeze(-1)[:, :-1]

    up = torch.tensor([0, 1., 0], device=device)  # Up vector
    up = torch.cat((up, torch.tensor([1.0], device=device)))  # Homogeneous coordinate
    up_nerf = scaled_transformation_matrix @ up
    up_nerf = up_nerf[:-1] # Remove homogeneous component

    # Calculate camera-to-world matrices for all views
    camera_to_world_matrices = calculate_camera_to_world_matrix(eye_nerf, rotated_targets_nerf, up_nerf)

    # Set intrinsic parameters
    fx = camera.fx.expand(intermediate_steps, -1)
    fy = camera.fy.expand(intermediate_steps, -1)
    cx = camera.cx.expand(intermediate_steps, -1)
    cy = camera.cy.expand(intermediate_steps, -1)

    width = camera.width.expand(intermediate_steps, -1)
    height = camera.height.expand(intermediate_steps, -1)

    distortion_params = camera.distortion_params.expand(intermediate_steps, -1) if camera.distortion_params is not None else None
    camera_type = camera.camera_type.expand(intermediate_steps, -1)

    times = camera.times.expand(intermediate_steps, -1) if camera.times is not None else None
    metadata = camera.metadata if camera.metadata is not None else None

    cameras = Cameras(
        camera_to_worlds=camera_to_world_matrices,
        fx=fx,
        fy=fy,
        cx=cx,
        cy=cy,
        width=width,
        height=height,
        distortion_params=distortion_params,
        camera_type=camera_type,
        times=times,
        metadata=metadata,
    )
    return cameras

@dataclass
class RenderPoseView(BaseRender):
    """Rotate around a pose and render the corresponding views."""

    pose_source: Literal["eval", "train"] = "eval"
    """Pose source to render."""
    interpolation_steps: int = 10
    """Number of interpolation steps between eval dataset cameras."""
    order_poses: bool = False
    """Whether to order camera poses by proximity."""
    frame_rate: int = 24
    """Frame rate of the output video."""
    output_format: Literal["images", "video"] = "video"
    """How to save output data."""

    eye: str = ""
    """Eye of the pose to render of shape "x y z" in the original data coordinate system."""
    target: str = "[0, eye[1], 0]"  # sentinel default: if not set, look toward the origin, parallel to the floor
    """Target of the pose to render of shape "x y z" in the original data coordinate system."""
    start_angle: float = 0.0
    """Start angle of the perspective relative to the origin."""
    end_angle: float = 345.0
    """End angle of the perspective relative to the origin."""
    intermediate_steps: int = 20
    """Number of intermediate steps of the arc that will be rendered."""
    load_dataparser_transforms: Path = Path()
    """Path to dataparser_transforms JSON file."""

    def main(self) -> None:
        """Main function."""
        _, pipeline, _, step = eval_setup(
            self.load_config,
            eval_num_rays_per_chunk=self.eval_num_rays_per_chunk,
            test_mode="test",
        )

        assert self.load_dataparser_transforms.is_file(), f"dataparser_transforms.json could not be found at {self.load_dataparser_transforms}."
        with open(self.load_dataparser_transforms) as f:
            dataparser_transforms = json.load(f)
            assert "transform" in dataparser_transforms, f"Transformation matrix could not be found in {self.load_dataparser_transforms}."
            assert "scale" in dataparser_transforms, f"Scale factor could not be found in {self.load_dataparser_transforms}."
            self.transform = dataparser_transforms["transform"]
            assert len(self.transform) == 3, "Transformation matrix must be of shape [3, 4]."
            assert len(self.transform[0]) == 4, "Transformation matrix must be of shape [3, 4]."
            self.scale = dataparser_transforms["scale"]
            assert isinstance(self.scale, float), "Scale factor must be a scalar."
        self.eye = [float(x) for x in self.eye.split()]
        assert len(self.eye) == 3, 'Eye must be of shape "x y z" in the original data coordinate system.'
        self.target = [float(x) for x in self.target.split()] if self.target != "[0, eye[1], 0]" else [0, self.eye[1], 0]
        assert len(self.target) == 3, 'Target must be of shape "x y z" in the original data coordinate system.'
        assert self.start_angle <= self.end_angle, "The start angle must be less than or equal to the end angle."
        assert self.start_angle > -360, "The start angle must be greater than -360 degrees."
        assert self.end_angle < 360, "The end angle must be less than 360 degrees."

        if self.pose_source == "eval":
            assert pipeline.datamanager.eval_dataset is not None
            camera, _ = pipeline.datamanager.next_eval(step)
        else:
            assert pipeline.datamanager.train_dataset is not None
            camera, _ = pipeline.datamanager.next_train(step)

        pose_view_cameras = get_pose_view_cameras(self.transform, self.scale, self.eye, self.target, self.start_angle, self.end_angle, self.intermediate_steps, pipeline.device, camera)
        install_checks.check_ffmpeg_installed()

        seconds = len(pose_view_cameras) / self.frame_rate  # one camera per output frame

        _render_trajectory_video(
            pipeline,
            pose_view_cameras,
            output_filename=self.output_path,
            rendered_output_names=self.rendered_output_names,
            rendered_resolution_scaling_factor=1.0 / self.downscale_factor,
            seconds=seconds,
            output_format=self.output_format,
            image_format=self.image_format,
            depth_near_plane=self.depth_near_plane,
            depth_far_plane=self.depth_far_plane,
            colormap_options=self.colormap_options,
            render_nearest_camera=self.render_nearest_camera,
            check_occlusions=self.check_occlusions,
        )

Commands = tyro.conf.FlagConversionOff[
    Union[
        Annotated[RenderCameraPath, tyro.conf.subcommand(name="camera-path")],
        Annotated[RenderInterpolated, tyro.conf.subcommand(name="interpolate")],
        Annotated[RenderPoseView, tyro.conf.subcommand(name="pose-view")],
        Annotated[SpiralRender, tyro.conf.subcommand(name="spiral")],
        Annotated[DatasetRender, tyro.conf.subcommand(name="dataset")],
    ]
]
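
For completeness: assuming the stock render.py layout, these subcommands are dispatched by the existing tyro entrypoint, along the lines of:

def entrypoint():
    """Entrypoint for use with pyproject scripts."""
    tyro.extras.set_accent_color("bright_yellow")
    tyro.cli(Commands).main()


if __name__ == "__main__":
    entrypoint()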