nerfstudio-project / nerfstudio

A collaboration friendly studio for NeRFs
https://docs.nerf.studio
Apache License 2.0

Question Regarding Image Downscaling Method #3263

Open wzy-99 opened 2 days ago

wzy-99 commented 2 days ago

https://github.com/nerfstudio-project/nerfstudio/blob/9b3cbc79bf239eb3c69e7c288632aab02c4f0bb1/nerfstudio/models/splatfacto.py#L83

Why was the following method chosen for downscaling images instead of directly using F.resize?

def resize_image(image: torch.Tensor, d: int):
    """
    Downscale images using the same 'area' method in opencv

    :param image shape [H, W, C]
    :param d downscale factor (must be 2, 4, 8, etc.)

    return downscaled image in shape [H//d, W//d, C]
    """
    import torch.nn.functional as tf

    image = image.to(torch.float32)
    weight = (1.0 / (d * d)) * torch.ones((1, 1, d, d), dtype=torch.float32, device=image.device)
    return tf.conv2d(image.permute(2, 0, 1)[:, None, ...], weight, stride=d).squeeze(1).permute(1, 2, 0)

My concern is that this method may lead to misaligned coordinates. For instance, if we input a 19x19 image and downscale it by a factor of 4, the strided convolution produces a 4x4 output that covers only the first 16 pixels along each axis, so the last 3 rows and columns are dropped, whereas ideally those 3 pixels should be distributed evenly across the output.
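To make the 19x19 case concrete, here is a minimal pure-Python sketch of which input pixels each output pixel reads along one axis. The helper names (`strided_box_windows`, `adaptive_windows`, `uncovered`) are hypothetical and not part of nerfstudio, and the second partition is only an assumption about how an "area"-style resize could spread all input pixels over the output; it is not taken from the repository.

```python
def strided_box_windows(n: int, d: int):
    """Windows [start, end) read by a d-wide box filter with stride d and no padding."""
    out = (n - d) // d + 1 if n >= d else 0
    return [(i * d, i * d + d) for i in range(out)]


def adaptive_windows(n: int, out: int):
    """Windows [start, end) that spread all n input pixels over `out` outputs.

    Assumed partition: start = floor(i * n / out), end = ceil((i + 1) * n / out).
    """
    return [((i * n) // out, -(-((i + 1) * n) // out)) for i in range(out)]


def uncovered(n: int, windows):
    """Input pixel indices that no window touches."""
    covered = {p for start, end in windows for p in range(start, end)}
    return [p for p in range(n) if p not in covered]


n, d = 19, 4
box = strided_box_windows(n, d)
ada = adaptive_windows(n, n // d)
print("strided box:", box, "uncovered:", uncovered(n, box))
print("adaptive:   ", ada, "uncovered:", uncovered(n, ada))
```

With n = 19 and d = 4, the strided windows stop at pixel 16 and leave indices 16, 17 and 18 unread, while the adaptive windows are slightly wider, overlap by one pixel, and reach the last index.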

https://github.com/nerfstudio-project/nerfstudio/blob/9b3cbc79bf239eb3c69e7c288632aab02c4f0bb1/nerfstudio/data/dataparsers/colmap_dataparser.py#L460

Additionally, I noticed that another part of the code uses linear interpolation (the FFMPEG default) for image downsampling. For consistency, I believe the same interpolation method should be used for downsampling during both dataset preprocessing and training.
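As a rough illustration of what I have in mind, here is a sketch built on `torch.nn.functional.interpolate`, which accepts an explicit output size and a selectable mode. The function name `resize_image_interp` and its `mode` argument are hypothetical, not existing nerfstudio code, and I have not checked that the bilinear path reproduces FFMPEG's output exactly.

```python
import torch
import torch.nn.functional as F


def resize_image_interp(image: torch.Tensor, d: int, mode: str = "area") -> torch.Tensor:
    """Hypothetical alternative to resize_image: [H, W, C] -> [H//d, W//d, C].

    Unlike a strided box filter, the whole input extent is mapped onto the
    output grid, so H and W do not need to be multiples of d.
    """
    if mode not in ("area", "bilinear", "bicubic"):
        raise ValueError(f"unsupported mode: {mode}")
    h, w = image.shape[0] // d, image.shape[1] // d
    x = image.to(torch.float32).permute(2, 0, 1)[None]  # [1, C, H, W]
    if mode == "area":
        # 'area' averages over adaptive windows that span the full input
        y = F.interpolate(x, size=(h, w), mode="area")
    else:
        # antialias=True low-pass filters before sampling when downscaling
        y = F.interpolate(x, size=(h, w), mode=mode, align_corners=False, antialias=True)
    return y[0].permute(1, 2, 0)  # back to [h, w, C]


image = torch.rand(19, 19, 3)
print(resize_image_interp(image, 4).shape)
print(resize_image_interp(image, 4, mode="bilinear").shape)
```

For sizes that are exact multiples of d, the "area" path should give the same result as the current box-filter average, so the change would only affect images whose sizes are not divisible by d.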

ginazhouhuiwu commented 2 days ago

Actually I think a while back we wanted to upgrade this part to support downscaling better (not just by a constant), so thanks for bringing it up! I would be happy to look into this and you are more than welcome to make a PR if you want to address this feature immediately.