nerfstudio-project / nerfstudio

A collaboration friendly studio for NeRFs
https://docs.nerf.studio
Apache License 2.0
9.52k stars, 1.3k forks

Add Orthographic Rendering Support #2602

Open LeaFendd opened 12 months ago

LeaFendd commented 12 months ago

Is your feature request related to a problem? Please describe.

At present, nerfstudio does not support orthographic rendering. However, generating orthographic images merely requires producing a set of parallel rays for the NeRF to render. This capability is essential for various applications but is currently lacking in nerfstudio. I plan to submit a PR for this feature.

Describe the solution you'd like

I propose to extend the existing Cameras class by adding an orthographic camera model to CameraType. This model would accept the same camera parameters as the perspective camera model already present in nerfstudio, allowing users to easily switch between perspective and orthographic modes.

akristoffersen commented 11 months ago

This sounds awesome, I think it would be a great addition. If you plan on submitting a PR for this, I'd be happy to help.

Would we also want another export method to generate an ortho photo of the scene?

LeaFendd commented 11 months ago

Thank you for the reply! I've submitted PR #2648. You can generate an orthophoto with the ORTHOPHOTO CameraType just like any other CameraType. Here is a render example with Instant-NGP: [image: lego-ortho]

But I don't know how to integrate it into ns-viewer yet; I'll try it later.

ArpegorPSGH commented 11 months ago

@LeaFendd I am trying to use your ortho camera with SDFStudio, but I get a matrix addition error caused by a broadcasting problem at this line: coord_x = (coord_x + 0.5 - self.cx) / scale * self.fx

coord_x has shape [1000, 1000], the size of one of my images, while self.cx has shape [91, 1], where 91 is the number of my images. I cannot see how this operation can make sense. To my understanding, we are supposed to get a 2D matrix whose first dimension is 91 (the number of cameras) or 2048 (the number of rays in my batch), then stack it to get a 3D tensor, and finally multiply it with c2w.

Could you please explain to me the reasoning behind this operation and how I could make it work?
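
The shape mismatch described here can be reproduced in isolation. The snippet below is only a sketch with small stand-in sizes (3 cameras and 10x10 images instead of 91 and 1000x1000); the variable names are illustrative, and the reshaping shown at the end is one possible way to make the shapes compatible, not necessarily what PR #2648 does.

```python
import torch

# Small stand-ins for 91 cameras and 1000x1000 images (assumed sizes).
num_cameras, height, width = 3, 10, 10

coord_x = torch.arange(width, dtype=torch.float32).expand(height, width)  # [H, W]
cx = torch.full((num_cameras, 1), width / 2)  # [N, 1], one value per camera

try:
    coord_x + 0.5 - cx  # [H, W] vs [N, 1]: H and N cannot be broadcast together
    broadcast_failed = False
except RuntimeError:
    broadcast_failed = True

# Adding a trailing axis turns cx into [N, 1, 1], which broadcasts against [H, W].
per_camera = coord_x + 0.5 - cx[..., None]  # [N, H, W]
```

With a scalar cx, as in a single render camera, the original expression broadcasts without any reshaping, which is why the problem only shows up with per-camera intrinsics.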

LeaFendd commented 11 months ago

Sorry, I didn't see your message in time. This step generates ray_origin. First, a meshgrid is created on the xOy plane, represented in homogeneous coordinates (x, y, z, h). Then the c2w matrix is applied to transform the meshgrid to the actual camera position. Since the orthographic camera is only used for rendering and not for training, I only considered the case where cx and cy each have a single value. In my usage, cx and cy are each a single float rather than a vector, which led to the broadcasting bug. Thank you for your feedback; I will fix it as soon as I can.
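
The steps described here (meshgrid on the camera plane, homogeneous coordinates, then c2w) can be sketched roughly as follows for a single camera. This is not the code from the PR: the helper name `orthographic_rays` is hypothetical, the intrinsics are assumed to be plain floats, dividing by fx and fy to convert pixels to world units is an assumption, and the camera is assumed to look down its -z axis with y pointing up.

```python
import torch


def orthographic_rays(c2w, fx, fy, cx, cy, height, width):
    """Hypothetical helper: parallel rays for one orthographic camera.

    c2w is a [3, 4] camera-to-world matrix; fx, fy, cx, cy are floats.
    """
    ys, xs = torch.meshgrid(
        torch.arange(height, dtype=torch.float32),
        torch.arange(width, dtype=torch.float32),
        indexing="ij",
    )
    # Pixel centers placed on the camera's z = 0 plane (assumed scaling by 1/fx, 1/fy).
    x = (xs + 0.5 - cx) / fx
    y = -(ys + 0.5 - cy) / fy
    # Homogeneous points (x, y, 0, 1), shape [H, W, 4].
    points = torch.stack([x, y, torch.zeros_like(x), torch.ones_like(x)], dim=-1)
    # Move the grid to the camera's pose in world space: [H, W, 4] @ [4, 3] -> [H, W, 3].
    origins = points @ c2w.T
    # Every ray shares one direction: the camera's -z axis in world space.
    directions = (-c2w[:, 2]).expand(height, width, 3)
    return origins, directions
```

The only difference from a perspective camera in this sketch is that the origins vary per pixel while the direction is constant, rather than the other way around.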

ArpegorPSGH commented 11 months ago

You're welcome. I definitely think all cameras should be usable for both training and final render, otherwise it would become a bit confusing and go against the "plug'n play" interchangeability allowed by the modularity of the framework.

LeaFendd commented 11 months ago

Yeah, I'm refactoring it into a more general version. Give me some time.

ArpegorPSGH commented 10 months ago

How is the refactoring going? I'm working on other parts of my model, but I'll probably be done with them by the end of next week. Do you think your new version will be available by then?

LeaFendd commented 10 months ago

I just committed the new version to my PR #2648.

hot-dog commented 9 months ago

@LeaFendd May I ask whether there are any plans to support orthographic rendering for Gaussian splatting? Or could you give some suggestions on how to implement it? I can help with the implementation.

LeaFendd commented 9 months ago

Sorry, I don't have plans for that at the moment. You just need to construct an orthographic projection matrix to replace the perspective projection matrix. I think you should modify the code here: https://github.com/nerfstudio-project/nerfstudio/blob/main/nerfstudio/models/splatfacto.py#L699 To implement an orthographic_projection_matrix() function, you can refer to https://www.songho.ca/opengl/gl_projectionmatrix.html Good luck!
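
As a rough illustration of the suggestion above, here is a minimal sketch of such a function in the standard OpenGL form (camera looking down -z, view volume mapped to the [-1, 1] cube). The parameter list (explicit left/right/bottom/top bounds) is an assumption made for this example, and whether this convention matches what the splatting rasterizer expects is not confirmed in this thread.

```python
import torch


def orthographic_projection_matrix(left, right, bottom, top, znear, zfar):
    """Sketch of an OpenGL-style orthographic projection (assumed signature)."""
    P = torch.zeros(4, 4)
    # Scale each axis so the view box spans [-1, 1], then translate it to the origin.
    P[0, 0] = 2.0 / (right - left)
    P[0, 3] = -(right + left) / (right - left)
    P[1, 1] = 2.0 / (top - bottom)
    P[1, 3] = -(top + bottom) / (top - bottom)
    P[2, 2] = -2.0 / (zfar - znear)
    P[2, 3] = -(zfar + znear) / (zfar - znear)
    # w stays 1 for every point, so there is no perspective divide.
    P[3, 3] = 1.0
    return P


P = orthographic_projection_matrix(-2.0, 2.0, -1.0, 1.0, 0.1, 10.0)
near_corner = P @ torch.tensor([2.0, 1.0, -0.1, 1.0])    # right/top corner on the near plane
far_corner = P @ torch.tensor([-2.0, -1.0, -10.0, 1.0])  # left/bottom corner on the far plane
```

Because the last row is (0, 0, 0, 1), the projected x and y do not depend on depth, which is what distinguishes this from the perspective matrix it replaces.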

hot-dog commented 9 months ago

@LeaFendd Thank you for your reply. Following your suggestion, I constructed the orthographic projection matrix as follows:

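import math

import torch


def projection_matrix(znear, zfar, fovX, fovY):
    # Half-extents of the view volume, taken from the field of view at the far plane.
    tanHalfFovY = math.tan(fovY / 2)
    tanHalfFovX = math.tan(fovX / 2)

    top = tanHalfFovY * zfar
    bottom = -top
    right = tanHalfFovX * zfar
    left = -right

    P = torch.zeros(4, 4)

    z_sign = 1.0

    # Orthographic projection: scale and translate each axis, no perspective divide.
    P[0, 0] = 2.0 / (right - left)
    P[0, 3] = -(right + left) / (right - left)
    P[1, 1] = 2.0 / (top - bottom)
    P[1, 3] = -(top + bottom) / (top - bottom)
    # The z row follows the OpenGL form (camera looking down -z).
    P[2, 2] = -2.0 / (zfar - znear)
    P[2, 3] = -(zfar + znear) / (zfar - znear)
    # The bottom-right entry keeps w = 1 for every point.
    P[3, 3] = z_sign

    return P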

And the rendered result is as follows: [image] The result is orthographic, since the building's facade is invisible, but it is foggy. I think this is due to the lack of depth info. Am I right, and do you have any suggestions for solving this? Thank you!

LeaFendd commented 9 months ago

Setting a bigger z_near may help, I think. BTW, Happy Chinese New Year! :)

hot-dog commented 8 months ago

Haha, Happy Chinese New Year! I have tried setting a bigger z_near (e.g. z_near=60), but the result is the same; z_near and z_far do not seem to be used in the 3DGS process.

Golbstein commented 1 month ago

Any update on this matter? How can I use ns-render to render an orthophoto?

Golbstein commented 1 month ago

How did you get this render? Which part of the code did you change?

LeaFendd commented 1 month ago

Since several major updates have been released (especially gsplat v1.0) since this issue was opened, the method mentioned above for rendering orthophotos in gsplat no longer works.

I think it's time to initiate support for orthophoto rendering in gsplat, but I'm busy job hunting for now. If all goes well, I expect to post an update in 3 weeks.

In the meantime, you can follow the same approach of modifying the projection matrix, but this now requires modifying the gsplat source rather than nerfstudio.