graphdeco-inria / gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

How to render orthographic projection views? #578

Open haksorus opened 9 months ago

haksorus commented 9 months ago

Hello! I would like to render some ortho-views of my scene using Orthographic projection (https://en.wikipedia.org/wiki/Orthographic_projection).

For this purpose, I changed the getProjectionMatrix() function according to the Orthographic projection matrix:

import math
import torch

def getProjectionMatrix(znear, zfar, fovX, fovY):
    tanHalfFovY = math.tan(fovY / 2)
    tanHalfFovX = math.tan(fovX / 2)

    # Half-extents of the view box, taken at the near plane
    top = tanHalfFovY * znear
    bottom = -top
    right = tanHalfFovX * znear
    left = -right

    P = torch.zeros(4, 4)

    z_sign = 1.0

    # Standard orthographic projection matrix (no perspective divide)
    P[0, 0] = 2.0 / (right - left)
    P[0, 3] = - (right + left) / (right - left)
    P[1, 1] = 2.0 / (top - bottom)
    P[1, 3] = - (top + bottom) / (top - bottom)
    P[2, 2] = -2.0 / (zfar - znear)
    P[2, 3] = - (zfar + znear) / (zfar - znear)
    P[3, 3] = z_sign

    return P

But as a result, I always get a black picture. What could be done to solve this problem?
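As a side note, the matrix above can be sanity-checked in isolation. The following is a hypothetical stand-alone NumPy sketch (not repo code): it confirms that w stays 1 and that points at different depths land on the same screen position, but also shows that with the half-extents derived from znear the view box is only millimetres wide, so almost any real scene is clipped entirely, which alone can produce a black image.

```python
import math

import numpy as np

def ortho_projection(znear, zfar, fovX, fovY):
    # NumPy re-implementation of the getProjectionMatrix() above
    tanHalfFovY = math.tan(fovY / 2)
    tanHalfFovX = math.tan(fovX / 2)
    top = tanHalfFovY * znear       # note: half-extents taken at the NEAR plane
    right = tanHalfFovX * znear
    P = np.zeros((4, 4))
    P[0, 0] = 1.0 / right           # = 2 / (right - left), since left = -right
    P[1, 1] = 1.0 / top             # = 2 / (top - bottom), since bottom = -top
    P[2, 2] = -2.0 / (zfar - znear)
    P[2, 3] = -(zfar + znear) / (zfar - znear)
    P[3, 3] = 1.0
    return P

P = ortho_projection(0.01, 100.0, math.radians(60), math.radians(45))
near_pt = P @ np.array([0.002, 0.0, -1.0, 1.0])
far_pt = P @ np.array([0.002, 0.0, -50.0, 1.0])
# w stays 1 (no perspective divide), and both depths map to the same x:
assert near_pt[3] == 1.0 and np.allclose(near_pt[:2], far_pt[:2])
# But with znear = 0.01 the view box is ~0.01 units wide, so a point just
# 1 unit off-axis already falls far outside the NDC cube and is clipped:
x_ndc = (P @ np.array([1.0, 0.0, -1.0, 1.0]))[0]
assert abs(x_ndc) > 1.0
```

This is consistent with the later suggestion in this thread to size the view box from zfar instead of znear.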

Dehaoq commented 9 months ago

In the code, they use the perspective projection matrix, which makes apparent size vary inversely with distance; orthographic projection cannot do that. In their CUDA code, they get the depth information from the perspective projection matrix to realize the rendering process, and orthographic projection cannot provide depth information in that way. Therefore, if you want to make orthographic projection work, you have to change the CUDA code as well.
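The size argument can be made concrete with a tiny sketch (hypothetical helper names, not repo code): under a pinhole model the projected size scales as 1/z, while a parallel projection ignores depth entirely.

```python
def perspective_width(width, z, focal=1.0):
    # Pinhole camera: projected size shrinks as 1/z
    return focal * width / z

def orthographic_width(width, scale=1.0):
    # Parallel projection: projected size is independent of depth
    return scale * width

# An object of width 1 at depths 2, 4, 8:
persp = [perspective_width(1.0, z) for z in (2.0, 4.0, 8.0)]  # 0.5, 0.25, 0.125
ortho = [orthographic_width(1.0) for _ in (2.0, 4.0, 8.0)]    # 1.0, 1.0, 1.0
```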

hot-dog commented 8 months ago

@haksorus Hello, I encountered the same problem; I want to render a top-down orthographic projection view. Did you figure out the issue?

hot-dog commented 7 months ago


@Dehaoq I adopted haksorus's orthographic projection matrix but made a small modification when calculating the top, bottom, right, and left values: I use zfar instead of znear. By doing so, I get some reasonable results. The rendering is orthographic, since the building's facade is invisible, but it is foggy; I think that is due to the lack of depth info. Could you please give some suggestions for changing the CUDA code to incorporate depth info? Thank you!

GANWANSHUI commented 7 months ago

@hot-dog Hi, have you achieved orthographic projection with GS? Looking forward to your update. Thank you!

hot-dog commented 7 months ago


Sorry, I have not solved the problem mentioned above; I am still trying.

GANWANSHUI commented 7 months ago


Looking forward to the update, if any. Thanks a lot in advance!

hot-dog commented 7 months ago


I have achieved clear orthographic rendering. The original code uses EWA splatting, which assumes perspective projection when projecting a 3D Gaussian to a 2D Gaussian. Perspective projection causes distortion at non-central positions, so some good properties are lost when projecting the 3D Gaussian to 2D; in diff-gaussian-rasterization/cuda_rasterizer/forward.cu, the computeCov2D function uses an approximation, the local affine Jacobian matrix J, to deal with this. As I understand it, orthographic projection does not cause this distortion, so I replaced the J matrix with a diagonal matrix and it works!

I am not very clear on the underlying principles yet; since my math foundation is weak, I don't know how to derive the exact J matrix. If you can help, it would be much appreciated :)
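For reference, the substitution described above amounts to the following. This is a minimal NumPy sketch of computeCov2D()'s core, not the CUDA code itself; `s` stands for the experimentally chosen pixel-scale constant (e.g. 150 later in this thread).

```python
import numpy as np

def cov2d_orthographic(cov3d, view_rot, s=150.0):
    """2D covariance of a Gaussian under orthographic projection.

    cov3d:    3x3 world-space covariance (symmetric).
    view_rot: 3x3 rotation part W of the world-to-camera matrix.
    s:        pixel scale replacing the perspective focal terms.
    """
    # Orthographic projection has a constant, diagonal Jacobian: the
    # depth-dependent terms of the perspective J vanish, and the third
    # row is zeroed as in computeCov2D().
    J = np.diag([s, s, 0.0])
    T = J @ view_rot
    # EWA splatting: project the 3D covariance through the affine map T.
    return T @ cov3d @ T.T  # 3x3; only the top-left 2x2 block is used

# Identity view and an isotropic unit Gaussian: the 2D covariance is
# simply s^2 * I, i.e. the same footprint regardless of depth.
cov = cov2d_orthographic(np.eye(3), np.eye(3))
```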

cv-lab-x commented 6 months ago


Hi, did you train with orthographic views, or just test rendering with orthographic views? @hot-dog Looking forward to your reply, thanks!

hot-dog commented 6 months ago


I am testing rendering with orthographic projection. Training must use perspective projection, since the training images are taken with a pinhole camera; training with orthographic projection will not converge.

cv-lab-x commented 6 months ago


Thanks, what is the meaning of the 150 in the J matrix you modified?

hot-dog commented 6 months ago


I tried several values and got the best result at 150. The value is experimental and should vary with the scene and camera pose. As I said in an earlier post, my math is poor and I don't know how to derive the exact J matrix; it should be something related to the currently processed Gaussian point's parameters. If you could help with the derivation of the J matrix or give some suggestions, it would be much appreciated! :)

boqian-li commented 5 months ago

Hi, any update now? I mean, is there some way to get orthographic projection test views without an experimental value? @hot-dog

boqian-li commented 5 months ago

Hi, any update now?

lieoojinyi commented 5 months ago

Spline can render with an orthographic camera, but I have no idea how they do it: https://www.reddit.com/r/Spline3D/comments/184e9df/spline_tip_use_the_perspective_camera_when/

gwen233666 commented 3 months ago


Hi buddy, have you achieved orthographic rendering?

wangyicxy commented 3 months ago

I have successfully achieved clear orthogonal projection by modifying the projection matrix and the J matrix! You can see that the proportions of the buildings are consistent from the top of the picture to the bottom, unlike perspective projection, where near objects look big and far objects look small.

gwen233666 commented 3 months ago


I feel that the projection quality is still somewhat degraded. Could you please post a group of pictures for comparison? It would be even better if you shared the dataset! Thank you.

wangyicxy commented 3 months ago


Okay, this is the result of perspective projection. It seems to have been magnified, and I'm not sure if that is a normal phenomenon.

YihangChen-ee commented 2 months ago


Hi, could you please share your modified projection matrix and j matrix?

Sapphire-356 commented 1 month ago


Following the notation used on Wikipedia (https://en.wikipedia.org/wiki/Orthographic_projection), as mentioned by @haksorus, orthographic projection maps camera-space coordinates to clip space via x' = 2x / (right - left) and y' = 2y / (top - bottom) (for a symmetric view box the translation terms vanish); the resulting coordinates are denoted with a prime ('). The Jacobian of this mapping is therefore the constant diagonal matrix J = diag(2 / (right - left), 2 / (top - bottom), 0), where we set the third row to 0, consistent with computeCov2D() in submodules/diff-gaussian-rasterization/cuda_rasterizer/forward.cu. Moreover, computeCov2D() requires the covariance to be scaled to pixel space, which multiplies the first two rows by W/2 and H/2 respectively. Finally, we get the Jacobian matrix J = diag(W / (right - left), H / (top - bottom), 0). Intuitively, this explains the inclusion of a large value such as 150 (https://github.com/graphdeco-inria/gaussian-splatting/issues/578#issuecomment-1976123500) in the Jacobian matrix. By modifying the projection matrix as suggested in https://github.com/graphdeco-inria/gaussian-splatting/issues/578#issue-2056350773 and substituting the newly derived J for the original, I obtained a sharp image under orthogonal projection.

Regarding @Dehaoq's concerns about depth, further discussion may be necessary.
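The pixel-space scaling in the derivation above can be checked numerically. With hypothetical values (a 1600x1200 image and an ortho box of width right - left = 10, height top - bottom = 8), the diagonal entries already land in the ~150 range:

```python
def ortho_jacobian_diag(width_px, height_px, right, left, top, bottom):
    # Per the derivation above: NDC mapping x' = 2x / (right - left)
    # followed by the viewport scale W/2 gives J_x = W / (right - left),
    # and likewise J_y = H / (top - bottom).
    return width_px / (right - left), height_px / (top - bottom)

jx, jy = ortho_jacobian_diag(1600, 1200, 5.0, -5.0, 4.0, -4.0)
# jx = 160.0, jy = 150.0 -- the same order of magnitude as the
# experimentally found constant 150 earlier in the thread.
```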

Pydes-boop commented 1 week ago

@Sapphire-356 Did you manually calculate the Jacobian and insert it into the .cu file, or did you manage to add the calculation in code? If you actually made code changes to get orthographic projection working, it would be great if you could share them.

Sapphire-356 commented 1 week ago

@Pydes-boop I'm not good at coding, so my implementation is pretty messy. Even so, I'll share my code here; it is just one way to do it that works. Hopefully someone else can come up with a better version.

The main idea behind the following code is to use tanfovx and tanfovy, which are not otherwise utilized in orthographic projection, to pass the parameters used in the Jacobian matrix. This hurts code readability, but it's a quick and easy solution. Note that the modification to pass orthographic = true from the .py files to the .cu file is omitted here.
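Before the steps, it may help to see why this trick works at all: the rasterizer derives its focal lengths from the FoV tangents as focal = size / (2 * tanfov), so returning the ortho half-extents in place of the tangents yields exactly the Jacobian entry derived earlier in the thread. A sketch with hypothetical numbers:

```python
def focal_from_tanfov(image_size_px, tanfov):
    # How the rasterizer derives a focal length from a FoV tangent
    return image_size_px / (2.0 * tanfov)

# Passing tanfovx = (right - left) / 2 instead of a real FoV tangent
# (as step 1 below returns) turns focal_x into W / (right - left),
# i.e. the diagonal entry of the orthographic Jacobian:
half_extent_x = (5.0 - (-5.0)) / 2.0
focal_x = focal_from_tanfov(1600, half_extent_x)
assert focal_x == 1600 / (5.0 - (-5.0))
```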

  1. (utils/graphics_utils.py) Modify getProjectionMatrix() to implement the orthographic projection:

    def getProjectionMatrix(znear, zfar, fovX, fovY, orthographic=False):
        if not orthographic:
            # Original perspective projection
            tanHalfFovY = math.tan(fovY / 2)
            tanHalfFovX = math.tan(fovX / 2)

            top = tanHalfFovY * znear
            bottom = -top
            right = tanHalfFovX * znear
            left = -right

            P = torch.zeros(4, 4)

            z_sign = 1.0

            P[0, 0] = 2.0 * znear / (right - left)
            P[1, 1] = 2.0 * znear / (top - bottom)
            P[0, 2] = (right + left) / (right - left)
            P[1, 2] = (top + bottom) / (top - bottom)
            P[3, 2] = z_sign
            P[2, 2] = z_sign * zfar / (zfar - znear)
            P[2, 3] = -(zfar * znear) / (zfar - znear)

            return P

        else:
            # Orthographic projection; the view-box half-height is
            # hard-coded (top = 5) and the width follows the aspect ratio
            tanHalfFovY = math.tan(fovY / 2)
            tanHalfFovX = math.tan(fovX / 2)

            top = 5
            bottom = -top
            right = tanHalfFovX * 5 / tanHalfFovY
            left = -right

            P = torch.zeros(4, 4)

            z_sign = 1.0
            P[0, 0] = 2.0 / (right - left)
            P[0, 3] = - (right + left) / (right - left)
            P[1, 1] = 2.0 / (top - bottom)
            P[1, 3] = - (top + bottom) / (top - bottom)
            P[2, 2] = -2.0 / (zfar - znear)
            P[2, 3] = - (zfar + znear) / (zfar - znear)
            P[3, 3] = z_sign

            # Also return the half-extents, smuggled out as tanfovx/tanfovy
            return (right - left) / 2, (top - bottom) / 2, P
  2. (scene/cameras.py) Add a new method called get_full_proj_transform() to class Camera() as follows:
    def get_full_proj_transform(self, orthographic=False):
        if not orthographic:
            return self.full_proj_transform
        else:
            tanfovx, tanfovy, projection_matrix = getProjectionMatrix(znear=self.znear, zfar=self.zfar, fovX=self.FoVx, fovY=self.FoVy, orthographic=True)
            full_proj_transform = (self.world_view_transform.unsqueeze(0).bmm(projection_matrix.transpose(0,1).cuda().unsqueeze(0))).squeeze(0)
            return tanfovx, tanfovy, full_proj_transform
  3. (gaussian_renderer/__init__.py) Modify the rasterization configuration as follows, where we call the method get_full_proj_transform() to change tanfovx, tanfovy, and full_proj_transform:

    # Set up rasterization configuration
    if not orthographic:
        tanfovx = math.tan(viewpoint_camera.FoVx * 0.5)
        tanfovy = math.tan(viewpoint_camera.FoVy * 0.5)
        full_proj_transform = viewpoint_camera.get_full_proj_transform(orthographic)
    else:
        tanfovx, tanfovy, full_proj_transform = viewpoint_camera.get_full_proj_transform(orthographic)
    
    raster_settings = GaussianRasterizationSettings(
        image_height=int(viewpoint_camera.image_height),
        image_width=int(viewpoint_camera.image_width),
        tanfovx=tanfovx,
        tanfovy=tanfovy,
        bg=bg_color,
        scale_modifier=scaling_modifier,
        viewmatrix=viewpoint_camera.world_view_transform,
        projmatrix=full_proj_transform,
        sh_degree=pc.active_sh_degree,
        campos=viewpoint_camera.camera_center,
        prefiltered=False,
        debug=pipe.debug
    )
  4. (submodules/diff-gaussian-rasterization/cuda_rasterizer/forward.cu) Finally, modify the Jacobian matrix as:

    // Forward version of 2D covariance matrix computation
    __device__ float3 computeCov2D(const float3& mean, float focal_x, float focal_y, float tan_fovx, float tan_fovy, const float* cov3D, const float* viewmatrix, bool orthographic)
    {
        // The following models the steps outlined by equations 29
        // and 31 in "EWA Splatting" (Zwicker et al., 2002).
        // Additionally considers aspect / scaling of viewport.
        // Transposes used to account for row-/column-major conventions.
        float3 t = transformPoint4x3(mean, viewmatrix);

        const float limx = 1.3f * tan_fovx;
        const float limy = 1.3f * tan_fovy;
        const float txtz = t.x / t.z;
        const float tytz = t.y / t.z;
        t.x = min(limx, max(-limx, txtz)) * t.z;
        t.y = min(limy, max(-limy, tytz)) * t.z;

        glm::mat3 J;
        if (orthographic)
        {
            // Orthographic: constant diagonal Jacobian; focal_x/focal_y
            // carry the pixel scale (see the derivation above)
            J = glm::mat3(
                focal_x, 0.0f, 0.0f,
                0.0f, focal_y, 0.0f,
                0.0f, 0.0f, 0.0f);
        }
        else
        {
            // Perspective: local affine approximation, depends on depth t.z
            J = glm::mat3(
                focal_x / t.z, 0.0f, -(focal_x * t.x) / (t.z * t.z),
                0.0f, focal_y / t.z, -(focal_y * t.y) / (t.z * t.z),
                0.0f, 0.0f, 0.0f);
        }

        glm::mat3 W = glm::mat3(
            viewmatrix[0], viewmatrix[4], viewmatrix[8],
            viewmatrix[1], viewmatrix[5], viewmatrix[9],
            viewmatrix[2], viewmatrix[6], viewmatrix[10]);

        glm::mat3 T = W * J;

        glm::mat3 Vrk = glm::mat3(
            cov3D[0], cov3D[1], cov3D[2],
            cov3D[1], cov3D[3], cov3D[4],
            cov3D[2], cov3D[4], cov3D[5]);

        glm::mat3 cov = glm::transpose(T) * glm::transpose(Vrk) * T;

        return { float(cov[0][0]), float(cov[0][1]), float(cov[1][1]) };
    }