graphdeco-inria / gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

How to apply the gaussian splatting method to images collected by panoramic cameras? #616

Closed Bin-ze closed 10 months ago

Bin-ze commented 10 months ago

I want to know: if the COLMAP camera model is changed so that the point cloud and poses are reconstructed from panoramic images, can the algorithm be adapted to such a scene, or does it require some additional implementation?

inuex35 commented 10 months ago

Hello, I also tried Gaussian splatting with a 360 camera, and I found that we would need to update diff-gaussian-rasterization, because it uses the tangent of the field of view in the splat/frustum calculations, which means that simply setting the field of view to 360 degrees gives 0. There are still many issues at this early stage, but by dividing the equirectangular image and applying Gaussian splatting to the pieces, it becomes possible to use images from a 360-degree camera. OpenSfM can use 360-camera images, so here is the code for 360 Gaussian splatting: https://colab.research.google.com/drive/18inMCEbkjYQOXpM72d38wBML3DzDECnf?usp=sharing
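To make the tangent issue concrete, here is a tiny numerical illustration. This is not the rasterizer's code, just the underlying trigonometry: as far as I can tell the reference pipeline passes tan(FoV/2) to the rasterizer and derives the focal length from it, so a 360-degree FoV degenerates.

```python
# Minimal illustration (assumed, not the rasterizer's actual code) of why a
# 360-degree field of view breaks a tangent-based projection: the half-angle
# is 180 degrees, its tangent is (numerically) zero, and any focal length
# derived as width / (2 * tan(FoV/2)) blows up.
import math

for fov_deg in (90, 120, 179, 360):
    tan_half = math.tan(math.radians(fov_deg) * 0.5)
    print(f"FoV {fov_deg:>3} deg -> tan(FoV/2) = {tan_half:.6f}")
```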

Bin-ze commented 10 months ago

Thank you very much for your reply. I've also done some research on this issue, but I still have some questions:

  1. Is your solution to use a fisheye camera model directly, so that the points corresponding to the pixels of the 360 image are splatted to the correct positions?
  2. If you use a transformation such as https://github.com/inuex35/Perspective-and-Equirectangular/tree/d453d59470b83ff744c3f986b0832ea26e4f7638 (as in your implementation), isn't it possible to process 360 images with the original pipeline by converting the equirectangular images back to perspective views? Is this your current solution, i.e. the view-frustum assumption in the CUDA implementation has not been modified?
  3. I currently lack a dataset for experiments. Could you share the datasets you used?
  4. Where do the camera poses of the 360 images and the sparse point cloud come from? Are they reconstructed directly with COLMAP, or taken from an open-source dataset?

inuex35 commented 10 months ago

Hello, you can get an access token from Mapillary (https://www.mapillary.com/) and look for the sequence you want to reconstruct; if the sequence is from a 360 camera, set panorama to true in the colab script. You can get the sequence ID from the Mapillary site. Note that Mapillary images were not captured for Gaussian splatting, so the algorithm works, but you would not get results as satisfying as the Gaussian splatting demos (the image capture interval matters!).

My first approach was to split each equirectangular image into six 90-degree perspective images (with some overlap between images, and no top or bottom view) and feed them into COLMAP as perspective cameras. That worked well, but some of the orientations failed to reconstruct (for example, the front image could be used but the back image could not, for lack of feature points), and then I could not use those images to train Gaussian splatting. Additionally, I am going to use a 360 camera outdoors, and even if feature matching fails to connect the sequence, OpenSfM can connect the point cloud using GPS EXIF data. So OpenSfM can use 360-camera images, and my approach first makes a sparse point cloud with OpenSfM, then splits each equirectangular image into 4 directions before feeding it into Gaussian splatting. I am also trying to render a panorama image directly with diff-gaussian-rasterization, but that has not worked yet.

You can use your own data with OpenSfM, or the colab script will download images and the point cloud from the Mapillary API. I could run the script a few days ago; sorry for the poor instructions. However, this approach still has a problem: the point cloud is somewhat sparser than with the COLMAP approach, so I am investigating how to increase the number of points.
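For reference, here is a rough, self-contained sketch of the equirectangular-to-perspective split described above. This is not the linked converter; the function, the camera conventions, the file name, and the 1024-pixel output size are only illustrative.

```python
import cv2
import numpy as np

def equirect_to_perspective(equi, fov_deg, yaw_deg, out_size):
    """Resample a square pinhole view with horizontal FOV `fov_deg`,
    looking at yaw `yaw_deg` (0 = panorama centre), pitch 0."""
    h_eq, w_eq = equi.shape[:2]
    focal = 0.5 * out_size / np.tan(np.radians(fov_deg) * 0.5)

    # Pixel grid -> ray directions in the camera frame (x right, y down, z forward).
    u, v = np.meshgrid(np.arange(out_size), np.arange(out_size))
    dirs = np.stack([(u - 0.5 * out_size) / focal,
                     (v - 0.5 * out_size) / focal,
                     np.ones((out_size, out_size))], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate the rays about the vertical axis by the requested yaw.
    yaw = np.radians(yaw_deg)
    R_yaw = np.array([[ np.cos(yaw), 0.0, np.sin(yaw)],
                      [ 0.0,         1.0, 0.0        ],
                      [-np.sin(yaw), 0.0, np.cos(yaw)]])
    dirs = dirs @ R_yaw.T

    # Ray direction -> longitude/latitude -> equirectangular pixel coordinates.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])        # [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))   # [-pi/2, pi/2]
    map_x = ((lon / (2.0 * np.pi) + 0.5) * w_eq).astype(np.float32)
    map_y = ((lat / np.pi + 0.5) * h_eq).astype(np.float32)
    return cv2.remap(equi, map_x, map_y, cv2.INTER_LINEAR)

# Split one panorama into four 90-degree views facing front/right/back/left.
equi = cv2.imread("panorama.jpg")   # hypothetical input file
views = [equirect_to_perspective(equi, 90, yaw, 1024) for yaw in (0, 90, 180, 270)]
```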

Bin-ze commented 10 months ago

Thank you very much, your reply is very helpful to me.

I checked your 360-gaussian-splatting implementation and found that when rendering you stitch together the front, rear, left, and right views, and then supervise them with ground-truth images that have been converted into the corresponding perspective format. Is my understanding correct?

I'm thinking about whether a 360-degree panoramic camera model can describe the mapping from image space to 3D space, so that 360 images can be rendered directly, as in the original 3D GS implementation. When a 360 panorama is captured, all rays over the entire sphere converge at a single optical center. With a ray-casting model like NeRF it is very easy to build such a rendering pipeline, because, just as with perspective projection, the correspondence between each ray through the optical center and its pixel is easy to establish. Based on this, as long as each 3D point can be back-projected to its corresponding pixel position during splatting, the same should be achievable. Do you have any other insights?

I'm trying to run through your pipeline. You mentioned that you already have some results; can you show some visual examples? Could you also share a short sequence of images with me, not too many, about 30 images should be enough? I want to test the effect.

Based on our discussion, I would summarize the possible approaches to using 360 images as follows:

  1. Map the 360 image to standard perspective images and train the 3D model with the original pipeline. Once training is complete, 360 rendering can be achieved with the help of a new camera model.
  2. Implement the 3D-to-360-image projection in the rendering pipeline so that training can be supervised correctly (see the projection sketch after this list). This is harder and requires modifying the CUDA code; in that case the whole surrounding space is visible in each image, not just the contents of a viewing frustum.
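For approach 2, the projection that would replace the pinhole projection in the rasterizer is essentially the longitude/latitude mapping below. This is a sketch under an assumed y-down, z-forward camera convention, not code from any existing implementation; it is the forward direction of the same spherical mapping used for the image split.

```python
import numpy as np

def project_to_equirect(p_cam, width, height):
    """Project points in camera coordinates (x right, y down, z forward)
    onto an equirectangular image of size width x height."""
    d = p_cam / np.linalg.norm(p_cam, axis=-1, keepdims=True)
    lon = np.arctan2(d[..., 0], d[..., 2])           # azimuth, [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))   # elevation, [-pi/2, pi/2]
    u = (lon / (2.0 * np.pi) + 0.5) * width          # every direction maps to a pixel,
    v = (lat / np.pi + 0.5) * height                 # so there is no frustum to cull against
    return np.stack([u, v], axis=-1)
```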
inuex35 commented 10 months ago

Hello,

Yes, 4 images are created from 1 equirectangular image. The spherical camera model in OpenSfM is quite straightforward, so you should be able to integrate it with your rendering program. You can find more information about camera models here: OpenSfM Documentation - Camera Models.

To calculate the loss for updating the splats, you need to obtain the rendered image and radii from the render() function (see the gaussian-splatting repository on GitHub), and probably around here in the rasterizer: https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/59f5f77e3ddbac3ed9db93ec2cfe99ed6c5d121d/cuda_rasterizer/forward.cu#L261-L374
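For context, the reference implementation combines an L1 term with a D-SSIM term (lambda_dssim defaults to 0.2). Here is a minimal, self-contained sketch of that loss; it uses torchmetrics' SSIM as a stand-in for the repo's own utils/loss_utils.py, so the numbers will not match exactly.

```python
import torch
from torchmetrics.functional import structural_similarity_index_measure as ssim

def gs_photometric_loss(rendered, gt, lambda_dssim=0.2):
    """rendered, gt: (3, H, W) tensors in [0, 1]."""
    l1 = torch.abs(rendered - gt).mean()
    # D-SSIM term; the SSIM metric expects a batch dimension, hence [None].
    dssim = 1.0 - ssim(rendered[None], gt[None])
    return (1.0 - lambda_dssim) * l1 + lambda_dssim * dssim

# Example with random tensors standing in for the render() output and the GT image.
loss = gs_photometric_loss(torch.rand(3, 64, 64), torch.rand(3, 64, 64))
```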

This is the initial sparse point cloud (first screenshot), and after 30,000 iterations the final splat looks like this (second screenshot).

I used 360-degree camera images within this bounding box: [139.64931, 35.44524, 139.65090, 35.44608]. The feature point detection was performed using SIFT, and the Gaussian splatting training parameters were set to default.

I think some parameter tuning is necessary for 360-degree cameras, but I have not yet determined the optimal settings to achieve the best splatting results.

Bin-ze commented 10 months ago

I used the colab script you provided and successfully started training. I got a PSNR of 27, but the point cloud obtained by training has no spatial structure at all; I think it is completely wrong. I have since lost the downloaded data, but the point-cloud visualizations are as follows:

(screenshots)

I didn't change any part of your implementation, and judging from the PSNR, the results shouldn't be what they are. Do you have any advice to offer regarding this issue?

inuex35 commented 10 months ago

Hello, this is probably because the input point cloud is very sparse; that is the problem I want to solve. But the scene you were trying to reconstruct is actually somewhere inside this splat.

You can use my data I used above. https://dtbn.jp/3LJn5cee

My code tries to find reconstruction.json in the dataset folder, so put the images_split folder (images split with my converter) and reconstruction.json together. You can also reconstruct a point cloud yourself with OpenSfM.

Bin-ze commented 10 months ago

I used your dataset and visualized the results, and I found an obvious problem: the camera positions are not as expected.

rendered image: (screenshot)

rendered view of the point cloud: (screenshot)

cameras: (screenshot)

Cameras from the same panorama should logically share the same origin in the world coordinate system, but here the cameras are distributed in groups of two. So I looked at your implementation of the camera-pose conversion:

(screenshot of the pose-conversion code)

Your idea is to rotate the camera coordinate axes for the views at the different angles, but I think the translation T in the world coordinate system should not change. However, even after commenting out `T = np.matmul(R_y.transpose(), T)`, I still have not achieved good results, although I believe the problem lies there. I look forward to your reply; I think solving this will lead to better rendering results.

Bin-ze commented 10 months ago

I think this is correct:

    # Rotation of 90 degrees about the vertical (y) axis, as a 4x4 homogeneous matrix.
    R_y = np.array([[ 0.0, 0.0, 1.0, 0.0],
                    [ 0.0, 1.0, 0.0, 0.0],
                    [-1.0, 0.0, 0.0, 0.0],
                    [ 0.0, 0.0, 0.0, 1.0]])
    if extr.camera_id <= 3:
        # Apply one 90-degree yaw step per camera index (the four split directions).
        for _ in range(extr.camera_id):
            Rt = np.zeros((4, 4))                     # world-to-camera
            Rt[:3, :3] = R.transpose()
            Rt[:3, 3] = T
            Rt[3, 3] = 1.0
            c2w = np.linalg.inv(Rt)                   # camera-to-world
            RT_c2w = np.matmul(c2w, R_y.transpose())  # rotate about the camera's own axis
            R = RT_c2w[:3, :3]                        # R again holds the camera-to-world rotation
            T = np.linalg.inv(RT_c2w)[:3, 3]          # T again holds the world-to-camera translation
(screenshot)
inuex35 commented 10 months ago

Hello

Thank you very much! What is the name of the viewer you are using? I was writing the program without knowing where the camera is pointing.

The result turned out quite well, and the quality is about the same as when the images are split and the sparse point cloud is reconstructed with COLMAP. (screenshot)

Bin-ze commented 10 months ago

To make development easier, I wrote a custom viewer; I used it to visualize the cameras in the world coordinate system.

I will continue to study how to directly render 360 images and how to improve the rendering quality of such images. I hope to have more opportunities to communicate with you.

My current idea is to use OpenSfM's 360 camera model to reconstruct the camera poses directly, and to modify the rasterizer to support splatting the 3D point cloud onto the 360 2D image. What do you think?

I have another question: besides converting the panorama into 4 images with a 90-degree FOV, can it be converted to an arbitrary FOV, with the corresponding perspective pose computed from the panorama pose? I think it is feasible, but my experience is still limited.

I look forward to your reply. May I also add your social account, to make it more convenient to discuss this?

Bin-ze commented 10 months ago

> Thank you very much! What is the name of the viewer you are using? I was writing the program without knowing where the camera is pointing.
>
> The result turned out quite well, and the quality is about the same as when the images are split and the sparse point cloud is reconstructed with COLMAP.

I want to know how you obtained this rendering result. You used a 90-degree FOV during training; did you render with a wider FOV, or change the camera intrinsics, to produce this wide rectangular image?

inuex35 commented 10 months ago

Hello,

That app looks very nice for debugging.

I think your idea is feasible. Currently I don't fully understand the CUDA rasterizer, so I'm unsure about the difficulty, but I believe we should handle this with a spherical camera model in the program rather than rendering in equirectangular. In an equirectangular image the area near the poles is stretched significantly, and since the current Gaussian splatting loss is calculated simply from the L1 loss and image similarity (see the current loss implementation), the regions near the poles would be overly emphasized.

I can think of two issues right now. First, treating the images with a spherical camera model instead of equirectangular. Second, when used outdoors, the large area of sky in the images can skew training towards the sky. To address this, we might need to use semantic segmentation results as a mask and have the rasterizer exclude the masked region, or process only within it. Ultimately, elements like buildings are more important than the color of the sky.

Update: someone has already implemented masking for Gaussian splatting: https://colab.research.google.com/drive/1wKU58ATW8FkgKmQyCb3O1_1ja1ElqVu8?usp=sharing
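Just to make the two points concrete, here is a rough, hypothetical sketch (not from any existing implementation) of an L1 loss that down-weights equirectangular rows by their solid angle, so the poles stop dominating, and that zeroes out a sky mask from semantic segmentation:

```python
import torch

def weighted_equirect_l1(rendered, gt, sky_mask=None):
    """rendered, gt: (3, H, W); sky_mask: (H, W) bool, True where sky should be ignored.
    Rows near the poles of an equirectangular image cover less solid angle,
    so each row is weighted by cos(latitude)."""
    _, H, W = rendered.shape
    lat = (torch.arange(H, dtype=torch.float32) + 0.5) / H * torch.pi - torch.pi / 2
    weight = torch.cos(lat).view(1, H, 1).expand(1, H, W)
    if sky_mask is not None:
        weight = weight * (~sky_mask).float()      # drop sky pixels from the loss
    l1 = (rendered - gt).abs() * weight            # broadcasts over the 3 channels
    return l1.sum() / (rendered.shape[0] * weight.sum() + 1e-8)
```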

Yes, I think you can get any field of view (FOV) and orientation with that equirectangular library; look at the equirect-to-perspective function used in the converter.
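(For example, with the hypothetical split sketch earlier in this thread, a 120-degree view at 45 degrees of yaw would just be `equirect_to_perspective(equi, 120, 45, 1024)`; the corresponding perspective pose is the panorama pose composed with that yaw rotation, since all split views share the panorama's optical center.)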

Yes, sharing a social account like X is possible, but we should keep the discussion in an open space; someone else might help solve the issue.

The image above was uploaded to SuperSplat and then screenshotted.

For anyone else looking for 360 Gaussian splatting: https://github.com/inuex35/360-gaussian-splatting