inuex35 / 360-gaussian-splatting

This repository contains programs for reconstructing 3D space using OpenSfM and Gaussian Splatting techniques. It allows users to generate point clouds from images captured by a 360-degree camera using OpenSfM, and then train Gaussian Splatting models using the generated point clouds.

Regarding the speed of the entire pipeline #1

Closed: Bin-ze closed this issue 1 day ago

Bin-ze commented 5 months ago

Hello, I successfully used your implementation to train on the 360 images I collected. The entire pipeline is as follows:

Insta360 Capture -> opensfm compute 360 pose -> Perspective-and-Equirectangular trans -> train 3dgs

In the entire pipeline, most of the time is spent in the OpenSfM 360 pose computation. I measured the time COLMAP needs to reconstruct ordinary perspective images, and compared with that, OpenSfM's reconstruction of the 360 images is very slow.

I would like to keep using this pipeline but speed up the 360 pose computation. What do you recommend?

inuex35 commented 5 months ago

Hello, sorry for the late reply; I was sick for a week and stayed in bed. I actually face the same problem. I think it is because OpenSfM only supports CPU computation for feature extraction and matching, which takes a very long time during reconstruction.

I found https://github.com/Unity-Technologies/ind-bermuda-opensfm. That repository can use SuperPoint, DISK, and other keypoints on the GPU, which speeds up reconstruction. Unfortunately, when I use DISK or SuperPoint I get worse results than with SIFT. Its Dockerfile installs PopSift, but perhaps because of a version difference it does not work (segmentation fault). I am trying to figure out the main cause of this problem.

Attached is the YAML file I use for reconstruction; I use a higher threshold for geometric estimation. config.zip

Bin-ze commented 5 months ago

Thank you very much for your reply! I hope you recover soon.

To speed up the reconstruction, I borrowed the sequential matching used by tools such as COLMAP and added it to the OpenSfM pipeline, replacing exhaustive (global) matching when GPS information is unavailable.

This gave me more than a 10x speedup, because the matching complexity drops to O(N), but it came with worse reconstruction results. I am currently trying to solve this and to properly support the sequential ordering.
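For illustration, the complexity reduction can be sketched as follows (hypothetical helpers, not the actual OpenSfM or COLMAP code): with sequential matching, each image is paired only with a few neighbours in capture order, so the number of candidate pairs grows linearly with the number of images instead of quadratically.

```python
from itertools import combinations

def exhaustive_pairs(images):
    """Match every image against every other image: O(N^2) candidate pairs."""
    return list(combinations(images, 2))

def sequential_pairs(images, k=5):
    """Match each image only against its next k images in capture order: O(N*k) pairs."""
    pairs = []
    for i in range(len(images)):
        for j in range(i + 1, min(i + 1 + k, len(images))):
            pairs.append((images[i], images[j]))
    return pairs

names = [f"frame_{i:04d}.jpg" for i in range(1000)]
print(len(exhaustive_pairs(names)))   # 499500 candidate pairs
print(len(sequential_pairs(names)))   # 4985 candidate pairs, roughly 100x fewer
```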

With this matching scheme the speed is already very fast; what still needs to be solved is the reconstruction quality. I hope we can keep exchanging ideas.

inuex35 commented 5 months ago

Hello, it's good to hear that you solved the reconstruction speed problem. Maybe these OpenSfM parameters behave similarly to COLMAP's sequential matching?

matching_time_neighbors: 0   # Number of images to match selected by time taken. Set to 0 to disable
matching_order_neighbors: 0  # Number of images to match selected by image name. Set to 0 to disable

You can investigate feature extraction and matching with bin/plot_features and bin/plot_matches.py. What I want to do from now is

Bin-ze commented 4 months ago

Hi, I'm glad to hear from you.

I have some new thoughts I want to discuss with you:

  1. I found that most of the points in the point cloud generated for indoor scenes are concentrated on the textured floor, but the floor textures in the scene are highly similar, so I think it is not reasonable to rely on floor keypoints for pose computation. What are your views and suggestions?
  2. To get accurate indoor positioning, how hard is it to make the visual sensor + SfM approach work well? I tried LiDAR, but it is hard to register a high-quality RGB camera with it. I also tried an RGB-D camera, but running RGB-D SLAM to obtain the poses also gave large deviations indoors, and those deviations have a huge impact on NeRF training. Do you have any good suggestions?
  3. I looked at your config.yaml and I'm curious what it is based on; I would like to find a detailed explanation of each option so I can study how the parameters should be set for different scenarios.

I am very much looking forward to your reply and to further in-depth discussion with you.

Bin-ze commented 4 months ago


About the ind-bermuda-opensfm repository you mentioned earlier: I would like to know what a configuration file that uses the GPU-enabled feature extractors looks like. The repository's documentation is very brief, and I could find almost no relevant description.

inuex35 commented 4 months ago

Hello,

From the perspective of general visual odometry, SLAM, and SfM, cameras tend to give more stable position estimates when they move across objects rather than directly towards them, so capturing features on the walls rather than the floor seems ideal. What kind of positional errors are you seeing? Over long sequences, SfM should achieve higher accuracy than SLAM. If the position is drifting, it might be a problem with the sequential matching.

The configuration file was based on the defaults here: https://opensfm.org/docs/_modules/opensfm/config.html

You can use this repository: https://github.com/inuex35/ind-bermuda-opensfm. The difference is that in its config you can use the POPSIFT, SUPERPOINT, DISK, and ALIKED features and the LIGHTGLUE matcher by setting feature_type: and matcher_type: to the algorithm names above. However, POPSIFT seems not to be working due to a bug. From my trials, ALIKED looks nice, as it captures many feature points.
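As a rough sketch, switching the extractor and matcher would just mean setting those two keys in the dataset's config.yaml. The key names come from the comment above; the value strings and the dataset path are assumptions and may differ from what ind-bermuda-opensfm actually expects.

```python
import yaml  # pip install pyyaml

def set_feature_and_matcher(config_path, feature_type="ALIKED", matcher_type="LIGHTGLUE"):
    """Update feature_type / matcher_type in an OpenSfM-style config.yaml.

    The value strings ("ALIKED", "LIGHTGLUE") are assumed spellings based on
    the algorithm names mentioned above.
    """
    with open(config_path) as f:
        cfg = yaml.safe_load(f) or {}
    cfg["feature_type"] = feature_type
    cfg["matcher_type"] = matcher_type
    with open(config_path, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)

# Example usage (hypothetical dataset path):
# set_feature_and_matcher("my_dataset/config.yaml")
```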

Bin-ze commented 4 months ago

Indoors I have had a hard time getting good rendering results; there is significant aliasing most of the time. With 3DGS, even when I initialize from a dense point cloud obtained by SfM, training ends up with very few points, and the reconstructed 3D scene completely loses detail and looks very poor.

My analysis is that there is a problem with pose estimation: I tried multiple captures of the same scene and the differences were huge, with some captures clearly better than others, so I have reason to suspect that the difference comes from the SfM results.

I'm trying to improve the SfM results on indoor scenes, but haven't made progress yet. Because my scene has a large number of textureless areas, even with SuperPoint + SuperGlue it was almost impossible to compute poses accurate enough for fine NeRF training. I've also tried adding camera pose optimization driven by image gradients to improve the current results, but that has not borne fruit yet.

I'm stuck on this problem now and don't have a solution that works. If you see it differently, please let me know. Thank you so much.

inuex35 commented 4 months ago

Hello

If you use your own dataset, you should take care that the camera movement covers enough features.

I don't know whether you have already found this, but I will share it: https://arxiv.org/abs/2402.00763. This paper addresses panoramic rendering, so I am going to implement this renderer for my Gaussian splatting. I hope the paper helps you.

Bin-ze commented 4 months ago

I'm interested in implementing the method from the paper. If you have any suggestions or make progress, please let me know.

inuex35 commented 4 months ago

Hello,

I've made progress in implementing spherical rendering and am now just a step away from completion. There seems to be an issue with the covariance computation, which will need some more time to resolve, but I'm getting close to finishing. Please check this repository if you have time: https://github.com/inuex35/360-diff-gaussian-rasterization/tree/spherical_render

[Attached: original image and rendered image]

Bin-ze commented 3 months ago

Very happy to hear this news!

I am still trying to divide the 360 images into perspective images for scene modeling.

I want to discuss with you: what are the advantages of using 360 images directly for reconstruction compared to the previous approach? Can this kind of modeling provide higher reconstruction accuracy?

I noticed in your implementation you have: mode=panorama

When this mode is enabled, compared with the original GS model, all views are shuffled for training. Can that provide a more robust gradient? When I split the 360 images into perspective images and shuffle their order for training on a larger scene (about 1,000 360 images, split into 8,000 perspective images with a 120-degree FOV), the optimization never reaches the desired densification: it keeps shrinking the initial point cloud and the scene drifts towards blur. I'd like to hear your suggestions.
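For concreteness, the kind of equirectangular-to-perspective split described here can be sketched as follows (nearest-neighbour sampling, yaw-only rotation; an illustration, not the actual conversion code used in this pipeline):

```python
import numpy as np

def equirect_to_perspective(equi, fov_deg=120.0, yaw_deg=0.0, out_size=(800, 800)):
    """Sample a pinhole view with the given FOV from an equirectangular image (H, W, 3)."""
    H, W = equi.shape[:2]
    out_h, out_w = out_size
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2.0)  # focal length in pixels

    # Pixel grid -> camera-space ray directions (z forward, x right, y down).
    xs, ys = np.meshgrid(np.arange(out_w), np.arange(out_h))
    dirs = np.stack([xs - out_w / 2.0,
                     ys - out_h / 2.0,
                     np.full_like(xs, f, dtype=float)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate around the vertical axis to choose the viewing direction.
    yaw = np.radians(yaw_deg)
    R = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                  [0, 1, 0],
                  [-np.sin(yaw), 0, np.cos(yaw)]])
    dirs = dirs @ R.T

    # Ray direction -> equirectangular pixel (same convention as the projection sketch above).
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))
    u = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    v = np.clip(((lat / np.pi + 0.5) * H).astype(int), 0, H - 1)
    return equi[v, u]

# Usage: split one panorama into three 120-degree views.
pano = np.zeros((1024, 2048, 3), dtype=np.uint8)  # placeholder panorama
views = [equirect_to_perspective(pano, fov_deg=120.0, yaw_deg=a) for a in (0, 120, 240)]
```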

I will test your 360 rasterizer later; it is a very useful implementation. It would be even cooler if the rasterizer could switch the rendering mode between perspective images and equirectangular images when rendering a 3DGS scene.

inuex35 commented 3 months ago

Hello. Although I'm not entirely confident in this implementation, the results seem decent so far. I trained the model by converting the data into a cubemap, shuffling the four views, and then rendering with the spherical renderer. However, I didn't complete the full training and only used 17 images, so the model's performance is probably not up to par.

Can it provide a more robust gradient?

I want to verify the effect of concatenating panoramic images. However, 8,000 images might be too many, and using fewer images may improve the model's performance. I haven't completed this verification yet, though, so I'm not sure whether that is really the case.

According to the research paper, rendering the concatenated images as a single image can mitigate the negative effects of stitching artifacts.

[Attached: rendered image]

[Attached: view from SuperSplat]

This rendering has been enhanced with an equirectangular effect that enlarges the area near the poles. Looks nice.

[Attached: rendered image]

inuex35 commented 3 months ago

Training seems to be working. You can try it with your dataset.

Bin-ze commented 3 months ago

This is my own data set: https://drive.google.com/file/d/1zSMMYnaQP7ES3odA3hXM4bquUzm2QZcU/view?usp=sharing

If it is convenient, I can conduct subsequent experiments based on this.

inuex35 commented 3 months ago

Thank you for sharing your data! I am going to train on it and see how it turns out, but could you check the access permissions? I could not access it.

Bin-ze commented 3 months ago

I have updated the permissions: https://drive.google.com/file/d/1zSMMYnaQP7ES3odA3hXM4bquUzm2QZcU/view?usp=sharing

inuex35 commented 3 months ago

Thank you for the update! I am going to train. Please wait a little.

inuex35 commented 3 months ago

I trained on your data with these parameters:

iterations = 30_000
position_lr_init = 0.00016
position_lr_final = 0.0000016
position_lr_delay_mult = 0.01
position_lr_max_steps = 30_000
feature_lr = 0.0010
opacity_lr = 0.01
scaling_lr = 0.00025
rotation_lr = 0.001
percent_dense = 0.01
lambda_dssim = 0.2
densification_interval = 100
opacity_reset_interval = 3000
densify_from_iter = 500
densify_until_iter = 15_000
densify_grad_threshold = 0.00002

Resolution was 1.6K (the Gaussian splatting default).

The result is not perfect, but it does not look bad.

The floating noise on the floor is probably due to the masked areas and the lack of viewpoints.

https://xgf.nu/3v02b

Bin-ze commented 3 months ago

Thank you very much for taking the time to test. I checked the rendering results and they are not bad!

But there is something I don't understand. I looked at the point clouds after training, and they are no longer in Euclidean space; they are distorted so that they lose their original sense of space.

I don't understand this, because if forward rendering establishes a mapping between 3D space and the equirectangular camera model, the point cloud should still live in Euclidean space.

inuex35 commented 3 months ago

Could you give me a bit more detail or some specific data? If something is wrong, I think it is in the computation that projects the 3D covariance to the 2D covariance.
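For context on where that computation sits: in the original 3D Gaussian Splatting formulation (following EWA splatting), the screen-space covariance of a Gaussian is obtained from its 3D covariance $\Sigma$ as

$$\Sigma' = J\, W\, \Sigma\, W^{\top} J^{\top},$$

where $W$ is the viewing transformation and $J$ is the Jacobian of the local affine approximation of the projection. For an equirectangular renderer, $J$ has to be the Jacobian of the spherical projection rather than the pinhole one, so an error there could plausibly show up as the kind of distortion described above.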