Hello. Sorry for the late reply, I was sick for a week and stayed in bed. I actually face the same problem. I think this is because OpenSfM only supports CPU computation for feature extraction and matching, which takes a very long time during reconstruction. I found https://github.com/Unity-Technologies/ind-bermuda-opensfm. This repository uses SuperPoint, DISK, or other keypoints on the GPU, which can speed up reconstruction. Unfortunately, when I use DISK or SuperPoint, I get worse results than with SIFT. The repository's Dockerfile installs PopSift, but it does not work for me (segmentation fault), maybe because of a version difference. I am trying to figure out the main cause of this problem. This is the YAML file I use for reconstruction; I use higher thresholds for geometric estimation. config.zip
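The kind of thresholds I mean look roughly like this in config.yaml (illustrative values only, not the exact attached file; key names are from opensfm/config.py, so check them against your OpenSfM version):

```python
# Illustrative only: not the attached config.zip. These are the kind of
# geometric-estimation thresholds referred to above; key names follow
# opensfm/config.py, and the values are just examples to try in config.yaml.
geometric_overrides = {
    "lowes_ratio": 0.85,                 # ratio test, slightly looser than the 0.8 default
    "robust_matching_threshold": 0.008,  # fundamental-matrix outlier threshold (fraction of image width)
    "five_point_algo_threshold": 0.008,  # essential-matrix outlier threshold used during matching
}
```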
Thank you very much for your reply! I hope you recover soon.
In order to speed up the reconstruction, I borrowed the sequential matching used by methods such as COLMAP and added it to the OpenSfM pipeline to replace global matching when GPS information is unknown.
In this way I obtained more than a 10x speedup, because the matching complexity is reduced to O(N), but the problem that followed was worse reconstruction results. I am currently trying to solve this and to support capture order.
After matching, the speed is already very fast; what still needs to be solved is reconstruction quality. I hope we can have more exchanges.
Hello, it's good to hear that you successfully solved the reconstruction speed problem. Maybe these params are similar to COLMAP's sequential matching?
matching_time_neighbors: 0 # Number of images to match selected by time taken. Set to 0 to disable
matching_order_neighbors: 0 # Number of images to match selected by image name. Set to 0 to disable
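Conceptually those two options just limit matching to a sliding window over the capture sequence. A toy sketch of the idea (not OpenSfM's actual implementation):

```python
# Toy sketch of what order-based sequential matching does (not OpenSfM's code):
# sort images by name and only propose pairs within a fixed window, so the
# number of candidate pairs grows as O(N * k) instead of O(N^2).
from itertools import combinations

def sequential_pairs(image_names, order_neighbors=8):
    """Return candidate match pairs between each image and its next few neighbors by filename order."""
    ordered = sorted(image_names)
    pairs = set()
    for i, name in enumerate(ordered):
        for j in range(i + 1, min(i + 1 + order_neighbors, len(ordered))):
            pairs.add((name, ordered[j]))
    return pairs

def exhaustive_pairs(image_names):
    """All N*(N-1)/2 pairs, i.e. what global matching has to process."""
    return set(combinations(sorted(image_names), 2))

if __name__ == "__main__":
    names = [f"frame_{i:04d}.jpg" for i in range(1000)]
    print(len(sequential_pairs(names)), "pairs vs", len(exhaustive_pairs(names)))
    # roughly 8k sequential pairs vs ~500k exhaustive pairs for 1000 images
```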
You can investigate feature extraction and matching with bin/plot_features and bin/plot_matches.py. What I want to do from now is
Hi, I'm glad to hear from you.
I have some new thoughts I want to discuss with you:
I am very much looking forward to your reply and to further in-depth exchanges with you.
I would like to know what a configuration file that uses the GPU-enabled operators looks like? The ind-bermuda-opensfm repository's documentation is too brief, and I can find almost no relevant description.
Hello,
From the perspective of general visual odometry, SLAM, and SfM, cameras tend to achieve more stable position estimates when they move across objects rather than directly towards them. Therefore, capturing the features of walls rather than the ground seems ideal. What kind of positional error are you seeing? With long-term data, SfM should achieve higher accuracy than SLAM. If the position is drifting, it might be a problem with sequential matching.
The configuration options were obtained from here: https://opensfm.org/docs/_modules/opensfm/config.html
This repository can be used: https://github.com/inuex35/ind-bermuda-opensfm
The difference is that within the config it is possible to use the POPSIFT, SUPERPOINT, DISK, or ALIKED features and the LIGHTGLUE matching algorithm, by changing feature_type: and matcher_type: to the algorithm names above. However, POPSIFT seems not to be working due to a bug. From my trials, ALIKED seems nice as it can capture many feature points.
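A minimal config.yaml for that fork would look roughly like this (a sketch only: feature_type and matcher_type are standard OpenSfM keys, while the algorithm names are the ones the fork advertises, so check its defaults before relying on it):

```python
# Writes a minimal config.yaml for the ind-bermuda-opensfm fork.
# feature_type / matcher_type are standard OpenSfM config keys; the specific
# algorithm names below are the ones discussed above, so treat this as a
# sketch and verify against the fork's own defaults.
import yaml

config = {
    "feature_type": "ALIKED",      # or SIFT / POPSIFT / SUPERPOINT / DISK
    "matcher_type": "LIGHTGLUE",   # GPU matcher; stock OpenSfM defaults to FLANN
    "feature_process_size": 2048,  # long-edge resize before feature extraction
    "feature_min_frames": 8000,    # target number of keypoints per image
}

with open("config.yaml", "w") as f:
    yaml.safe_dump(config, f)
```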
Indoors I had a hard time getting good rendering results; there was significant aliasing most of the time. With 3DGS, even when initializing from a dense point cloud (obtained from SfM), I end up with very few points, and the reconstructed 3D scene completely loses detail and looks very poor.
I analyzed the reasons as follows: there is a problem with pose estimation, because I tried multiple captures of the same scene and the differences were huge; some captures were obviously better. This makes me suspect that the SfM results are responsible.
I'm trying to improve the SfM results on indoor scenes, but haven't made progress yet. Because my scene has large textureless areas, it was almost impossible to compute poses good enough for fine NeRF training, even with SuperPoint + SuperGlue. I've tried adding camera pose optimization driven by the image gradient to improve the current results, but it has not borne fruit yet.
I'm stuck on this problem now and don't have a working solution. If you see it differently, please let me know. Thank you so much.
Hello
If you use your own dataset, then you should take care that the camera movement covers areas with enough features.
I don't know whether you have already found this, but I will share it with you: https://arxiv.org/abs/2402.00763 This paper addresses panoramic rendering, so I am going to implement this renderer for my Gaussian splatting. I hope this paper helps you.
I'm interested in implementing the method mentioned in the paper. If you have any suggestions or progress, please let me know.
Hello,
I've made progress in implementing spherical rendering and am now just a step away from completion. There seems to be an issue with the implementation of the covariance, which needs some more time to resolve, but I'm getting close to finishing. Please check this repository if you have free time: https://github.com/inuex35/360-diff-gaussian-rasterization/tree/spherical_render
Original image / Rendered image
Very happy to hear this news!
I am still trying to divide the 360 images into perspective images for scene modeling.
I want to discuss with you: what are the advantages of using 360 images directly for reconstruction compared to the previous method? Can such a modeling approach provide higher reconstruction accuracy?
I noticed in your implementation you have: mode=panorama
When this mode is enabled, compared with the original GS model, all views are shuffled for training. Can it provide a more robust gradient? When I split the 360 images into perspective views and shuffle the order for training, in a larger-scale scene (about 1,000 360 images, split into 8,000 perspective images with fov=120 degrees), the model cannot achieve the desired densification during optimization; instead, optimization shrinks the initial point cloud and the scene drifts toward blurriness. I'd like to hear your suggestions.
I will be testing your 360 rasterizer later; it is a very useful implementation. But it would be even cooler if you could switch the rendering mode between perspective images and equirectangular images when rendering a 3DGS scene with the rasterizer.
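For context, my understanding of the densification gate in standard 3DGS is roughly the following (a paraphrase, not the original code): if the accumulated view-space gradients stay below the threshold, nothing is cloned or split, and pruning alone shrinks the point cloud.

```python
# Toy paraphrase of the standard 3DGS densification gate (not the original code).
# Inputs are per-point 1-D tensors: accumulated screen-space gradient magnitudes,
# the number of views each point was seen in, and per-point scales.
import torch

def densify_candidates(grad_accum, denom, scales, densify_grad_threshold=0.0002,
                       percent_dense=0.01, scene_extent=10.0):
    avg_grad = grad_accum / denom.clamp(min=1)      # mean view-space gradient per point
    selected = avg_grad >= densify_grad_threshold   # only these points densify at all
    big = scales.max(dim=1).values > percent_dense * scene_extent
    clone_mask = selected & ~big                    # duplicate small Gaussians
    split_mask = selected & big                     # split large Gaussians
    return clone_mask, split_mask
```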
Hello. Although I'm not entirely confident about this implementation, the results seem decent so far. I trained this model by converting the data into a cubemap and shuffling four views, then I rendered with the spherical renderer. However, I didn't complete the full training process and only used 17 images, so the model's performance is likely not up to par.
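The cubemap step is roughly this kind of conversion (a sketch assuming the py360convert package; the face keys and the filename are placeholders, so check them against the version you use):

```python
# Sketch of the cubemap conversion step, assuming the py360convert package.
# Face keys follow its 'dict' cube format (F/R/B/L/U/D); verify against your version.
import numpy as np
import py360convert
from PIL import Image

equi = np.array(Image.open("pano_0001.jpg"))               # equirectangular input, HxWx3
faces = py360convert.e2c(equi, face_w=1024, cube_format="dict")

# Keep only the four horizontal faces (Front/Right/Back/Left) for training,
# which is what "shuffling four views" refers to above.
for key in ("F", "R", "B", "L"):
    Image.fromarray(faces[key].astype(np.uint8)).save(f"pano_0001_{key}.png")
```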
Can it provide a more robust gradient?
I want to verify the effect of concatenating panoramic images. However, 8,000 images might be too many, so using fewer images may improve the model's performance. But I haven't completed the verification yet, so I'm not sure whether that's really the case.
According to the research paper, rendering the concatenated images as a single image can mitigate the negative effects of stitching artifacts.
view from supersplat
This rendering shows the equirectangular effect of enlarging the area near the poles. Looks nice.
Training seems to be working. You can try it with your dataset.
This is my own dataset: https://drive.google.com/file/d/1zSMMYnaQP7ES3odA3hXM4bquUzm2QZcU/view?usp=sharing
If it is convenient for you, I can conduct subsequent experiments based on it.
Thank you for sharing your data! I am going to train and see how it turns out, but could you check the access permissions? I could not access it.
I have updated the permissions: https://drive.google.com/file/d/1zSMMYnaQP7ES3odA3hXM4bquUzm2QZcU/view?usp=sharing
Thank you for the update! I am going to train. Please wait a little.
I trained your data with these parameters:
iterations = 30_000
position_lr_init = 0.00016
position_lr_final = 0.0000016
position_lr_delay_mult = 0.01
position_lr_max_steps = 30_000
feature_lr = 0.0010
opacity_lr = 0.01
scaling_lr = 0.00025
rotation_lr = 0.001
percent_dense = 0.01
lambda_dssim = 0.2
densification_interval = 100
opacity_reset_interval = 3000
densify_from_iter = 500
densify_until_iter = 15_000
densify_grad_threshold = 0.00002
Resolution was 1.6K (Gaussian splatting default).
The result is not perfect but does not look bad.
The floating noise on the floor is maybe because of the masked area and the lack of viewpoints.
Thank you very much for taking the time to test. I checked the rendered results and they are not bad!
But there is something I don't understand. I looked at the point cloud after training, but the points are no longer in Euclidean space: they are distorted and lose their original sense of space.
I don't understand this, because if we establish a mapping between 3D space and the equirectangular camera model during forward rendering, the point cloud should still be in Euclidean space.
Could you give me a bit more detail or some specific data? I think if something is wrong, it should be in the step that computes the 2D covariance from the 3D covariance.
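For reference, this is the kind of check I mean: project the mean with the equirectangular camera model and build the 2D covariance from its Jacobian, EWA-style. The numpy sketch below is a hand derivation for one common convention (x right, y down, z forward), not the code in the rasterizer:

```python
# Numpy sketch for sanity-checking the 3D-cov -> 2D-cov step under an
# equirectangular projection. Conventions: x right, y down, z forward;
# u = longitude, v = latitude. This is a hand derivation, not the code
# from 360-diff-gaussian-rasterization.
import numpy as np

def equirect_project(t, W, H):
    """Camera-space point t=(x,y,z) -> pixel (u,v) on a WxH equirect image."""
    x, y, z = t
    lon = np.arctan2(x, z)                       # [-pi, pi]
    lat = np.arcsin(y / np.linalg.norm(t))       # [-pi/2, pi/2]
    u = (lon / (2 * np.pi) + 0.5) * W
    v = (lat / np.pi + 0.5) * H
    return np.array([u, v])

def equirect_jacobian(t, W, H):
    """2x3 Jacobian d(u,v)/d(x,y,z) of the projection above."""
    x, y, z = t
    r2 = x * x + y * y + z * z
    rho2 = x * x + z * z                         # squared distance in the xz-plane
    rho = np.sqrt(rho2)
    du = (W / (2 * np.pi)) * np.array([z / rho2, 0.0, -x / rho2])
    dv = (H / np.pi) * np.array([-x * y, rho2, -z * y]) / (r2 * rho)
    return np.vstack([du, dv])

def cov2d(mean_world, cov3d_world, R_cw, t_cw, W, H):
    """EWA-style 2D covariance: Sigma' = J R Sigma R^T J^T, with R mapping world -> camera."""
    t_cam = R_cw @ mean_world + t_cw
    J = equirect_jacobian(t_cam, W, H)
    return J @ R_cw @ cov3d_world @ R_cw.T @ J.T

if __name__ == "__main__":
    # Finite-difference check of the Jacobian at a sample camera-space point.
    t = np.array([0.3, -0.2, 1.5])
    J = equirect_jacobian(t, W=2048, H=1024)
    eps = 1e-6
    J_num = np.column_stack([
        (equirect_project(t + eps * e, 2048, 1024) - equirect_project(t, 2048, 1024)) / eps
        for e in np.eye(3)
    ])
    print(np.max(np.abs(J - J_num)))  # should be around 1e-4 or smaller
```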
Hello, I successfully used your implementation to train on the 360 images I collected. The entire pipeline is as follows:
Insta360 capture -> OpenSfM computes 360 poses -> Perspective-and-Equirectangular transform -> train 3DGS
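For reference, the Perspective-and-Equirectangular step can be sketched like this (assuming the py360convert package; the filename, FOV, and view angles are just examples, not necessarily my exact script):

```python
# Sketch of the Perspective-and-Equirectangular step, assuming py360convert:
# carve several perspective views out of each equirectangular frame.
import numpy as np
import py360convert
from PIL import Image

def equirect_to_perspectives(path, fov_deg=120, out_hw=(1080, 1080)):
    equi = np.array(Image.open(path))
    views = []
    for u_deg in range(0, 360, 90):              # 4 headings around the horizon
        persp = py360convert.e2p(equi, fov_deg=fov_deg, u_deg=u_deg - 180,
                                 v_deg=0, out_hw=out_hw)
        views.append(persp.astype(np.uint8))
    return views

if __name__ == "__main__":
    for i, view in enumerate(equirect_to_perspectives("pano_0001.jpg")):
        Image.fromarray(view).save(f"pano_0001_view{i}.png")
```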
In the entire pipeline, almost all of the time is spent in the OpenSfM 360 pose computation. I evaluated the time required for COLMAP to reconstruct ordinary perspective images; compared with that, OpenSfM's reconstruction of the 360 images is very slow.
I would like to keep using this pipeline but speed up the computation of the 360 poses. What do you recommend?