30k export problem ?

TXSevenXT commented 2 months ago

Hello !

First of all, thank you for your work and for keeping up with the updates :)

I've just run a test on a 30-second MP4 video with the following command:

python run_single.py --colmap_name cascade --video_extension mp4

Training finishes, but a problem occurs when exporting the 30k PLY (the 7k PLY comes out fine and is usable):

Optimizing /home/xxxxxxx/gs2mesh/splatting_output/custom_nw_iterations30000/cascade
Output folder: /home/xxxxxxx/gs2mesh/splatting_output/custom_nw_iterations30000/cascade [04/09 08:44:34]
Reading camera 86/86 [04/09 08:44:35]
Converting point3d.bin to .ply, will happen only the first time you open the scene. [04/09 08:44:35]
Loading Training Cameras [04/09 08:44:35]
[ INFO ] Encountered quite large input images (>1.6K pixels width), rescaling to 1.6K. If this is not desired, please explicitly specify '--resolution/-r' as 1 [04/09 08:44:35]
Loading Test Cameras [04/09 08:45:11]
Number of points at initialisation : 43182 [04/09 08:45:11]
Training progress: 23%|█████████▊ | 7000/30000 [09:00<35:06, 10.92it/s, Loss=0.1461526]
[ITER 7000] Evaluating train: L1 0.07914816588163376 PSNR 18.642070770263672 [04/09 08:54:13]

[ITER 7000] Saving Gaussians [04/09 08:54:14]
Training progress: 100%|█████████████████████████████████████████| 30000/30000 [54:02<00:00, 9.25it/s, Loss=0.1150575]

[ITER 30000] Evaluating train: L1 0.06295853853225708 PSNR 20.098556518554688 [04/09 09:39:23]

[ITER 30000] Saving Gaussians [04/09 09:39:24]
Killed
num views: 86 baseline: 0.2705555039768162
RPly: Unable to open file
[Open3D WARNING] Read PLY failed: unable to open file: /home/xxxxxxx/gs2mesh/splatting_output/custom_nw_iterations30000/cascade/point_cloud/iteration_30000/point_cloud.ply
Loading trained model at iteration 30000
Reading camera 86/86
Loading Training Cameras
Loading Test Cameras
Traceback (most recent call last):
  File "run_single.py", line 190, in <module>
    run_single(args)
  File "run_single.py", line 90, in run_single
    renderer.prepare_renderer()
  File "/home/xxxxxxx/gs2mesh/gs2mesh_utils/renderer_utils.py", line 356, in prepare_renderer
    scene = Scene(dataset, self.gaussians, load_iteration=self.splatting_iteration, shuffle=False)
  File "/home/xxxxxxx/gs2mesh/third_party/gaussian-splatting/scene/__init__.py", line 78, in __init__
    self.gaussians.load_ply(os.path.join(self.model_path,
  File "/home/xxxxxxx/gs2mesh/third_party/gaussian-splatting/scene/gaussian_model.py", line 216, in load_ply
    plydata = PlyData.read(path)
  File "/home/xxxxxxx/anaconda3/envs/gs2mesh/lib/python3.8/site-packages/plyfile.py", line 158, in read
    (must_close, stream) = _open_stream(stream, 'read')
  File "/home/xxxxxxx/anaconda3/envs/gs2mesh/lib/python3.8/site-packages/plyfile.py", line 1345, in _open_stream
    return (True, open(stream, read_or_write[0] + 'b'))
FileNotFoundError: [Errno 2] No such file or directory: '/home/xxxxxxx/gs2mesh/splatting_output/custom_nw_iterations30000/cascade/point_cloud/iteration_30000/point_cloud.ply'

Have you already encountered this problem? If so, do you have any advice on how to solve it? For reference, I'm running Windows 11, CUDA 11.8, and Ubuntu 22.04 on WSL2.

Thanks for your help :)

Have a nice day ^_^

yanivw12 commented 2 months ago

Hi,

It seems like it could be a memory issue when exporting the PLY file (see https://github.com/graphdeco-inria/gaussian-splatting/issues/235). A quick fix suggested there is to save the PLY only after 30k iterations rather than also after 7k. To do that, add the --GS_save_test_iterations 30000 argument and see if it helps.
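The FileNotFoundError in the first traceback comes from the mesh stage opening point_cloud/iteration_30000/point_cloud.ply, which was most likely never written because the training process was killed while saving. A minimal pre-check sketch, assuming only the folder layout visible in that log (the helpers ply_path and missing_checkpoints are hypothetical and not part of gs2mesh or 3DGS), lists which saved iterations are missing before the mesh stage is started:

```python
import os
from typing import Iterable, List


def ply_path(model_path: str, iteration: int) -> str:
    """Path of the Gaussians saved at a given iteration.

    Layout copied from the log in this thread:
    <model_path>/point_cloud/iteration_<N>/point_cloud.ply
    """
    return os.path.join(model_path, "point_cloud", f"iteration_{iteration}", "point_cloud.ply")


def missing_checkpoints(model_path: str, iterations: Iterable[int]) -> List[int]:
    """Iterations whose point_cloud.ply does not exist (e.g. the save was interrupted)."""
    return [it for it in iterations if not os.path.isfile(ply_path(model_path, it))]
```

For the first failed run, missing_checkpoints("splatting_output/custom_nw_iterations30000/cascade", [7000, 30000]) would report [30000]. Note that this only checks for existence; a file cut short mid-write would still pass.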

TXSevenXT commented 2 months ago

Thank you very much for your help ^^

After enabling the backup memory option in the NVIDIA settings, the previous error is gone, thank you :)

However, mesh extraction still fails:

[ITER 30000] Evaluating train: L1 0.00707147466018796 PSNR 39.55408630371094 [04/09 17:35:16]

[ITER 30000] Saving Gaussians [04/09 17:35:16]

Training complete. [04/09 17:35:23]
num views: 3 baseline: 0.27544912783794784
Loading trained model at iteration 30000
Reading camera 3/3
Loading Training Cameras
Loading Test Cameras
0%| | 0/3 [00:00<?, ?it/s]
UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1716905971873/work/aten/src/ATen/native/TensorShape.cpp:3587.)
UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
100%|█████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:16<00:00, 5.43s/it]
Automask must be enabled for masking in script mode. Skipping.
100%|█████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 4.72it/s]
[Open3D WARNING] Write PLY failed: mesh has 0 vertices.
SAVED MESH
[Open3D DEBUG] [ClusterConnectedTriangles] Compute triangle adjacency
[Open3D DEBUG] [ClusterConnectedTriangles] Done computing triangle adjacency
[Open3D DEBUG] [ClusterConnectedTriangles] Done clustering, #clusters=0
[Open3D WARNING] Write PLY failed: mesh has 0 vertices.
SAVED CLEANED MESH

I get the same result with or without the argument you suggested earlier. Note that it's not a 360° video; it covers more like 120-140°.

yanivw12 commented 2 months ago

The problem you have now is that the depth is being cropped too early, so increasing the horizontal baseline and the truncation limit should solve it. It's better to increase the baseline first, and only then the truncation limit.

To increase the baseline, add the --no-renderer_scene_360 argument, which should calculate a better horizontal baseline given that your scene is not a 360° scene (the resulting baseline should be larger than the current one, 0.275). If that still doesn't work, increase the truncation limit by setting --TSDF_max_depth_baselines to something larger than the default of 20.
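The flag name --TSDF_max_depth_baselines suggests that the truncation limit is measured in multiples of the stereo baseline. Assuming that (an inference from the name, not something confirmed from the gs2mesh source), the effective maximum depth is the product of the two, which would also explain why a larger baseline alone already pushes the limit further out. A tiny sketch with a hypothetical helper:

```python
def tsdf_max_depth(baseline: float, max_depth_baselines: float = 20.0) -> float:
    """Hypothetical helper: truncation depth in scene units.

    ASSUMPTION: depth is truncated at baseline * TSDF_max_depth_baselines;
    this is inferred from the flag name, not taken from the gs2mesh code.
    """
    return baseline * max_depth_baselines


# Baselines copied from the logs in this thread
print(tsdf_max_depth(0.27544912783794784))       # 3-view run, default multiplier of 20
print(tsdf_max_depth(0.6796290173109062, 30.0))  # later run with --no-renderer_scene_360 and 30
```

Under this assumption, the 3-view run was cut off at about 5.5 scene units, while the later run with the larger baseline and a multiplier of 30 reaches about 20.4.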

yanivw12 commented 2 months ago

Also, note that your current model has only 3 views, whereas the run in your first message had 86. Three views are usually insufficient for COLMAP and 3DGS.

TXSevenXT commented 2 months ago


Hi again,

The view count seems to be related to the '--GS_save_test_iterations 30000' argument: without it, 86 views are found.

I'll try with the last arguments and keep you posted.

Thank you very much for your help in any case 😊

TXSevenXT commented 2 months ago

Hello !

The program always ends up “Killed”. It seems to be a memory problem, but I don't understand why: it uses RAM but little or no VRAM, even with only 29 frames at 720p. I'm a bit stumped.

I have no problem running 3DGS on its own with 200 images at 1600p :(
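On Linux, a bare "Killed" line usually means the kernel's out-of-memory (OOM) killer terminated the process, and the kernel log (dmesg) normally records an "Out of memory" entry for it. To see which stage drives host RAM up, one option is to log the process's peak resident memory. This is a generic sketch using Python's standard resource module (available on Linux/WSL, not on native Windows); it is not something gs2mesh provides, and the helper name is hypothetical:

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident memory of the current process so far, in MiB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak / (1024 * 1024) if sys.platform == "darwin" else peak / 1024
```

Printing print(f"peak RSS: {peak_rss_mib():.0f} MiB") after each stage of run_single.py would show where memory jumps. It only counts resident pages, so memory that has been pushed out to swap does not show up in this number.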

TXSevenXT commented 2 months ago

Here is the log for the following command:

python run_single.py --colmap_name cascade --skip_video_extraction --no-renderer_scene_360 --TSDF_max_depth_baselines 30 --GS_save_test_iterations 30000

Training progress: 100%|██████████████████████████████████████████| 30000/30000 [13:54<00:00, 35.95it/s, Loss=0.0260883]

[ITER 30000] Evaluating train: L1 0.018139631859958174 PSNR 29.62316818237305 [06/09 13:13:30]

[ITER 30000] Saving Gaussians [06/09 13:13:30]

Training complete. [06/09 13:13:47]
num views: 29 baseline: 0.6796290173109062
Loading trained model at iteration 30000
Reading camera 29/29
Loading Training Cameras
Loading Test Cameras
0%| | 0/29 [00:00<?, ?it/s]
UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1716905971873/work/aten/src/ATen/native/TensorShape.cpp:3587.)
UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
100%|███████████████████████████████████████████████████████████████████████████████████| 29/29 [00:21<00:00, 1.32it/s]
Automask must be enabled for masking in script mode. Skipping.
10%|████████▋ | 3/29 [01:52<18:30, 42.71s/it]
Killed

yanivw12 commented 2 months ago

See https://github.com/yanivw12/gs2mesh/issues/1

Your run gets killed in the TSDF step, and there seems to be a bug in Open3D's TSDF. If you're using CUDA 11.8 and Python 3.8, the problem could be your Ubuntu version (I tested on 20.04, and you're using 22.04). There are more details in the "Common Issues and Tips" section. From what I've tested, with Ubuntu 20.04 and Python 3.7 or 3.8 the TSDF runs without issues.
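To confirm which combination a machine actually runs, a small standard-library sketch can read /etc/os-release and compare it, together with the interpreter version, against the setups reported as working in this thread (Ubuntu 20.04 with Python 3.7 or 3.8). The helper names are hypothetical, the parser only handles simple quoting, and CUDA or Open3D versions are not checked:

```python
from typing import Dict, Tuple

# Setups reported as working in this thread: Ubuntu 20.04 with Python 3.7 or 3.8
TESTED_SETUPS = {("ubuntu", "20.04", (3, 7)), ("ubuntu", "20.04", (3, 8))}


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines in the /etc/os-release format (simple quoting only)."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def is_tested_setup(os_info: Dict[str, str], py_version: Tuple[int, ...]) -> bool:
    """True if distro ID, distro version and Python major.minor match a tested setup."""
    key = (os_info.get("ID", ""), os_info.get("VERSION_ID", ""), tuple(py_version[:2]))
    return key in TESTED_SETUPS
```

Usage on the machine in question: read /etc/os-release with open(...).read(), pass the result through parse_os_release, and call is_tested_setup(info, sys.version_info). On the Ubuntu 22.04 setup from the first message it would return False.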

TXSevenXT commented 2 months ago

Thank you for your help, I'll test with Ubuntu 20.04 :)

One last question: is 32 GB of RAM enough to run your code? I also have a 4090, but I think the RAM may be a bit borderline for gs2mesh.

Thank you again for your time and advice 😊

yanivw12 commented 2 months ago

The GS and stereo models should run fine on a 4090 (with 24 GB of VRAM). The only step that can take up a lot of RAM is the TSDF, since it's not a GPU-accelerated implementation (*). I haven't really measured how much RAM it takes, but I assume 32 GB is enough.

(*) As a side note, I also tested a GPU-accelerated TSDF (https://github.com/andyzeng/tsdf-fusion-python), but in my experience it's much heavier on GPU memory, and I often hit "CUDA out of memory" errors (on an L40 with 48 GB), especially for larger scenes or higher resolutions. The results also looked visually better with the Open3D version.

TXSevenXT commented 2 months ago

Hello Yaniv,

Thanks again for your help. I installed Ubuntu 20.04 and everything else went well, but I'm stuck because diff_gaussian_rasterization needs glibc 2.32 and Ubuntu 20.04 only ships 2.31. I've tried everything to move to version 2.32 but I'm at a dead end. How did you get around this problem?

(gs2mesh) xxxxxxx@Jeremie:~/gs2mesh$ python run_single.py --colmap_name cascade --skip_video_extraction --no-renderer_scene_360 --TSDF_max_depth_baselines 30
Traceback (most recent call last):
  File "run_single.py", line 14, in <module>
    from gs2mesh_utils.renderer_utils import Renderer
  File "/home/xxxxxxx/gs2mesh/gs2mesh_utils/renderer_utils.py", line 25, in <module>
    from gaussian_renderer import render
  File "/home/xxxxxxx/gs2mesh/third_party/gaussian-splatting/gaussian_renderer/__init__.py", line 14, in <module>
    from diff_gaussian_rasterization import GaussianRasterizationSettings, GaussianRasterizer
  File "/home/xxxxxxx/anaconda3/envs/gs2mesh/lib/python3.8/site-packages/diff_gaussian_rasterization/__init__.py", line 15, in <module>
    from . import _C
ImportError: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /home/xxxxxxx/anaconda3/envs/gs2mesh/lib/python3.8/site-packages/diff_gaussian_rasterization/_C.cpython-38-x86_64-linux-gnu.so)

yanivw12 commented 2 months ago

I checked the server I'm using, and it also runs glibc 2.31. Maybe clean the previous installation (the entire diff_gaussian_rasterization folder) and reinstall?
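The ImportError means the compiled _C extension references versioned glibc symbols (GLIBC_2.32) that the system's libc.so.6 (2.31) does not provide. That typically happens when a binary was built against a newer glibc than the one it runs on, which is why rebuilding the extension on the target system is a sensible first step. A small sketch for comparing versions before importing; the helper names are hypothetical, while platform.libc_ver() is standard library and returns empty strings when it cannot detect the C library:

```python
import platform
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.31' into (2, 31) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def glibc_satisfies(required: str, installed: str) -> bool:
    """True if the installed glibc is at least the required version ('' means unknown)."""
    if not installed:
        return False
    return parse_version(installed) >= parse_version(required)


def installed_glibc() -> str:
    """Best-effort glibc version of the running interpreter ('' if unknown)."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" else ""
```

On the Ubuntu 20.04 machine above, glibc_satisfies("2.32", installed_glibc()) would be False, matching the error. Comparing tuples of integers rather than strings keeps, for example, 2.4 below 2.31.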

TXSevenXT commented 2 months ago

Same problem with Ubuntu 20.04 on WSL:

Optimizing /home/xsevenx/gs2mesh/splatting_output/custom_nw_iterations30000/cascade
Output folder: /home/xsevenx/gs2mesh/splatting_output/custom_nw_iterations30000/cascade [11/09 14:16:53]
Reading camera 29/29 [11/09 14:16:53]
Converting point3d.bin to .ply, will happen only the first time you open the scene. [11/09 14:16:53]
Loading Training Cameras [11/09 14:16:53]
Loading Test Cameras [11/09 14:16:54]
Number of points at initialisation : 17352 [11/09 14:16:54]
Training progress: 100%|██████████████████████████████████████████| 30000/30000 [14:48<00:00, 33.75it/s, Loss=0.0265618]

[ITER 30000] Evaluating train: L1 0.018692961521446706 PSNR 29.656937789916995 [11/09 14:31:48]

[ITER 30000] Saving Gaussians [11/09 14:31:49]

Training complete. [11/09 14:32:06]
num views: 29 baseline: 0.6802225502353557
Loading trained model at iteration 30000
Reading camera 29/29
Loading Training Cameras
Loading Test Cameras
0%| | 0/29 [00:00<?, ?it/s]
UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1716905971873/work/aten/src/ATen/native/TensorShape.cpp:3587.)
UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
100%|███████████████████████████████████████████████████████████████████████████████████| 29/29 [00:25<00:00, 1.13it/s]
Automask must be enabled for masking in script mode. Skipping.
14%|███████████▌ | 4/29 [01:57<14:14, 34.17s/it]

It's using something like 140 GB of swap. Same 29 images at 720p (about 500 KB each on average), with this command:

python run_single.py --colmap_name cascade --skip_video_extraction --no-renderer_scene_360 --TSDF_max_depth_baselines 30 --GS_save_test_iterations 30000
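For scale, the per-view inputs to the fusion step are small. Assuming 720p means 1280×720 and one float32 depth value per pixel (both assumptions; the helper name is hypothetical), all 29 depth maps together come to roughly 100 MiB, so ~140 GB of swap is far more likely to come from the fusion volume than from the input images:

```python
def depth_stack_bytes(num_views: int, width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Memory for one depth value per pixel per view.

    ASSUMPTION: float32 depth maps (4 bytes per pixel); color images would add a
    few more bytes per pixel but stay in the same order of magnitude.
    """
    return num_views * width * height * bytes_per_pixel


total = depth_stack_bytes(29, 1280, 720)
print(f"{total / 1024**2:.0f} MiB")
```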

yanivw12 commented 2 months ago

Try using a lower grid resolution (a higher --TSDF_voxel value). 34 seconds per TSDF iteration is far too slow; it should take a couple of seconds at most. I suggest visualizing the depth maps that the TSDF is using with the custom_data.ipynb notebook, which might help explain why.
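A quick scaling estimate shows why a coarser grid helps so much. Assuming --TSDF_voxel scales the voxel edge length linearly (an assumption; the thread only says that a higher value gives a lower-resolution grid), going from 2 to 4 divides the voxel count by 2³ = 8 for a dense volume, and by roughly 2² = 4 when only voxels near observed surfaces are stored, as in block-based volumes such as Open3D's ScalableTSDFVolume. The helper below is hypothetical:

```python
def voxel_count_ratio(old_voxel: float, new_voxel: float, dims: int = 3) -> float:
    """Factor by which the number of voxels shrinks when the voxel edge grows.

    ASSUMPTION: the TSDF_voxel argument scales the voxel edge length linearly.
    dims=3 models a dense grid over a fixed volume; dims=2 approximates a
    sparse grid that only stores voxels near the observed surface.
    """
    return (new_voxel / old_voxel) ** dims


print(voxel_count_ratio(2, 4))          # dense grid: 8.0
print(voxel_count_ratio(2, 4, dims=2))  # near-surface voxels only: 4.0
```

The same ratio applies in reverse: halving the voxel size multiplies memory and integration work by roughly the same factor.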

TXSevenXT commented 2 months ago

Really nice! I tried --TSDF_voxel 4 instead of 2 and the result is great :) The cleaning step still needs better settings, but the PLY is clean.

[Screenshot: cascade_voxel_4]

Thank you very much for your help :)

I'll also try Ubuntu 24.04 to see if everything works =)

yanivw12 commented 2 months ago

Looks great! It's nice seeing people using the method on their own data. That was one of the things I was anticipating the most when releasing the code.

I'm closing the issue for now, feel free to re-open if necessary.

yanivw12 / gs2mesh

30k export problem ? #8