yanivw12 / gs2mesh

Official implementation of the ECCV 2024 paper "GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views"

Be killed #1

Closed scy5335 closed 1 month ago

scy5335 commented 1 month ago

I ran "run_single.py" on my own dataset. When execution reaches "tsdf_utils.py", the process is killed. How can I solve this problem? (The "1?", "2?", ... lines in the output are prints I added to locate where the code is killed.) [image]

yanivw12 commented 1 month ago

  1. Does it always happen on image 76?
  2. Can you please share what line in the code causes the problem (after the “1??” print)?

scy5335 commented 1 month ago

  1. No. It may stop at image 69, 76, or some other image. It's not a fixed number.
  2. I put the print before the last "for" loop, and the process is killed inside that loop. [image]

yanivw12 commented 1 month ago

Can you check via debugger/additional print statements what line inside of the for loop causes the issue?
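A bare "Killed" message usually means the process received SIGKILL, in which case Python gets no chance to print a traceback, so flushed markers are the most reliable way to see how far the loop got. Below is a minimal sketch of that kind of instrumentation. The helper names `peak_rss_mb` and `checkpoint` are hypothetical and not part of gs2mesh, the loop is only a stand-in for the per-image loop in tsdf_utils.py, and the unit conversion assumes Linux (where `ru_maxrss` is reported in KiB).

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident memory of this process in MiB (ru_maxrss is KiB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def checkpoint(label: str) -> float:
    """Print a flushed marker so it is not lost on an abrupt kill; return peak RSS."""
    rss = peak_rss_mb()
    print(f"[{label}] peak RSS: {rss:.1f} MiB", file=sys.stderr, flush=True)
    return rss


# Hypothetical stand-in for the per-image loop in tsdf_utils.py.
for i in range(3):
    checkpoint(f"image {i}: before integrate")
    # The real call, e.g. volume.integrate(...), would go here.
    checkpoint(f"image {i}: after integrate")
```

If the last marker printed is a "before integrate" line, the kill happened inside the integration call. A peak RSS that keeps climbing from image to image would point toward memory exhaustion, while a flat value would suggest some other cause.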

scy5335 commented 1 month ago

I tested several times, and the code is killed when running line 116: volume.integrate(rgbd_image, intrinsics, np.linalg.inv(extrinsic_matrix)). I use open3d==0.17.0. Is it possible that a different version of open3d causes this bug?

yanivw12 commented 1 month ago

That seems to be the same version that I'm using (0.17.0). From what I'm reading, it could be a memory issue with Open3D. Try running on a different server, or if you want, you can share the source video with me via google drive and I'll run it on my server.

gnulife commented 1 month ago

I also encountered the same problem. In order to adapt to cuda11.3, I used Ubuntu 18.04. Which operating system version are you using?

yanivw12 commented 1 month ago

I'm using Ubuntu 20.04.6 LTS with CUDA11.3. You can probably use other CUDA versions if you have them available on a different machine with a higher Linux version - from what I see, the third party libraries (DLNR and 3DGS) support higher versions of CUDA. I will update the environment to run on a higher version if that mitigates the issue. Regarding the issue itself, I could only find issues that resemble what you're experiencing that were relevant in older versions of Open3D: https://github.com/isl-org/Open3D/issues/2700 https://github.com/isl-org/Open3D/issues/2107

In the meantime, as a workaround for both of you @gnulife @scy5335, try reducing the number of images fed into TSDF using the TSDF_dilate argument (default is 1; change it to 2 or 3 and it'll take only every 2nd/3rd image). It'll reduce the quality of the result, but it will help confirm that the code is able to finish running when using a smaller number of images.
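For readers unsure what "every 2nd/3rd image" means in practice, here is a minimal sketch of that kind of subsampling. The function name `subsample_views` is hypothetical, and starting from the first image is an assumption; the actual handling of TSDF_dilate inside gs2mesh may differ.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample_views(views: Sequence[T], dilate: int = 1) -> List[T]:
    """Keep every `dilate`-th view, starting from the first (assumed behavior)."""
    if dilate < 1:
        raise ValueError("dilate must be >= 1")
    return list(views[::dilate])


# Hypothetical file names, for illustration only.
frames = [f"{i:03d}.png" for i in range(7)]
print(subsample_views(frames, 2))
```

With 7 frames, a value of 2 keeps 4 of them and a value of 3 keeps 3, so the TSDF volume integrates proportionally fewer depth maps.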

Please update if anything helps.

gnulife commented 1 month ago

Thanks, I have solved the problem by changing the OS to Ubuntu 20.04 with CUDA 11.3.

yanivw12 commented 1 month ago

Good to know! I'll add a comment about this on the main page. @scy5335 can you confirm that it works for you as well with Ubuntu 20.04 so that I can close the issue?

gnulife commented 1 month ago

CUDA 11.3 is too old; it would be great if it could be upgraded to CUDA 11.8.

Is gs2mesh/third_party/gaussian-splatting the original gaussian-splatting, or has it been modified?

scy5335 commented 1 month ago

Thanks for your help. I tried changing TSDF_dilate from 1 to 2; the code can save the mesh, but the process is still killed at the end. After changing the OS to Ubuntu 20.04, the problem is solved.

yanivw12 commented 1 month ago

> CUDA 11.3 is too old; it would be great if it could be upgraded to CUDA 11.8.
>
> Is gs2mesh/third_party/gaussian-splatting the original gaussian-splatting, or has it been modified?

You're right - I'll update to 11.8 and see if it works, and update the environment.yml file. I don't see a reason why it won't though, since I haven't modified the gaussian-splatting code at all, so it should compile without an issue.

> Thanks for your help. I tried changing TSDF_dilate from 1 to 2; the code can save the mesh, but the process is still killed at the end. After changing the OS to Ubuntu 20.04, the problem is solved.

So to confirm, is the problem solved?

scy5335 commented 1 month ago

Yes, the problem is solved by changing Linux to a higher version.

yanivw12 commented 1 month ago

Great! I will close the issue after checking CUDA 11.8 compatibility.

guwinston commented 1 month ago

I encountered the same problem, and the run was very time-consuming. The training data was the MipNeRF360 bicycle scene, and the Gaussian Splatting training took only half an hour. However, the code then ran for 17 hours before being killed. I used a 4090 GPU, CUDA 11.8, and the WSL2 Ubuntu 20.04 subsystem. Any solutions? [image]

yanivw12 commented 1 month ago

MipNeRF360 images are huge, and the stereo matching model is slower on them (about 30-60 minutes on an Nvidia A40/L40 to process all the views), but not as slow as 17 hours... regardless, the dataset downloaded from the official MipNeRF360 site contains downsampled images which are smaller and the stereo network should work on them much faster. I will update the preprocessing file to run COLMAP on the downsampled images as well. Does your setup manage to run the example custom scan ("sculpture")?

yanivw12 commented 1 month ago

@gnulife I tried updating to CUDA 11.8 and python 3.10 and now my code is getting killed at the TSDF integration function as well. I'm looking into it.

yanivw12 commented 1 month ago

So far, it seems to happen on Python>=3.9 (I got it working on 3.7 and 3.8 with open3d 0.17.0, but not on 3.9 or 3.10). What Python versions are you using in the environment that didn't work and in the environment that did work? @scy5335 @gnulife
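One way to surface this early is a small guard that runs before TSDF integration and warns on the interpreter versions reported as failing above. This is only a sketch based on the observation in this thread (Python 3.9 and 3.10 with open3d 0.17.0); the names `FIRST_FAILING` and `tsdf_may_be_killed` are hypothetical and not part of gs2mesh.

```python
import sys
from typing import Tuple

# Assumption taken from this thread: 3.7 and 3.8 worked, 3.9 and 3.10 did not
# (all with open3d 0.17.0). Other combinations were not reported here.
FIRST_FAILING: Tuple[int, int] = (3, 9)


def tsdf_may_be_killed(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if this Python version matches the reported failing range."""
    return tuple(version) >= FIRST_FAILING


if tsdf_may_be_killed():
    print(
        "Warning: TSDF integration was reported to be killed on Python >= 3.9 "
        "with open3d 0.17.0; consider a Python 3.8 environment.",
        file=sys.stderr,
    )
```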

scy5335 commented 1 month ago

I use Python 3.7, which is the same version as in environment.yml.

yanivw12 commented 1 month ago

I just pushed a new version with CUDA 11.8 and Python 3.8, and clarifications on how to avoid this bug. @guwinston I also added a downsampling argument to reduce the MipNeRF360 image size. Bicycle scene took less than 15 minutes on my server with downsampling.

I'm closing the issue for now. If anyone has more concerns, please open a new issue, as this one is getting crowded.