Open cr7Por opened 9 months ago
My CUDA version is 11.7. Is this a problem?
Hi, could you run this and share the value it returns?
import torch
torch.version.cuda
And could you tell me your GPU model?
>>> import torch
>>> torch.version.cuda
'11.6'
I am using an RTX 3090.
Could you add CUDA_LAUNCH_BLOCKING=1 before executing the script, like this:
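The two checks requested above can be collected in one small script. This is only a sketch and not part of the repository: `collect_cuda_env` is a hypothetical helper name, and it relies solely on `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()` and `torch.cuda.get_device_name()`. It returns early if PyTorch cannot be imported.

```python
import importlib.util


def collect_cuda_env():
    """Return a dict describing the PyTorch / CUDA setup (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,      # e.g. '1.12.1+cu116'
        "built_for_cuda": torch.version.cuda,    # CUDA runtime PyTorch was built with
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_count"] = torch.cuda.device_count()
        info["device_0_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_cuda_env().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue answers the version and GPU model questions in one reply.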
CUDA_LAUNCH_BLOCKING=1 python ...
CUDA_LAUNCH_BLOCKING=1 python train.py --quiet --eval --config configs/n3d_lite/cut_roasted_beef.json --model_path log/cut_beef --source_path cut_roasted_beef/colmap_0
It returns a RuntimeError immediately; there is no time for GPU memory to grow.
  File "train.py", line 69, in train
    scene = Scene(dataset, gaussians, duration=duration, loader=dataset.loader)
  File "/home/ubuntu/liudong/SpacetimeGaussians/thirdparty/gaussian_splatting/scene/__init__.py", line 99, in __init__
    self.train_cameras[resolution_scale] = cameraList_from_camInfosv2(scene_info.train_cameras, resolution_scale, args)
  File "./thirdparty/gaussian_splatting/utils/camera_utils.py", line 260, in cameraList_from_camInfosv2
    camera_list.append(loadCamv2(args, id, c, resolution_scale))
  File "./thirdparty/gaussian_splatting/utils/camera_utils.py", line 109, in loadCamv2
    image_name=cam_info.image_name, uid=id, data_device=args.data_device, near=cam_info.near, far=cam_info.far, timestamp=cam_info.timestamp, rayo=rays_o, rayd=rays_d,cxr=cam_info.cxr,cyr=cam_info.cyr)
  File "./thirdparty/gaussian_splatting/scene/cameras.py", line 53, in __init__
    self.original_image = image.clamp(0.0, 1.0).to(self.data_device)
RuntimeError: CUDA error: all CUDA-capable devices are busy or unavailable
Could you try the solution in this post? https://discuss.pytorch.org/t/distributeddataparallel-runtimeerror-cuda-error-all-cuda-capable-devices-are-busy-or-unavailable/102763/5
SpacetimeGaussians$ nvidia-smi -i 0 -c 0
Compute mode is already set to DEFAULT for GPU 00000000:01:00.0.
All done.
CUDA_LAUNCH_BLOCKING=1 python train.py --quiet --eval --config configs/n3d_lite/cut_roasted_beef.json --model_path log/cut_beef --source_path cut_roasted_beef/colmap_0
Still the same runtime error.
Can you get your torch.__version__?
Python 3.7.13 (default, Oct 18 2022, 18:57:03)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.12.1+cu116'
How about this: “CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0 python train.py ...”?
You can try different CUDA_VISIBLE_DEVICES values (0, 1, ..., 5, ...).
CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0 python train.py --quiet --eval --config configs/n3d_lite/cut_roasted_beef.json --model_path log/cut_beef --source_path cut_roasted_beef/colmap_0
Still the same runtime error. I only have one RTX 3090 in my system.
What is the output of your nvidia-smi?
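Both variables only take effect if they are set before the process initializes CUDA, which is why they are placed in front of the command. A minimal sketch of doing the same from a Python launcher follows; `debug_env` and `run_with_debug_env` are hypothetical names, and the child command shown is a stand-in that merely echoes the variables rather than the real `train.py` call.

```python
import os
import subprocess
import sys


def debug_env(visible_devices="0"):
    """Copy the current environment and add the two debugging variables."""
    env = os.environ.copy()
    env["CUDA_LAUNCH_BLOCKING"] = "1"               # make kernel launches synchronous
    env["CUDA_VISIBLE_DEVICES"] = visible_devices   # expose only this GPU index
    return env


def run_with_debug_env(cmd, visible_devices="0"):
    """Run cmd (a list of arguments) in a child process with the debug environment."""
    return subprocess.run(cmd, env=debug_env(visible_devices),
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child command; swap in the train.py invocation from this thread.
    probe = [sys.executable, "-c",
             "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'], "
             "os.environ['CUDA_VISIBLE_DEVICES'])"]
    print(run_with_debug_env(probe).stdout.strip())
```

On a machine with a single GPU, index 0 is the only valid value for `CUDA_VISIBLE_DEVICES`.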
Every 2.0s: nvidia-smi ubuntu: Tue Jan 2 13:28:28 2024
Tue Jan  2 13:28:28 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.04   Driver Version: 525.116.04   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   40C    P8    26W / 370W |     70MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1176      G   /usr/lib/xorg/Xorg                 56MiB |
|    0   N/A  N/A      1385      G   /usr/bin/gnome-shell               11MiB |
+-----------------------------------------------------------------------------+
The runtime error is returned immediately, and GPU memory usage does not change.
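The fields that matter in this table (compute mode and memory usage) can also be read in machine-friendly form. The sketch below assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` and `--format=csv,noheader` options; `parse_gpu_csv` and `query_gpus` are hypothetical helper names.

```python
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=index,name,compute_mode,memory.used,memory.total",
         "--format=csv,noheader"]


def parse_gpu_csv(text):
    """Turn the CSV lines printed by the query above into a list of dicts."""
    keys = ["index", "name", "compute_mode", "memory_used", "memory_total"]
    rows = []
    for line in text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        rows.append(dict(zip(keys, values)))
    return rows


def query_gpus():
    """Run nvidia-smi; return an empty list if the tool is missing or fails."""
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(out)


if __name__ == "__main__":
    for gpu in query_gpus():
        note = "" if gpu["compute_mode"] == "Default" else "  <- compute mode is not Default"
        print(gpu, note)
```

A compute mode other than `Default` (for example an exclusive or prohibited mode) is one known cause of the "busy or unavailable" error, which is what the `nvidia-smi -i 0 -c 0` step earlier in the thread ruled out.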
Based on the 3D Gaussian Splatting authors' recommendation for their code, and since your driver reports CUDA 12.0, I think you can install Python 3.8, PyTorch 2.0.0, and CUDA 12:
If you can afford the disk space, we recommend using our environment files for setting up a training environment identical to ours. If you want to make modifications, please note that major version changes might affect the results of our method. However, our (limited) experiments suggest that the codebase works just fine inside a more up-to-date environment (Python 3.8, PyTorch 2.0.0, CUDA 12). Make sure to create an environment where PyTorch and its CUDA runtime version match and the installed CUDA SDK has no major version difference with PyTorch's CUDA version.
https://github.com/graphdeco-inria/gaussian-splatting
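The quoted advice boils down to comparing major versions of PyTorch's CUDA runtime and the installed CUDA SDK. Note that the "CUDA Version" shown in the `nvidia-smi` header is the highest version the driver supports, not the installed toolkit, so the sketch below reads the toolkit version from `nvcc --version` text instead. The function names are hypothetical, and the sample nvcc line is illustrative rather than taken from this thread.

```python
import re


def cuda_major(version):
    """Major number of a CUDA version string such as '11.6'."""
    return int(version.split(".")[0])


def nvcc_release(nvcc_output):
    """Extract 'X.Y' from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return match.group(1)


def same_major(torch_cuda, nvcc_output):
    """True if PyTorch's CUDA runtime and the toolkit share a major version."""
    return cuda_major(torch_cuda) == cuda_major(nvcc_release(nvcc_output))


if __name__ == "__main__":
    # torch.version.cuda was '11.6' earlier in the thread; the nvcc line is a made-up sample.
    sample_nvcc = "Cuda compilation tools, release 11.7, V11.7.64"
    print(same_major("11.6", sample_nvcc))
```

In practice the first argument would come from `torch.version.cuda` and the second from running `nvcc --version` in the same environment.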
OK, I will give it a try. Thank you very much.
Python 3.8.18 (default, Sep 11 2023, 13:40:15)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.version
<module 'torch.version' from '/home/ubuntu/anaconda3/envs/colmapenv/lib/python3.8/site-packages/torch/version.py'>
>>> torch.__version__
'2.1.2+cu121'
Same runtime error here.
scene/cameras.py:53 in __init__

   50         # image is real image
   51         if not isinstance(image, tuple):
   52             if "camera_" not in image_name:
 ❱ 53                 self.original_image = image.clamp(0.0, 1.0).to(self.data_device)
   54             else:
   55                 self.original_image = image.clamp(0.0, 1.0).half().to(self.data_device)
   56             self.image_width = self.original_image.shape[2]
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
How about just running this?
import torch
torch.ones((1, 1)).to('cuda')
conda activate feature_splatting
(feature_splatting) ubuntu@ubuntu:~$ python
Python 3.7.13 (default, Oct 18 2022, 18:57:03)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.ones((1, 1)).to('cuda')
tensor([[1.]], device='cuda:0')
That works fine.
How about replacing self.data_device with 'cuda' at scene/cameras.py:53?
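A slightly closer stand-in for the failing line is to clamp a float32 image-shaped tensor and move it to the GPU, then synchronize so that any asynchronous error surfaces right there. This is a sketch under assumptions: `cuda_smoke_test` is a hypothetical helper, and the random 3 x 64 x 64 tensor only imitates the real training image.

```python
import importlib.util


def cuda_smoke_test(height=64, width=64):
    """Mimic `image.clamp(0.0, 1.0).to('cuda')` on a random float32 image."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    if not torch.cuda.is_available():
        return "CUDA not available to PyTorch"

    image = torch.rand(3, height, width)          # float32, channels first
    try:
        on_gpu = image.clamp(0.0, 1.0).to("cuda")
        torch.cuda.synchronize()                  # surface asynchronous errors here
    except RuntimeError as exc:
        return f"failed: {exc}"
    return f"ok: {tuple(on_gpu.shape)} on {on_gpu.device}"


if __name__ == "__main__":
    print(cuda_smoke_test())
```

If this succeeds in a fresh interpreter while `train.py` still fails, the difference is more likely in how the training process is launched or what it does before reaching line 53 than in the PyTorch installation itself.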
  File "./thirdparty/gaussian_splatting/utils/camera_utils.py", line 260, in cameraList_from_camInfosv2
    camera_list.append(loadCamv2(args, id, c, resolution_scale))
  File "./thirdparty/gaussian_splatting/utils/camera_utils.py", line 109, in loadCamv2
    image_name=cam_info.image_name, uid=id, data_device=args.data_device, near=cam_info.near, far=cam_info.far, timestamp=cam_info.timestamp, rayo=rays_o, rayd=rays_d,cxr=cam_info.cxr,cyr=cam_info.cyr)
  File "./thirdparty/gaussian_splatting/scene/cameras.py", line 53, in __init__
    self.original_image = image.clamp(0.0, 1.0).to('cuda')
RuntimeError: CUDA error: all CUDA-capable devices are busy or unavailable
Same error.
Could you print the dtype of image.clamp(0.0, 1.0)?
torch.float32 [03/01 16:17:10]
Same problem here; I solved it by switching to a different terminal. @~@
I encountered exactly the same problem and fixed it by decreasing the "duration" value in the config file. I guess the problem arose from something like a CUDA out-of-memory condition. Hope this is helpful.
  File "./thirdparty/gaussian_splatting/scene/cameras.py", line 53, in __init__
    self.original_image = image.clamp(0.0, 1.0).to(self.data_device)
RuntimeError: CUDA error: all CUDA-capable devices are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
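To see why a smaller "duration" could plausibly help, here is a rough estimate of the memory needed when every training image is moved to the device as float32, as line 53 of cameras.py does. It assumes that "duration" is the number of frames loaded per camera; the camera count and resolution are hypothetical numbers chosen only to show the scaling, and `preload_bytes` is an illustrative helper, not code from the repository.

```python
def preload_bytes(num_frames, num_cameras, height, width,
                  channels=3, bytes_per_value=4):
    """Bytes needed to keep every training image on the device.

    bytes_per_value=4 matches float32; use 2 for the .half() branch.
    """
    return num_frames * num_cameras * channels * height * width * bytes_per_value


if __name__ == "__main__":
    # Hypothetical camera count and resolution, for illustration only.
    for duration in (10, 25, 50):
        gib = preload_bytes(duration, num_cameras=20, height=1014, width=1352) / 2**30
        print(f"duration={duration}: about {gib:.1f} GiB")
```

The estimate grows linearly with the number of frames, so halving "duration" halves the image memory under these assumptions. This does not explain why the error in the original report appeared before GPU memory usage changed at all.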