initialneil / SplattingAvatar

[CVPR2024] Official implementation of SplattingAvatar.

Got CUDA error when using GPU to train #5

Closed wangyichen191 closed 3 months ago

wangyichen191 commented 3 months ago

Hello, thank you for the code and the excellent work. I got an error when running the code.

cmd: CUDA_VISIBLE_DEVICES=3 python train_splatting_avatar.py --config configs/splatting_avatar.yaml --dat_dir /data2/wangyichen/SplattingAvatar/data/flame/yufeng

I changed data_device in splatting_avatar.yaml to cuda and got the following error:

creating the FLAME Decoder
[IMavatarDataset][train] num_frames = 2016
creating the FLAME Decoder
[IMavatarDataset][test] num_frames = 350
[Triwalk] init mesh with F(9976, 3)
[TriangleWalk] init edge table ...[done]
[TriangleWalk] init triangle neighbor ...[done]
Number of points at initialisation : 10000
[SplattingAvatarOptim] optim_xyz, lr=0.00016
[SplattingAvatarOptim] optim_features, lr=0.0025
[SplattingAvatarOptim] optim_opacity, lr=0.05
[SplattingAvatarOptim] optim_scaling, lr=0.005
[SplattingAvatarOptim] optim_rotation, lr=0.001
0%| | 0/30000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data2/wangyichen/SplattingAvatar/train_splatting_avatar.py", line 76, in <module>
    batches = next(data_iterator)
  File "/data2/wangyichen/anaconda3/envs/splatting-avatar/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/data2/wangyichen/anaconda3/envs/splatting-avatar/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/data2/wangyichen/anaconda3/envs/splatting-avatar/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/data2/wangyichen/anaconda3/envs/splatting-avatar/lib/python3.9/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/data2/wangyichen/anaconda3/envs/splatting-avatar/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/data2/wangyichen/anaconda3/envs/splatting-avatar/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data2/wangyichen/anaconda3/envs/splatting-avatar/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data2/wangyichen/SplattingAvatar/dataset/imavatar_data.py", line 133, in __getitem__
    scene_cameras = convert_to_scene_cameras(color_frames, self.config)
  File "/data2/wangyichen/SplattingAvatar/scene/dataset_readers.py", line 355, in convert_to_scene_cameras
    camera = make_scene_camera(idx, color_frames.cams[idx], img, image_path, config)
  File "/data2/wangyichen/SplattingAvatar/scene/dataset_readers.py", line 347, in make_scene_camera
    scene_camera = loadCam(config, idx, cam_info, resolution_scale)
  File "/data2/wangyichen/SplattingAvatar/utils/camera_utils.py", line 72, in loadCam
    return Camera(colmap_id=cam_info.uid, R=cam_info.R, T=cam_info.T,
  File "/data2/wangyichen/SplattingAvatar/scene/cameras.py", line 39, in __init__
    self.original_image = image.clamp(0.0, 1.0).to(self.data_device)
  File "/data2/wangyichen/anaconda3/envs/splatting-avatar/lib/python3.9/site-packages/torch/cuda/__init__.py", line 217, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Looking forward to your help! Thanks!

initialneil commented 3 months ago

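A DataLoader that uses worker processes does not work well with CUDA: the workers are forked subprocesses, and CUDA cannot be re-initialized inside them, which is the error above. I think keeping the data on the CPU and uploading it to CUDA only when needed is efficient enough for this small experiment.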
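
For readers who hit the same error, the "keep data on the CPU, upload when needed" pattern can be sketched roughly as follows. This is a minimal illustration, not the repository's code: CpuFrameDataset, make_loader, and iterate_on_device are hypothetical names, and the random tensors stand in for the real camera images. The only point is that __getitem__ never touches CUDA, so it is safe to run in forked workers, and the transfer happens in the main process.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class CpuFrameDataset(Dataset):
    """Hypothetical stand-in for the real dataset: it only returns CPU tensors."""

    def __init__(self, num_frames=8, height=4, width=4):
        # Random images in place of real frames (assumption for illustration).
        self.images = torch.rand(num_frames, 3, height, width)

    def __len__(self):
        return self.images.shape[0]

    def __getitem__(self, idx):
        # No .to("cuda") here: this code may run inside a forked worker process.
        return self.images[idx].clamp(0.0, 1.0)


def make_loader(dataset, device, num_workers=2):
    # Pinned host memory only helps when the target device is a GPU.
    return DataLoader(
        dataset,
        batch_size=4,
        num_workers=num_workers,
        pin_memory=(device.type == "cuda"),
    )


def iterate_on_device(loader, device):
    """Yield batches moved to `device` from the main process."""
    for batch in loader:
        yield batch.to(device, non_blocking=True)
```

In a training loop this would be used as `for batch in iterate_on_device(make_loader(dataset, device), device): ...`, with `device = torch.device("cuda")`. Two other general PyTorch options, not verified against this repository, are setting `num_workers=0` so that no worker processes are created, or passing `multiprocessing_context="spawn"` to the DataLoader as the error message suggests.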