Following the first NeRF tutorial, I am unable to start training with either of these commands:
# 1 GPU (8192 rays per GPU per batch)
export CUDA_VISIBLE_DEVICES=0
ns-train nerfacto-big --vis viewer+wandb --machine.num-gpus 1 --pipeline.datamanager.train-num-rays-per-batch 4096 --data data/nerfstudio/aspen
or
# 2 GPUs (4096 rays per GPU per batch, effectively 8192 rays per batch)
export CUDA_VISIBLE_DEVICES=0,1
ns-train nerfacto --vis viewer+wandb --machine.num-gpus 2 --pipeline.datamanager.train-num-rays-per-batch 4096 --data data/nerfstudio/aspen
I got an error. After trying a possible fix, training still stops after the dataloader and the viewer won't show:
Saving config to: outputs/poster/nerfacto/2023-06-25_012759/config.yml experiment_config.py:134
[01:28:02] Saving checkpoints to: outputs/poster/nerfacto/2023-06-25_012759/nerfstudio_models trainer.py:135
[01:28:02] Saving checkpoints to: outputs/poster/nerfacto/2023-06-25_012759/nerfstudio_models trainer.py:135
Auto image downscale factor of 2 nerfstudio_dataparser.py:324
Auto image downscale factor of 2 nerfstudio_dataparser.py:324
Setting up training dataset...
Caching all 204 images.
Loading data batch ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--
Setting up training dataset...
Caching all 204 images.
Setting up evaluation dataset...
Caching all 22 images.
Loading data batch ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--
Setting up evaluation dataset...
Caching all 22 images.
Yes, it seems the option was changed to --machine.num-devices. After the dataloader step, it does take a while to initialize. If you check CPU usage and the process is running at 100%, it is still working, so just wait.
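As a rough illustration of the rename, the sketch below rewrites the tutorial command so that --machine.num-gpus becomes --machine.num-devices. The helper name update_flags and the RENAMED_FLAGS table are hypothetical and not part of nerfstudio; the code only edits the command string and does not launch training.

```python
import shlex

# Assumption: the flag rename discussed in this thread is the only change needed.
RENAMED_FLAGS = {"--machine.num-gpus": "--machine.num-devices"}


def update_flags(command: str) -> list:
    """Split a shell command into argv tokens and swap renamed flags."""
    updated = []
    for token in shlex.split(command):
        # Handle both "--flag value" and "--flag=value" spellings.
        name, sep, value = token.partition("=")
        updated.append(RENAMED_FLAGS.get(name, name) + sep + value)
    return updated


# The 2-GPU command from the tutorial, as quoted in the issue.
old_command = (
    "ns-train nerfacto --vis viewer+wandb --machine.num-gpus 2 "
    "--pipeline.datamanager.train-num-rays-per-batch 4096 "
    "--data data/nerfstudio/aspen"
)

new_argv = update_flags(old_command)
print(shlex.join(new_argv))
```

The printed command can be pasted back into the shell after export CUDA_VISIBLE_DEVICES=0,1. Keeping the result as an argv list also makes it easy to hand to subprocess.run if you prefer to launch from a script.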
Describe the bug
GPU: NVIDIA RTX A6000
Following the first NeRF tutorial, I am unable to start training with either of these commands:
# 1 GPU (8192 rays per GPU per batch)
export CUDA_VISIBLE_DEVICES=0
ns-train nerfacto-big --vis viewer+wandb --machine.num-gpus 1 --pipeline.datamanager.train-num-rays-per-batch 4096 --data data/nerfstudio/aspen
or
# 2 GPUs (4096 rays per GPU per batch, effectively 8192 rays per batch)
export CUDA_VISIBLE_DEVICES=0,1
ns-train nerfacto --vis viewer+wandb --machine.num-gpus 2 --pipeline.datamanager.train-num-rays-per-batch 4096 --data data/nerfstudio/aspen
got this error:
To Reproduce
Build the nightly version from source, then go to the multi-GPU part of the tutorial.
Possible solution: changing --machine.num-gpus into --machine.num-devices might work, but training still stops after the dataloader and the viewer won't show.