hjxwhy / mipnerf_pl

Unofficial pytorch-lightning implementation of Mip-NeRF
MIT License

Why is it so slow? #11

Closed StarsTesla closed 2 years ago

StarsTesla commented 2 years ago

I started training with the dev-branch code on the lego dataset (multi-scale) with 3 RTX 3090 GPUs, and one epoch takes 4-5 hours, wtf?

StarsTesla commented 2 years ago

Here's the log:

```
python train.py --out_dir out-lego --data_path ./data-blender/lego --dataset_name multi_blender exp_name lego num_gpus 3 train.num_work 48
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/3
initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/3
initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/3
distributed_backend=nccl
All distributed processes registered. Starting with 3 processes
LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2]
Validation sanity check: 0it [00:00, ?it/s]
/root/anaconda3/envs/mipnerf/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:110: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Epoch 0:   1%|▍ | 99/9240 [02:36<4:00:32, 1.58s/it, loss=0.296, v_num=2, train/psnr=10.80]
```
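Side note on the warning in the log: PyTorch Lightning is only complaining that the validation DataLoader uses few worker processes. A minimal sketch of raising `num_workers` in a standard `val_dataloader` hook, using hypothetical names (`MipNeRFSystem`, `self.val_dataset`) rather than this repo's actual code:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader

class MipNeRFSystem(pl.LightningModule):  # hypothetical stand-in for the repo's system class
    def val_dataloader(self):
        # self.val_dataset is an assumed attribute holding the validation images.
        return DataLoader(
            self.val_dataset,
            batch_size=1,        # e.g. one full image per validation step
            shuffle=False,
            num_workers=4,       # a few workers per process is usually enough; 48 is overkill
            pin_memory=True,     # faster host-to-device copies
        )
```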

StarsTesla commented 2 years ago

I tried the master branch; it gives:

```
python train.py --out_dir result --data_path data/lego --dataset_name multi_blender exp_name lego num_gpus 3
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/3
initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/3
initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/3
distributed_backend=nccl
All distributed processes registered. Starting with 3 processes
LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2]
Validation sanity check: 0it [00:00, ?it/s]
/root/anaconda3/envs/mipnerf/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:110: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Epoch 0:   0%|▏ | 34/9236 [00:55<4:09:20, 1.63s/it, loss=0.413, v_num=1, train/psnr=9.210]
```

StarsTesla commented 2 years ago

And with a single GPU on the master branch, it gives:

```
python train.py --out_dir result --data_path data/lego --dataset_name multi_blender exp_name lego
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2]
Validation sanity check: 0it [00:00, ?it/s]
/root/anaconda3/envs/mipnerf/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:110: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Epoch 0:   0%| | 58/27682 [00:24<3:18:03, 2.32it/s, loss=0.368, v_num=2, train/psnr=9.820]
```
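For context on the step counts: under DDP, Lightning shards the training set across processes with a `DistributedSampler`, so each of the 3 processes only iterates over roughly a third of the batches (27682 / 3 ≈ 9227, close to the 9236-9240 steps shown in the multi-GPU runs). A toy sketch of that partitioning, with a made-up dataset rather than the repo's loaders:

```python
# Toy sketch (not the repo's loaders): DDP shards the dataset per rank,
# so steps-per-epoch shrink by roughly the number of GPUs.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

fake_data = TensorDataset(torch.zeros(27682, 3))  # stand-in: one item per training step

single_gpu_loader = DataLoader(fake_data, batch_size=1)
rank0_sampler = DistributedSampler(fake_data, num_replicas=3, rank=0, shuffle=False)
rank0_loader = DataLoader(fake_data, batch_size=1, sampler=rank0_sampler)

print(len(single_gpu_loader))  # 27682 steps per epoch on 1 GPU
print(len(rank0_loader))       # 9228 steps per epoch per GPU with 3 GPUs
```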

StarsTesla commented 2 years ago

Conclusion: a single GPU gives much faster training speed?
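The progress-bar numbers above support that, at least per full pass over the data (back-of-the-envelope, assuming the per-GPU batch size is the same in both runs):

```python
# Back-of-the-envelope epoch times from the progress bars quoted above.
# Both setups cover the full dataset once per epoch, so the times are comparable
# (assuming the same per-GPU batch size in both runs).

three_gpu_epoch_s = 9240 * 1.58   # 3x3090: 9240 steps at ~1.58 s/it -> ~14,600 s
one_gpu_epoch_s = 27682 / 2.32    # 1x3090: 27682 steps at ~2.32 it/s -> ~11,900 s

print(f"3 x 3090: {three_gpu_epoch_s / 3600:.1f} h/epoch")  # ~4.1 h
print(f"1 x 3090: {one_gpu_epoch_s / 3600:.1f} h/epoch")    # ~3.3 h
```

So by these numbers the 3-GPU run really is slower per epoch than the single-GPU run here.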

StarsTesla commented 2 years ago

Results: after almost 20 hours (8 epochs) on a 3090, it achieves PSNR around 34-35. It seems there is some problem with my GPUs; sorry about that.