zzangjinsun / NLSPN_ECCV20

Park et al., Non-Local Spatial Propagation Network for Depth Completion, ECCV, 2020
MIT License
321 stars 55 forks

GPU memory requirement #21

Closed EthanZhangYi closed 3 years ago

EthanZhangYi commented 3 years ago

Hi, I ran the test command in the README but hit an out-of-memory problem. My machine has 8 GPUs (TITAN X Pascal, 12 GB each). What is the minimum GPU memory requirement?

python main.py --dir_data PATH_TO_KITTI_DC --data_name KITTIDC --split_json ../data_json/kitti_dc.json \
    --patch_height 240 --patch_width 1216 --gpus 0,1,2,3 --max_depth 90.0 --num_sample 0 \
    --test_only --pretrain ../results/NLSPN_KITTI_DC.pt --preserve_input --save NAME_TO_SAVE

There is also another small question: is the --top_crop option used only for training?

zzangjinsun commented 3 years ago

If you run the code with full-size KITTI, it may require about 12~13 GB of GPU memory. You can try the --test_crop argument to crop the top region of the test set.
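Based on the reply, the test command from earlier in the thread with the suggested flag appended might look like the sketch below (PATH_TO_KITTI_DC and NAME_TO_SAVE are the README's placeholders; whether --test_crop takes a value is not stated here, so treat this as an assumption to verify against the repo's argument parser):

```shell
# Sketch: same test command as above, with --test_crop added to reduce memory.
# PATH_TO_KITTI_DC and NAME_TO_SAVE are placeholders from the README.
python main.py --dir_data PATH_TO_KITTI_DC --data_name KITTIDC --split_json ../data_json/kitti_dc.json \
    --patch_height 240 --patch_width 1216 --gpus 0,1,2,3 --max_depth 90.0 --num_sample 0 \
    --test_only --pretrain ../results/NLSPN_KITTI_DC.pt --preserve_input --test_crop \
    --save NAME_TO_SAVE
```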

EthanZhangYi commented 3 years ago

@zzangjinsun Thanks for your reply. According to the paper, center-cropped 1216x240 patches with a batch size of 25 are used for training on KITTI DC, and center-cropped 912x228 patches with a batch size of 12 are used for the ablation study. What are the GPU requirements for these two setups?

As mentioned above, even testing with a batch size of 1 needs 12~13 GB of memory, so how can the model be trained with a batch size of 12 or 25? In the README, the stated requirement is just 4x NVIDIA GTX 1080 Ti / 4x NVIDIA Titan RTX GPUs.

zzangjinsun commented 3 years ago

4x NVIDIA GTX 1080 Ti / 4x NVIDIA Titan RTX GPUs are enough to train on NYUv2 or the 912x228 KITTI dataset.

For full training, a powerful machine with 8 NVIDIA P40 GPUs was used. (Please refer to the Notes at the end of the README.)

EthanZhangYi commented 3 years ago

@zzangjinsun Thanks for your reply. If I train models on the KITTI dataset with 4x NVIDIA GTX 1080 Ti GPUs using a crop size of 912x228, will the largest feasible batch size be 12? Could you please provide a trained model for this setting?

zzangjinsun commented 3 years ago

Yes. You should be able to train with 912x228 KITTI data and batch size 12. Please refer to the attached images.

Currently, there are no plans to release models trained with cropped images. Please use the fully trained model I released.

[Image: nlspn_command — training parameters]

[Image: nlspn_gpu — GPU usage]

EthanZhangYi commented 3 years ago

@zzangjinsun Thanks for your instructions. For the setting above, I can only find its result on the KITTI val set: the RMSE is 884.1 (Table 3 in the paper). Can you supply the full result (RMSE, MAE, iRMSE, iMAE)?

EthanZhangYi commented 3 years ago

@zzangjinsun On KITTI, I reproduced the result with RMSE=884.1 on the val_selection_cropped set, but I cannot reproduce the released model's result of RMSE=771.8 on the val_selection_cropped set. I followed the instructions in Section 6.2 of the paper.

Can you share all the arguments of this training setting? An image like the one above is appreciated. Thanks!

zzangjinsun commented 3 years ago

Please use the training command provided in the README with the following arguments: --epochs 25 --gamma 1.0,0.4,0.16,0.064 --decay 10,15,20,25 --batch_size AS_MANY_AS_YOU_CAN --preserve_input

(Note that --preserve_input is optional.)
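Combining the arguments above with the command style used earlier in the thread, a hedged sketch of the 912x228 training invocation might be as follows. The patch size, GPU ids, and batch size of 12 are assumptions taken from the earlier discussion (4x 1080 Ti), not flags confirmed in this reply:

```shell
# Hypothetical training command assembled from the arguments in the reply.
# PATH_TO_KITTI_DC and NAME_TO_SAVE are placeholders; adjust --batch_size
# to what your GPUs can hold (12 on 4x 1080 Ti, per the thread).
python main.py --dir_data PATH_TO_KITTI_DC --data_name KITTIDC --split_json ../data_json/kitti_dc.json \
    --patch_height 228 --patch_width 912 --gpus 0,1,2,3 --max_depth 90.0 --num_sample 0 \
    --epochs 25 --gamma 1.0,0.4,0.16,0.064 --decay 10,15,20,25 --batch_size 12 \
    --preserve_input --save NAME_TO_SAVE
```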

EthanZhangYi commented 3 years ago

@zzangjinsun Thanks. When I increase the batch size, do I need to increase the learning rate correspondingly?

zzangjinsun commented 3 years ago

I think you can keep the original learning rate.
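For reference, the adjustment the question alludes to is the linear scaling rule, which is not used in this repository (the author keeps the original learning rate). A minimal sketch of the rule, purely for illustration:

```python
# Linear scaling rule (illustrative only; the author above recommends
# keeping the original learning rate for this codebase).

def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate proportionally to the batch size."""
    return base_lr * new_batch / base_batch

# Example: a base LR tuned for batch size 12, moved to batch size 25.
scaled = scale_lr(0.001, 12, 25)
print(scaled)  # 0.001 * 25 / 12, roughly 0.00208
```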