rishizek / tensorflow-deeplab-v3-plus

DeepLabv3+ built in TensorFlow
MIT License

Recommended GPU / GPU memory #20

Open blaskowitz100 opened 6 years ago

blaskowitz100 commented 6 years ago

Hello, can anyone share their experience of what GPU (with how much memory) is needed at minimum to train a DeepLab model based on this implementation? Thank you very much :)

Sam813 commented 6 years ago

I have tried training on PASCAL VOC with a batch size of 2; it requires almost 16 GB of GPU memory, and batch size 3 requires 18.5 GB. With batch size = it requires almost 32 GB.
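As a back-of-envelope check (my own assumption, not something measured in this thread): if the fixed cost of weights and workspace stays constant and activation memory grows linearly with batch size, the two measurements above imply roughly 11 GB of base cost plus 2.5 GB per sample:

```python
# Linear extrapolation from the two measurements above; assumes activation
# memory scales linearly with batch size and the base covers weights/workspace.
per_sample_gb = 18.5 - 16.0          # ~2.5 GB per additional sample
base_gb = 16.0 - 2 * per_sample_gb   # ~11 GB fixed cost

def estimated_memory_gb(batch_size):
    return base_gb + batch_size * per_sample_gb

print(estimated_memory_gb(8))  # ~31 GB, close to the ~32 GB figure quoted above
```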

rishizek commented 6 years ago

Hi @blaskowitz100 , thank you for your interest in the repo.

In my case, I was able to train a model on the PASCAL VOC dataset with a batch size of 9 using a GTX 1080 Ti and 16 GB of main memory, and obtained a decent result. If you have a lower-performance GPU, you can try reducing the batch size and checking whether the model still trains. Regarding main memory, 16 GB is enough, at least in my case.

I hope this answers your question.
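For what it's worth, that trial-and-error can be automated. A minimal TF 1.x sketch, where build_train_op is a hypothetical helper (not part of this repo) that builds the training graph for a given batch size; an out-of-memory failure usually surfaces as ResourceExhaustedError:

```python
import tensorflow as tf

def largest_fitting_batch(build_train_op, candidates=(9, 6, 4, 2, 1)):
    """Return the largest batch size whose first training step fits in GPU memory."""
    for bs in candidates:
        tf.reset_default_graph()
        train_op = build_train_op(bs)  # hypothetical: builds the graph for this batch size
        try:
            with tf.Session() as sess:
                sess.run(tf.global_variables_initializer())
                sess.run(train_op)  # a single step is enough to trigger an OOM
            return bs
        except tf.errors.ResourceExhaustedError:
            continue  # out of memory; try the next smaller batch size
    return None
```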

JongMokKim commented 5 years ago

Hi @rishizek, can I ask you for more details?

I'm trying to train DeepLabv3+, but my batch size is too small. My current settings and environment:

- input size: 512 × 512 × 3
- backbone: Xception, OS 16
- GPU: Titan Xp (12 GB) → batch size 4

I think a batch size of 4 is too small to train on my own dataset.

In your experiment above, what were your settings for the backbone model and input size?

Thank you for your help in advance!
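One common workaround when only a small batch fits (a general technique, not something from this repo) is gradient accumulation: sum the gradients of N small batches and apply them once, for an effective batch of N × 4. A minimal TF 1.x sketch, assuming `loss` is the model's scalar training loss:

```python
import tensorflow as tf

N = 4  # 4 accumulation steps x batch 4 = effective batch 16
opt = tf.train.MomentumOptimizer(learning_rate=0.007, momentum=0.9)
grads_and_vars = opt.compute_gradients(loss)  # `loss`: your model's scalar loss

# One non-trainable buffer per variable to hold the running gradient sum
accum = [tf.Variable(tf.zeros_like(v), trainable=False) for _, v in grads_and_vars]
zero_op = tf.group(*[a.assign(tf.zeros_like(a)) for a in accum])
accum_op = tf.group(*[a.assign_add(g) for a, (g, _) in zip(accum, grads_and_vars)])
# Apply the averaged gradients once per N accumulated mini-batches
apply_op = opt.apply_gradients([(a / N, v) for a, (_, v) in zip(accum, grads_and_vars)])
```

Per effective batch you would run zero_op once, accum_op N times (each on a fresh mini-batch), then apply_op. Note this only mimics the gradient statistics of a larger batch; batch-norm statistics are still computed per mini-batch.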

AMArostegui commented 3 years ago

So I'm guessing that to train on PASCAL VOC 2012 as shown in the script local_test.sh, there are not many options regarding your graphics card. In that example, which starts from the DeepLabv3 xception65 checkpoint, it doesn't matter how you play with the hyperparameters: not even with batch_size == 1 will you get results on a 6 GB GPU.

Furthermore, I'm in the same situation when using the MobileNetV2 script local_test_mobilenetv2.sh and its starting checkpoint.

NVIDIA's lineup of cards with more than 8 GB of dedicated GPU memory (shared GPU memory is not used by TensorFlow) is pretty thin. Not even a GTX 1080 (8 GB) would do. You need to go for a GTX 1080 Ti (11 GB), a Titan, or an RTX 2080 Ti (the regular RTX 2080 has only 8 GB).
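To check how much dedicated memory TensorFlow actually sees on a card, one option in TF 1.x is:

```python
from tensorflow.python.client import device_lib

# Each GPU entry reports the memory TF can use (memory_limit, in bytes)
for d in device_lib.list_local_devices():
    if d.device_type == 'GPU':
        print(d.name, round(d.memory_limit / 1024**3, 1), 'GB')
```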

Update:

Trying to start the training process using the examples above results in several CUDA memory errors,

tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED

and the program exits with error code 1. I can see in the Task Manager how the GPU memory is filled to capacity.

But after tinkering with the GPUOptions object, using the following lines,

```python
import tensorflow as tf

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```

the script throws several warnings regarding performance considerations

tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.93GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

but the training process starts successfully.
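An alternative to reserving a fixed fraction (standard TF 1.x API, not something specific to this repo) is allow_growth, which makes TensorFlow allocate GPU memory on demand:

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving a fixed fraction up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```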