jolibrain / deepdetect

Deep Learning API and Server in C++14, with support for Caffe, PyTorch, TensorRT, Dlib, NCNN, Tensorflow, XGBoost and TSNE
https://www.deepdetect.com/

Discussion on convergence and memory requirements using ResNet #84

Open anguoyang opened 8 years ago

anguoyang commented 8 years ago

Hi, I found a description on the dede website saying that resnet_50 needs 6GB+ of memory to run. Is that a hard requirement ("MUST") or just a recommendation ("PREFER")? Thanks.

beniz commented 8 years ago

Thanks for bringing this up. I believe that this is true for training, but not for prediction, so I'll get the info on the website corrected accordingly.

At the moment, a quick single-image classification prediction with resnet_50 takes 302MB out of 12GB on a K40, as reported by nvidia-smi.

If you wish to train a model, there's some more info regarding resnets in #60.
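
For reference, here is a minimal sketch of such a single-image prediction against a local DeepDetect server; the server address, model repository path, service name and image path below are placeholder assumptions, not settings from this thread:

```python
# Minimal sketch: single-image classification against a local DeepDetect server.
# Assumes dede is listening on localhost:8080 and that /opt/models/resnet_50
# holds a ResNet-50 Caffe model (deploy.prototxt, .caffemodel, corresp.txt).
import requests

DEDE = "http://localhost:8080"

# Create an image classification service backed by the ResNet-50 model.
service = {
    "mllib": "caffe",
    "description": "resnet_50 single-image test",
    "type": "supervised",
    "parameters": {
        "input": {"connector": "image", "width": 224, "height": 224},
        "mllib": {"nclasses": 1000, "gpu": True},
    },
    "model": {"repository": "/opt/models/resnet_50"},
}
requests.put(f"{DEDE}/services/resnet50", json=service).raise_for_status()

# Predict on a single image; watch GPU memory with `nvidia-smi` while this runs.
predict = {
    "service": "resnet50",
    "parameters": {"output": {"best": 5}},
    "data": ["/path/to/test.jpg"],
}
print(requests.post(f"{DEDE}/predict", json=predict).json())
```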

revilokeb commented 8 years ago

Hi @beniz and @anguoyang, if the network architecture is fixed (resnet_50, resnet_101, ...), then batch_size is the key variable that determines whether the algorithm can run on a GPU with a given amount of RAM.

That means that in quite a few prediction setups the batch size can even be set as low as 1, reducing the necessary GPU RAM to relatively small amounts, as Emmanuel pointed out.

Training a network is another matter. However, in my experience it also depends very much on the task. For example, I have been able to finetune resnet_50, resnet_101 and even resnet_152 on certain classification tasks with batch sizes as low as 4 or 8 on a single GPU: according to nvidia-smi this takes less than 4GB of GPU RAM for resnet_50 (batch_size=8), less than 6GB for resnet_101 (batch_size=8) and a little more than 5GB for resnet_152 (batch_size=4). Classification error stayed low throughout those experiments, but of course my task was much, much simpler than training ImageNet from scratch, which I do not think is possible that way.

All I want to point out is that, depending on the complexity of the task (for finetuning I sometimes set the learning rate of all lower layers to small values or even zero), relatively small batch sizes can be enough, which makes transfer learning / finetuning of the really ultra-deep nets feasible on moderately large (single) GPUs.

As a consequence, even with limited GPU resources (such as a single 4GB or 6GB GPU) it is sometimes possible to use the high-quality ultra-deep nets to learn interesting tasks and later deploy them for prediction on moderately priced Amazon 4GB GPUs.
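
For concreteness, this is roughly where the batch size is set in a DeepDetect training call. It is only a sketch: the service name, data path, learning rate and iteration count are placeholder assumptions, and the service is assumed to have been created beforehand over a ResNet-50 model repository:

```python
# Sketch of a finetuning call with a small batch size via the DeepDetect
# /train endpoint. All paths and hyperparameters below are illustrative.
import requests

DEDE = "http://localhost:8080"

train = {
    "service": "resnet50_ft",   # assumed to exist already
    "async": True,
    "parameters": {
        "input": {"connector": "image", "width": 224, "height": 224},
        "mllib": {
            "gpu": True,
            # batch_size is the main memory knob at fixed architecture;
            # 4-8 was enough in the finetuning runs described above.
            "net": {"batch_size": 8},
            "solver": {"iterations": 10000, "base_lr": 0.001},
        },
        "output": {"measure": ["acc", "mcll"]},
    },
    "data": ["/path/to/train_images"],
}
print(requests.post(f"{DEDE}/train", json=train).json())
```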

beniz commented 8 years ago

Thanks for all the detailed info. FTR, some people have reported difficulty with convergence, see https://github.com/KaimingHe/deep-residual-networks/issues/6

ghost commented 8 years ago

Hi @beniz

I just found your quick analysis of memory usage. Does GPU memory usage depend on the input image size? For example, when I use resnet_50 and resize the input image to around 1200 x 4000, I run out of memory, but when I downsize the image to around 900 x 3000, it works.

I hope you can provide another quick analysis of the relationship between image size and memory (with the batch size fixed to a small constant).

beniz commented 8 years ago

The ResNets are fully convolutional, i.e. any size above the initial 224x224 training size works. Of course the memory requirement increases with size; I'd expect a quadratic increase or even more, since the feature maps grow with the input.
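
A back-of-envelope sketch of that scaling, using the ~300MB single-image figure from earlier in this thread as a baseline. It only scales activation memory with pixel count and ignores fixed weight memory and framework overhead, so treat the numbers as rough estimates:

```python
# Back-of-envelope sketch (not a profiler): activation memory of a fully
# convolutional net scales roughly with the number of input pixels, i.e.
# quadratically with the linear image dimension.
def activation_mem_estimate(width, height, base_w=224, base_h=224, base_mb=300.0):
    """Scale a measured baseline (~300MB for a 224x224 ResNet-50 prediction,
    per the nvidia-smi figure above) by the ratio of input pixels."""
    return base_mb * (width * height) / (base_w * base_h)

for w, h in [(224, 224), (900, 3000), (1200, 4000)]:
    print(f"{w}x{h}: ~{activation_mem_estimate(w, h):,.0f} MB (rough estimate)")
```

On a 12GB card this rough estimate already explains why 900 x 3000 fits while 1200 x 4000 does not.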

ghost commented 8 years ago

Thank you.

freeyawork commented 7 years ago

@beniz if the input image size is larger than 224*224, the number of units output by the final pooling/flatten layer increases. I think that is why the memory requirement increases with size.
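
A small sketch of how the final feature map grows with input size, assuming the standard ResNet-50 layout (overall stride of 32, 2048 output channels):

```python
# Rough sketch: size of the last ResNet-50 feature map (the activations
# feeding the final pooling/flatten stage) as a function of input size.
def final_feature_map(width, height, total_stride=32, channels=2048):
    w, h = width // total_stride, height // total_stride
    return w, h, w * h * channels  # spatial dims and activation count

for size in [(224, 224), (900, 3000), (1200, 4000)]:
    w, h, n = final_feature_map(*size)
    print(f"input {size[0]}x{size[1]} -> final map {w}x{h} ({n:,} activations)")
```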

youye115 commented 7 years ago

I trained faster-rcnn-resnet50 fine, but when I use the trained model for prediction on the same machine, it fails with check failed "out of memory". Does anyone know why?