liuzhuang13 / DenseNet

Densely Connected Convolutional Networks, In CVPR 2017 (Best Paper Award).
BSD 3-Clause "New" or "Revised" License

Memory efficient implementation of Caffe #23

Closed: haikuoyao closed this issue 6 years ago

haikuoyao commented 6 years ago

Hi, I saw this Caffe implementation, which is memory efficient: https://github.com/Tongcheng/DN_CaffeScript

I also noticed this in the wiki:

Memory efficient implementation (newly added feature on June 6, 2017)

There is an option -optMemory which is very useful for reducing GPU memory footprint when training a DenseNet. By default, the value is set to 2, which activates the shareGradInput function 
....
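
For context, shareGradInput refers to a trick in the Torch implementation: the gradient with respect to a layer's input is only needed transiently during the backward pass, so layers whose gradients are never alive at the same time can alias one pre-allocated buffer instead of each owning its own storage. A toy numpy sketch of the idea (a hypothetical illustration with a made-up `grad_input_view` helper, not the repo's actual Torch code):

```python
import numpy as np

# One shared backing buffer, sized for the largest gradInput in the network.
shared = np.empty(4 * 64 * 32 * 32, dtype=np.float32)

def grad_input_view(shape):
    """Return a gradInput tensor that is a view into the shared buffer."""
    n = int(np.prod(shape))
    assert n <= shared.size, "shared buffer too small for this layer"
    return shared[:n].reshape(shape)

# Two layers at different depths reuse the same memory for their gradInput:
g1 = grad_input_view((4, 16, 32, 32))  # live during layer 1's backward, then dead
g2 = grad_input_view((4, 64, 16, 16))  # later overwrites the same storage
assert np.shares_memory(g1, shared) and np.shares_memory(g2, shared)
```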

Does that Caffe implementation use the memory-efficient approach quoted above?

Thanks.

Tongcheng commented 6 years ago

Hello @haikuoyao, the Caffe version does not, because Caffe does not support enough computational-graph-style optimizations to implement memory sharing like shareGradInput. However, my implementation avoids the "two-sidedness" of Caffe: within the DenseBlock's data memory I didn't use Blobs; the intermediate results are managed through raw pointers instead. Another issue is that in the multi-GPU case, Caffe doesn't balance memory well across GPUs.
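
To illustrate what managing raw pointers instead of Blobs can buy: if the block's full concatenated output lives in one contiguous buffer, each layer can write its new features directly at an offset into it, so the repeated concatenations cost no extra memory or copies. A hypothetical numpy sketch of that layout (the real code is C++/CUDA inside the DenseBlock layer, and `layer_forward` here is just a stand-in for BN-ReLU-Conv):

```python
import numpy as np

batch, init_ch, growth, num_layers, h, w = 4, 16, 12, 6, 32, 32
total_ch = init_ch + growth * num_layers

# One contiguous buffer holds the block's full concatenated feature map.
features = np.zeros((batch, total_ch, h, w), dtype=np.float32)

def layer_forward(x):
    # Stand-in for a BN-ReLU-Conv composite producing `growth` new channels.
    return np.random.randn(x.shape[0], growth, h, w).astype(np.float32)

features[:, :init_ch] = np.random.randn(batch, init_ch, h, w)  # block input
for i in range(num_layers):
    used = init_ch + i * growth
    new = layer_forward(features[:, :used])  # reads all features so far (a view)
    features[:, used:used + growth] = new    # "concat" = write at an offset
```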

haikuoyao commented 6 years ago

Thank you, Tongcheng.

It works well on CIFAR data, but when I tried to train on my own dataset, the batch size had to be set to 4; anything bigger gave me an out-of-memory error.
I also tried https://github.com/liuzhuang13/DenseNetCaffe yesterday, and it's the same. I wonder, should I change the network to adapt it to my dataset? Thanks a million.

liuzhuang13 commented 6 years ago

Hello @haikuoyao,

What is your input size? CIFAR images are 32x32. If your images are much larger, you may want to downsample them with a stride-2 convolution before feeding them into the first dense block, to reduce memory consumption.
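
For 224x224 inputs, a stem like the one the ImageNet DenseNets use (a 7x7 convolution with stride 2 followed by 3x3 max pooling with stride 2) brings the first dense block down to 56x56 feature maps. A pycaffe sketch of such a stem (the layer names and channel count are illustrative, not taken from the repo's prototxt):

```python
import caffe
from caffe import layers as L, params as P

n = caffe.NetSpec()
n.data = L.Input(shape=dict(dim=[1, 3, 224, 224]))
# 7x7 conv, stride 2: 224x224 -> 112x112
n.conv1 = L.Convolution(n.data, num_output=48, kernel_size=7, stride=2,
                        pad=3, bias_term=False)
# 3x3 max pool, stride 2: 112x112 -> 56x56, the first dense block's resolution
n.pool1 = L.Pooling(n.conv1, pool=P.Pooling.MAX, kernel_size=3, stride=2, pad=1)
print(n.to_proto())  # emit the prototxt for these stem layers
```

Cutting the spatial resolution by 4x in each dimension before the first block shrinks every feature map (and its gradients) in the block by roughly 16x, which is usually the difference between a batch size of 4 and a normal one.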

haikuoyao commented 6 years ago

Thanks @liuzhuang13. Yeah, you are right. My images are 224x224, the size used for ResNet. Thanks a lot, it's really helpful. Gonna close this issue.