liuzhuang13 / DenseNet

Densely Connected Convolutional Networks, In CVPR 2017 (Best Paper Award).

The number of parameters #14

shamangary opened this issue 7 years ago

shamangary commented 7 years ago

I used the settings suggested in the GitHub README: L=40, k=12, no bottleneck. However, the parameter count is not 1M; it's 0.6M. The same problem occurs when I turn bottleneck on: I get a different parameter count than the reported one. Please tell me what I am missing. Thank you.

Calling the model:

-- build DenseNet with the options table expected by densenet.lua
local dn_opt = {}
dn_opt.depth = 40          -- L = 40
dn_opt.dataset = 'cifar10'
local model = paths.dofile('densenet.lua')(dn_opt)
model:cuda()
-- flatten all learnable weights into one tensor and print its size
print(model:getParameters():size())

In densenet.lua:

local growthRate = 12

-- dropout rate; set to 0 to disable dropout, or to a non-zero value to enable it
local dropRate = 0

-- #channels before entering the first dense block
local nChannels = 2 * growthRate

-- compression rate at transition layers
local reduction = 0.5

-- whether to use bottleneck structures
local bottleneck = false

Output of the parameter count:

599050
[torch.LongStorage of size 1]
liuzhuang13 commented 7 years ago

Hi! "BC" stands for bottleneck(B) and compression(C). This is explained at the "compression" paragraph at section 3 of the paper. To use a original DenseNet, you need to also set the variable "reduction" to 1 in the code.

shamangary commented 7 years ago

Thank you very much. It matches now.

shamangary commented 7 years ago

On the other hand, while the parameter count of DenseNet is indeed small, GPU memory is still consumed by the network's dense connectivity rather than by the parameters.

With an 8GB GPU I was able to train an 11M-parameter WRN, but I cannot train the 0.8M-parameter DenseNet-BC (L=100, k=12) because of an out-of-memory error. This is probably because many intermediate feature maps are stored during training.
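A back-of-envelope sketch of why that happens (my own illustration, assuming a naive implementation that keeps every concatenated layer input around for the backward pass):

-- layer l of a dense block sees n0 + (l - 1) * k input channels, so the
-- activations a naive implementation stores grow quadratically with depth
local k  = 12   -- growth rate
local n  = 16   -- dense layers per block (DenseNet-BC L=100 has (100 - 4) / 6 = 16)
local n0 = 24   -- channels entering the first block (2 * growthRate)
local total = 0
for l = 1, n do
  total = total + n0 + (l - 1) * k   -- concatenated input to layer l
end
print(total)  -- 1824 stored input channels for one block, vs. only n * k = 192 outputs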

liuzhuang13 commented 7 years ago

Thanks for pointing this out. I just found other people discussing this and wrote a comment on reddit here: https://www.reddit.com/r/MachineLearning/comments/67fds7/d_how_does_densenet_compare_to_resnet_and/

My suggestion is to try a shallow and wide DenseNet, by setting a smaller depth and a larger growthRate, as in the sketch below.
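A hypothetical configuration along those lines (the exact depth/growthRate trade-off here is my own example, not a tuned setting):

-- shallower and wider DenseNet: fewer layers, larger growth rate
local dn_opt = {}
dn_opt.depth = 22          -- must satisfy (depth - 4) % 3 == 0 without bottleneck
dn_opt.dataset = 'cifar10'
local model = paths.dofile('densenet.lua')(dn_opt)
-- and inside densenet.lua, raise the growth rate, e.g.:
-- local growthRate = 40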

Tongcheng commented 7 years ago

Hello @shamangary, regarding the memory cost of feature maps: we currently have a Caffe implementation that tries to address the memory-hungry problem (listed under "much more space efficient Caffe implementation"). DenseNet-BC (L=100, k=12) should take no more than 2.5 GB when running with testing on, and about 1.7 GB without test mode. (Caffe seems to allocate separate space for testing.) Hope that helps!

shamangary commented 7 years ago

OK, thanks! Still, I wish Torch also had this property. (QAQ)