gpleiss / efficient_densenet_pytorch

A memory-efficient implementation of DenseNets
MIT License

Number of parameters doesn't match with naïve implementation #46

Closed PabloRR100 closed 5 years ago

PabloRR100 commented 5 years ago

Hi @gpleiss,

I was trying to train an ensemble of DenseNets_BC_100_12 on 2 NVIDIA K80 GPUs when I ran into memory problems, which brought me to this memory-efficient implementation. However, my research is sensitive to the number of parameters, and since moving to this implementation the counts no longer match.

In this implementation file you can see that the number of parameters exactly matches the reported values:

    +-------------+-------------+-------+--------------+
    |    Model    | Growth Rate | Depth |  Params (M)  |
    +-------------+-------------+-------+--------------+
    |  DenseNet   |     12      |  40   |     1.02     |
    +-------------+-------------+-------+--------------+
    |  DenseNet   |     12      |  100  |     6.98     |
    +-------------+-------------+-------+--------------+
    |  DenseNet   |     24      |  100  |    27.249    |
    +-------------+-------------+-------+--------------+
    | DenseNet-BC |     12      |  100  |    0.769     |
    +-------------+-------------+-------+--------------+
    | DenseNet-BC |     24      |  250  |    15.324    |
    +-------------+-------------+-------+--------------+
    | DenseNet-BC |     40      |  190  |    25.624    |
    +-------------+-------------+-------+--------------+

However, in this other implementation, which follows your indications, the counts are different:

    +-------------+-------------+-------+--------------+
    |    Model    | Growth Rate | Depth |  Params (M)  |
    +-------------+-------------+-------+--------------+
    | DenseNet-BC |     12      |  100  |    1.108     |
    +-------------+-------------+-------+--------------+
    | DenseNet-BC |     24      |  250  |    4.275     |
    +-------------+-------------+-------+--------------+
    | DenseNet-BC |     40      |  190  |     11.7     |
    +-------------+-------------+-------+--------------+

Is there something else that needs to be taken care of that I am not seeing?

Thanks a lot in advance, Pablo

gpleiss commented 5 years ago

My guess is that the other implementation is using a different block configuration:

Their DenseNet-100 uses a block configuration of (32, 16, 8) (https://github.com/PabloRR100/Single_vs_Ensemble_of_NNs/blob/master/DenseNets/densenets_Efficientpy.py#L161), which is actually a 116-layer DenseNet: 32 + 16 + 8 = 56 dense layers, each with two convolutions, plus 4 other layers. The standard configuration for a 100-layer DenseNet, which you're presumably using, is (16, 16, 16).
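For three-block CIFAR DenseNet-BC models, the mapping from depth to block configuration follows from that same count: each dense layer contributes two convolutions (a 1x1 bottleneck plus a 3x3 convolution), and the initial convolution, the two transition convolutions, and the classifier make up the remaining 4 layers, so each block gets (depth - 4) / 6 layers. Here is a minimal sketch of that rule (`bc_block_config` is a hypothetical helper, not code from this repo):

```python
def bc_block_config(depth, num_blocks=3):
    # Each dense layer adds 2 conv layers (1x1 bottleneck + 3x3 conv);
    # the initial conv, the two transition convs, and the classifier
    # account for the remaining 4 counted layers.
    assert (depth - 4) % (2 * num_blocks) == 0, "invalid DenseNet-BC depth"
    layers_per_block = (depth - 4) // (2 * num_blocks)
    return tuple([layers_per_block] * num_blocks)

print(bc_block_config(100))  # (16, 16, 16)
print(bc_block_config(250))  # (41, 41, 41)
print(bc_block_config(190))  # (31, 31, 31)
```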

PabloRR100 commented 5 years ago

Thanks @gpleiss, I think I figured out the problem. Could you just confirm that the block configurations for CIFAR-10 keep the same number of layers in every dense block? That is:

    denseNetBC_100_12: (16, 16, 16)
    denseNetBC_250_24: (41, 41, 41)
    denseNetBC_190_40: (31, 31, 31)

With these configurations I get the same parameter counts:


    +-------------+-------------+-------+--------------+
    |    Model    | Growth Rate | Depth |  Params (M)  |
    +-------------+-------------+-------+--------------+
    | DenseNet-BC |     12      |  100  |    0.769     |
    +-------------+-------------+-------+--------------+
    | DenseNet-BC |     24      |  250  |    15.324    |
    +-------------+-------------+-------+--------------+
    | DenseNet-BC |     40      |  190  |    25.624    |
    +-------------+-------------+-------+--------------+
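For completeness, here is roughly how these counts can be reproduced (a sketch; it assumes the DenseNet class from this repo's models/densenet.py, with its growth_rate and block_config arguments):

```python
from models import DenseNet  # assumes this repo's models/densenet.py is on the path

def count_params_millions(model):
    # Sum every trainable parameter tensor and report the total in millions.
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# (growth_rate, block_config) pairs for the three DenseNet-BC variants above.
configs = {
    "DenseNet-BC-100-12": (12, (16, 16, 16)),
    "DenseNet-BC-250-24": (24, (41, 41, 41)),
    "DenseNet-BC-190-40": (40, (31, 31, 31)),
}
for name, (growth_rate, block_config) in configs.items():
    model = DenseNet(growth_rate=growth_rate, block_config=block_config)
    print(f"{name}: {count_params_millions(model):.3f}M parameters")
```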
gpleiss commented 5 years ago

That matches my understanding of the original paper, yes.