My guess is that they're using a different block configuration:
Their DenseNet-100 uses a block configuration of (32, 16, 8) (https://github.com/PabloRR100/Single_vs_Ensemble_of_NNs/blob/master/DenseNets/densenets_Efficientpy.py#L161), which is actually a 116-layer DenseNet. The standard configuration for a 100-layer DenseNet, which you're presumably using, is (16, 16, 16).
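For reference (my own arithmetic, not stated in the thread): in a DenseNet-BC each entry of the block configuration is a bottleneck unit with two convolutions, and the stem convolution, the transition convolutions, and the classifier account for the remaining layers, which is why those two configurations come out to 116 and 100 layers:

```python
def densenet_bc_depth(block_config):
    """Depth of a DenseNet-BC: two convs per bottleneck unit, plus the stem
    conv, one conv per transition, and the final classifier layer."""
    return 2 * sum(block_config) + (len(block_config) - 1) + 2

print(densenet_bc_depth((32, 16, 8)))   # 116 -- the linked configuration
print(densenet_bc_depth((16, 16, 16)))  # 100 -- the standard DenseNet-BC-100
```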
Thanks @gpleiss, I think I figured out the problem. Could you just confirm that, for CIFAR-10, the block configurations use the same number of layers in every dense block? That is:
denseNetBC_100_12: (16, 16, 16)
denseNetBC_250_24: (41, 41, 41)
denseNetBC_190_40: (31, 31, 31)
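These all follow from the depth formula above; as a sanity check (again my own arithmetic, not from the thread), solving depth = 6n + 4 for n reproduces each configuration:

```python
def layers_per_block(depth, num_blocks=3):
    """Invert depth = 6n + 4 for a three-block DenseNet-BC."""
    n, rem = divmod(depth - 4, 2 * num_blocks)
    assert rem == 0, "depth must have the form 6n + 4"
    return (n,) * num_blocks

print(layers_per_block(100))  # (16, 16, 16)
print(layers_per_block(250))  # (41, 41, 41)
print(layers_per_block(190))  # (31, 31, 31)
```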
This way I get the same parameter counts:
+-------------+-------------+-------+------------+
| Model       | Growth Rate | Depth | Params (M) |
+-------------+-------------+-------+------------+
| DenseNet-BC | 12          | 100   | 0.769      |
| DenseNet-BC | 24          | 250   | 15.324     |
| DenseNet-BC | 40          | 190   | 25.624     |
+-------------+-------------+-------+------------+
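A quick way to reproduce these counts is to sum the model's trainable parameters. A minimal sketch using plain PyTorch; the commented-out `DenseNet` import below assumes this repo's `models/densenet.py`, so check the constructor arguments in your copy:

```python
import torch

def count_params_millions(model: torch.nn.Module) -> float:
    """Trainable parameter count, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Hypothetical usage -- assumes the DenseNet class from this repo's
# models/densenet.py; argument names may differ in your copy.
# from models import DenseNet
# model = DenseNet(growth_rate=12, block_config=(16, 16, 16), num_classes=10)
# print(count_params_millions(model))  # expect ~0.769 for DenseNet-BC-100-12
```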
That matches my understanding of the original paper, yes.
Hi @gpleiss,
I was trying to train an ensemble of DenseNet-BC-100-12 models on 2 NVIDIA K80 GPUs when I ran into the memory-efficiency problem. However, my research is sensitive to the number of parameters, and when I moved to this implementation they no longer match.
In this implementation file you can see that the number of parameters exactly matches the reported figures:
However, in this other implementation, which follows your indications, they do not:
Is there something else that needs to be taken care of that I am not seeing?
Thanks a lot in advance, Pablo