Bear-kai opened 4 years ago
OK, I find that DropConnect (ICML 2013) is a generalization of Dropout; like Dropout, that technique is suitable for fully connected layers only.
The EfficientNet paper (ICML 2019) says that stochastic depth (ECCV 2016) with a drop connect ratio of 0.2 is used for training.
Obviously, the two "drop connect"s above are totally different things! I think the naming in the implementation is confusing.
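For context, here is roughly what the `drop_connect()` function under discussion looks like; this is a minimal TensorFlow sketch of the stochastic-depth idea, paraphrased rather than copied verbatim from the repo:

```python
import tensorflow as tf

def drop_connect(inputs, is_training, drop_connect_rate):
    """Stochastic depth: zero out the whole residual branch, per sample."""
    if not is_training:
        return inputs
    keep_prob = 1.0 - drop_connect_rate
    batch_size = tf.shape(inputs)[0]
    # One Bernoulli draw per sample, broadcast over H, W, C.
    random_tensor = keep_prob + tf.random.uniform(
        [batch_size, 1, 1, 1], dtype=inputs.dtype)
    binary_tensor = tf.floor(random_tensor)  # 1 with prob keep_prob, else 0
    # Divide by keep_prob so the expected value of the output is unchanged.
    return inputs / keep_prob * binary_tensor
```

Note the mask shape `[batch_size, 1, 1, 1]`: the random draw is per sample, not per weight, which is why this is stochastic depth rather than DropConnect in the ICML 2013 sense.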
Thank you for the clarification, since I came here with similar questions. I still have some doubts about the implementation. Specifically, I was expecting to see outputs being dropped according to the `binary_tensor` (with multiplication):

```
output = inputs * binary_tensor
```

However, I see that apart from the dropping (implemented with `binary_tensor`), there is also a scaling factor (`keep_prob`) applied to the inputs:

```
output = inputs / keep_prob * binary_tensor
```
Why is that? I know there are kinds of Gaussian Dropout where inputs are scaled up or down according to a Normal distribution... but here I'm not quite sure why.
I see that by doing `inputs / keep_prob` you are actually scaling the inputs up by `1/keep_prob`, i.e. `1/(1-p)` where `p` is the drop rate. Is this some sort of normalization to ensure that the expected value of the outputs is the same as the value of the inputs before dropping?
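Here is a minimal NumPy sanity check of that intuition (my own sketch, not code from this repo): with the `1/keep_prob` scaling, the mean of the masked outputs stays equal to the mean of the inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.8
inputs = np.ones((100_000, 1))  # dummy inputs with mean 1.0

# Bernoulli mask: 1 with probability keep_prob, 0 otherwise.
binary_tensor = (rng.random(inputs.shape) < keep_prob).astype(inputs.dtype)

# Inverted scaling: E[inputs / keep_prob * mask] = inputs.
output = inputs / keep_prob * binary_tensor
print(output.mean())  # ~1.0, i.e. the pre-dropping mean is preserved
```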
The parameter `drop_connect_rate` is used for the stochastic depth of the network, but the function `drop_connect()` seems to drop samples?
Stochastic depth should drop building blocks randomly. I don't understand the function above. Can anyone help?!
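If it helps, here is how I understand the intent (a hypothetical sketch reusing the `drop_connect()` above; `block_fn` is an illustrative name, not from the repo): the mask has shape `[batch_size, 1, 1, 1]`, so for a given sample either the entire residual branch survives or it is zeroed. Combined with the identity shortcut, that is exactly "dropping the building block" for that sample, not dropping samples from the batch.

```python
def block_with_stochastic_depth(x, block_fn, is_training, drop_connect_rate):
    """Residual block whose conv branch is dropped per sample."""
    residual = block_fn(x)  # the block's conv path
    # Zeroing `residual` for a sample leaves only the identity path,
    # i.e. the whole building block is skipped for that sample.
    residual = drop_connect(residual, is_training, drop_connect_rate)
    return x + residual
```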