Closed · sweettyler closed this 1 year ago
Hi, this is indeed one of the small design choices that, on the scale of CIFAR10, do not have a big impact on the final performance. The motivation for adding the non-linearity there was to further support the idea that the activations going into any block in the ResNet are unnormalized and have not been pushed through a non-linearity before, i.e. the activation has to happen before the weight block. You are absolutely right that this is not strictly needed in the downsample part, and it may sometimes be better without it. It's probably best to adjust the comment on this line to point out that it is optional.
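For reference, here is a minimal sketch of the shortcut path this thread is about, in the style of the tutorial's `PreActResNetBlock` (the channel sizes and `act_fn` below are illustrative assumptions, not the tutorial's exact code):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration; the tutorial's block receives
# these via its constructor arguments.
c_in, c_out = 16, 32
act_fn = nn.ReLU

# Shortcut path in the style of the tutorial: the input is normalized
# and passed through a non-linearity before the strided 1x1 convolution,
# mirroring the pre-activation of the main branch.
downsample = nn.Sequential(
    nn.BatchNorm2d(c_in),
    act_fn(),  # the (optional) activation discussed in this issue
    nn.Conv2d(c_in, c_out, kernel_size=1, stride=2, bias=False),
)

x = torch.randn(1, c_in, 32, 32)
print(downsample(x).shape)  # torch.Size([1, 32, 16, 16])
```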
@phlippe Thank you for your reply. I happened to do the same (due to a copy/paste mistake) and found that it worked at least as well as, if not better than, the version without it. Glad to see your different implementation; no need to change anything.
Tutorial: 5
Describe the bug In Kaiming He's published paper on ResNet and in many other implementations of "PreActResNetBlock", no activation is applied in "self.downsample".
I fully understand that your structure works well (or is even better in some cases), but I am not sure what your consideration was. A sketch of the variant from the paper follows below.
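For comparison, a sketch of the shortcut as described in the paper and in most reference implementations: a bare strided 1x1 projection with no pre-activation on the skip path (channel sizes are again illustrative assumptions):

```python
import torch
import torch.nn as nn

c_in, c_out = 16, 32  # hypothetical channel sizes

# Shortcut as in He et al.'s pre-activation ResNet and most reference
# implementations: a bare strided 1x1 projection, with no normalization
# or non-linearity applied on the skip path.
downsample = nn.Conv2d(c_in, c_out, kernel_size=1, stride=2, bias=False)

x = torch.randn(1, c_in, 32, 32)
print(downsample(x).shape)  # torch.Size([1, 32, 16, 16])
```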
To Reproduce (if any steps necessary) N/A
Expected behavior N/A
Runtime environment (please complete the following information): N/A
Additional context Cell 3 in https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/tutorial5/Inception_ResNet_DenseNet.html