phlippe / uvadlc_notebooks

Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023
https://uvadlc-notebooks.readthedocs.io/en/latest/
MIT License

Just a question #75

Closed sweettyler closed 1 year ago

sweettyler commented 1 year ago

Tutorial: 5

Describe the bug: In Kaiming He's published paper on pre-activation ResNets, and in many other implementations of `PreActResNetBlock`, no activation is applied in `self.downsample`.

I fully understand that your structure works well (or even better in some cases), but I am not sure what your consideration was.

To Reproduce (if any steps necessary): N/A

Expected behavior: N/A

Screenshots: [screenshot of the block's downsample code attached]

Runtime environment (please complete the following information): N/A

Additional context: Cell 3 in https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/tutorial5/Inception_ResNet_DenseNet.html
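
For reference, a minimal sketch of the two shortcut variants being compared (PyTorch-style; the exact layer names and arguments in the tutorial's cell may differ):

```python
import torch.nn as nn

c_in, c_out = 16, 32
act_fn = nn.ReLU  # the tutorial passes the activation class as a hyperparameter

# Shortcut as implemented in the tutorial's PreActResNetBlock (simplified):
# BatchNorm and an activation are applied before the 1x1 convolution.
downsample_tutorial = nn.Sequential(
    nn.BatchNorm2d(c_in),
    act_fn(),  # the non-linearity this issue asks about
    nn.Conv2d(c_in, c_out, kernel_size=1, stride=2, bias=False),
)

# Shortcut as in the original pre-activation ResNet paper and many other
# implementations: a plain strided 1x1 convolution with no activation.
downsample_paper = nn.Conv2d(c_in, c_out, kernel_size=1, stride=2, bias=False)
```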

phlippe commented 1 year ago

Hi, this is indeed one of the small design choices that, at the scale of CIFAR10, does not have a big impact on the final performance. The motivation for adding the non-linearity there was to further support the idea that the activations going into any block of the ResNet are unnormalized and have not been pushed through a non-linearity before, i.e. the activation has to happen before the weight block. You are absolutely right that this is not strictly needed in the downsample path and that it may sometimes work better without it. It's probably best to adjust the comment on this line to point out that the activation is optional.
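
One way to reflect that in the code would be to make the pre-activation on the shortcut an explicit option. This is only a hypothetical sketch, not the tutorial's actual code; `make_shortcut` and its `preact_on_shortcut` flag are made up for illustration:

```python
import torch.nn as nn

def make_shortcut(c_in, c_out, act_fn=nn.ReLU, preact_on_shortcut=True):
    """Hypothetical helper: build the 1x1 downsample path of a
    pre-activation ResNet block, with the extra BatchNorm + activation
    on the shortcut made optional as discussed above."""
    layers = []
    if preact_on_shortcut:
        # Optional: keeps the design consistent with "activation before
        # every weight layer", but is not required by the original paper.
        layers += [nn.BatchNorm2d(c_in), act_fn()]
    layers.append(nn.Conv2d(c_in, c_out, kernel_size=1, stride=2, bias=False))
    return nn.Sequential(*layers)
```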

sweettyler commented 1 year ago

@phlippe Thank you for your reply. I happened to do the same (due to a copy/paste mistake) and found that it worked at least as well as, if not better than, the version without it. Glad to see your different implementation; no need to change anything.