phlippe / uvadlc_notebooks

Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023
https://uvadlc-notebooks.readthedocs.io/en/latest/
MIT License

[Question] Tutorial 5 (JAX): Max-pool branch for Inception #131

Open sy-eng opened 11 months ago

sy-eng commented 11 months ago

Thank you for your great tutorials!

I have a question about the code in cell 11 of Inception_ResNet_DenseNet.ipynb (JAX version).

The max-pool branch behaves like a plain 1x1 convolution branch, because the output of nn.max_pool() is never used: the convolution is applied to x instead of x_max.

x_max = nn.max_pool(x, (3, 3), strides=(2, 2))
x_max = nn.Conv(self.c_out["max"], kernel_size=(1, 1), kernel_init=googlenet_kernel_init, use_bias=False)(x)

I think it should be:

x_max = nn.max_pool(x, (3, 3), strides=(1, 1))
x_max = nn.Conv(self.c_out["max"], kernel_size=(1, 1), kernel_init=googlenet_kernel_init, use_bias=False)(x_max)

With strides=(2, 2), the spatial size of the feature map is roughly halved, so this branch would no longer match the output size of the other branches; the stride should therefore be (1, 1).
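Why the stride matters: the outputs of the Inception branches are concatenated along the channel axis, so their batch, height and width must agree. The pure-Python sketch below checks this on NHWC shape tuples, assuming (as is usual for Flax's NHWC layout) that the branches are concatenated along the last axis. The helper name `concat_channels_shape` and the channel counts are hypothetical, chosen only for illustration.

```python
# Hypothetical helper (not from the notebook): combine Inception-branch outputs,
# given as NHWC shape tuples, the way a channel-axis concatenation would.
def concat_channels_shape(*shapes):
    """Return the shape obtained by concatenating `shapes` along the last axis.

    Raises ValueError if batch, height or width differ between branches.
    """
    leading = {shape[:-1] for shape in shapes}  # (N, H, W) of every branch
    if len(leading) != 1:
        raise ValueError(f"N, H, W must match across branches, got {sorted(leading)}")
    (nhw,) = leading
    return nhw + (sum(shape[-1] for shape in shapes),)


# Illustrative channel counts on 32x32 feature maps.
branches = [(128, 32, 32, 16), (128, 32, 32, 32), (128, 32, 32, 8)]

# Max-pool branch whose output keeps the 32x32 size: concatenation works.
print(concat_channels_shape(*branches, (128, 32, 32, 8)))  # (128, 32, 32, 64)

# Max-pool branch whose spatial size was halved by a stride of 2: it fails.
try:
    concat_channels_shape(*branches, (128, 16, 16, 8))
except ValueError as err:
    print("shape mismatch:", err)
```

In the original code this mismatch never surfaced, because the pooled tensor was discarded; it only appears once x_max is actually fed into the convolution.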

Thank you.

phlippe commented 10 months ago

Hi, thanks for pointing that out! This is indeed a typo: it should have used a stride of 1 and x_max as input to the conv. I'll leave this issue open, since we'll need to retrain the models to fix this. Thanks again :)

sy-eng commented 10 months ago

Thank you for your reply!

I also retrained it and found very little difference between the models with and without the pooling layer.

[Image: googleNet]

skoohy commented 3 months ago

It seems that nn.max_pool() may have changed? As of now, its default padding is padding='VALID'. In the InceptionBlock class,

x_max = nn.max_pool(x, (3, 3), strides=(1, 1))

would also need to be changed to

x_max = nn.max_pool(x, (3, 3), strides=(1, 1), padding='SAME')

Otherwise, x_max.shape would be (128, 30, 30, 8) instead of (128, 32, 32, 8).

Also, within the GoogleNet class, the inception_blocks list contains nn.max_pool() calls, which should be changed from

lambda inp: nn.max_pool(inp, (3, 3), strides=(2, 2))

to

lambda inp: nn.max_pool(inp, (3, 3), strides=(2, 2), padding="SAME")

Without this, the feature maps were reduced from (32, 32) to (15, 15) and then from (15, 15) to (7, 7), rather than to (16, 16) and (8, 8).
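The sizes reported above follow from the standard output-size formulas for the two padding modes: with 'VALID' only windows that fit entirely are used, giving floor((n - k) / s) + 1, while 'SAME' pads the input so that the output is ceil(n / s). The pure-Python sketch below reproduces the numbers from this thread. `pool_out` is a hypothetical helper, and it assumes nn.max_pool follows these standard VALID/SAME conventions.

```python
import math


# Hypothetical helper: spatial output size of a pooling window along one axis.
def pool_out(n, window, stride, padding):
    if padding == "VALID":  # no padding: only windows that fit entirely
        return (n - window) // stride + 1
    if padding == "SAME":  # input padded so that output = ceil(n / stride)
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding {padding!r}")


# InceptionBlock max-pool branch: 3x3 window, stride 1, on 32x32 maps.
print(pool_out(32, 3, 1, "VALID"))  # 30 -> x_max.shape == (128, 30, 30, 8)
print(pool_out(32, 3, 1, "SAME"))  # 32 -> x_max.shape == (128, 32, 32, 8)

# GoogleNet downsampling: two 3x3, stride-2 max-pools applied in sequence.
for padding in ("VALID", "SAME"):
    sizes = [32]
    for _ in range(2):
        sizes.append(pool_out(sizes[-1], 3, 2, padding))
    print(padding, sizes)  # prints: VALID [32, 15, 7], then SAME [32, 16, 8]
```

Under these assumptions, passing padding="SAME" to both the stride-1 branch pool and the stride-2 downsampling pools keeps every Inception branch at the input resolution and gives the clean 32 → 16 → 8 progression.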