jorditorresBCN / LibroTensorFlow

Basic tutorial on how to use TensorFlow

it should be 8x8 not 7x7 #1

Open dexter4455 opened 8 years ago

dexter4455 commented 8 years ago

Hey Jordi,

Thanks for the great work. Correct me if I'm wrong (and if so, please explain how you got that number), but on page 114 and in the code of cnn.py you write "The resulting output of the convolution has a dimension of 7x7 as we are applying the 5x5 window to a 12x12 space with a stride size of 1". Since it wasn't mentioned, I assume the padding is 0 (it must be).

But it's actually (12 - 5 + 0) + 1 = 8, which gives 8x8, not 7x7.
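The arithmetic above is the usual output-size formula for a convolution with input size m, kernel size k, zero-padding p on each side and stride s: floor((m - k + 2p) / s) + 1. A minimal Python sketch of it follows; the function name `conv_output_size` is hypothetical, for illustration only.

```python
def conv_output_size(m: int, k: int, p: int = 0, s: int = 1) -> int:
    """Spatial output size of a convolution.

    m: input size, k: kernel size, p: zero-padding added on EACH side,
    s: stride. Floor division drops windows that would not fit.
    """
    if m + 2 * p < k:
        raise ValueError("kernel is larger than the padded input")
    return (m - k + 2 * p) // s + 1


# The case from the book: 5x5 window on a 12x12 input, stride 1, no padding.
print(conv_output_size(12, 5))  # 8, i.e. an 8x8 output
```

With p = 0 and s = 1 this reduces to m - k + 1.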

Thanks, waiting for your reply!

jorditorresBCN commented 8 years ago

I'm traveling for the ML Summer School 2016. I will review it next weekend; sorry for any inconvenience. Regards, Jordi

jorditorresBCN commented 8 years ago

Hello dexter4455,

I discussed this with one of the engineers in our research team at BSC-CNS, Maurici Yagües, and we finally agreed on the following answer. First of all, let me thank Maurici for his help on this topic!

You are right to point out that the explanation and the code are not consistent. Your computation of the 8x8 layer is correct given the drawings on the previous pages and the explanation. However, in the convolution function, the convolution layers are initialized with the parameter padding="SAME". With a stride of 1, this makes the output size equal to the input size by adding enough zero-padding, so no shrinking happens in the convolution layers.
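As a sketch of what padding="SAME" computes, the function below re-derives the rule described in the TensorFlow documentation for convolutions: the output size is ceil(m / s), and the total zero-padding needed is split between the two sides, with any odd extra pixel going to the bottom/right. This is plain Python for illustration, not TensorFlow code, and the name `same_padding` is hypothetical.

```python
import math


def same_padding(m: int, k: int, s: int = 1) -> tuple[int, int, int]:
    """Return (output_size, pad_before, pad_after) for padding="SAME"."""
    out = math.ceil(m / s)
    # Total zero-padding needed so that the last window still fits.
    pad_total = max((out - 1) * s + k - m, 0)
    pad_before = pad_total // 2
    pad_after = pad_total - pad_before  # the extra pixel, if any, goes here
    return out, pad_before, pad_after


# First conv layer: 28x28 input, 5x5 kernel, stride 1.
print(same_padding(28, 5))  # (28, 2, 2)
```

For a 5x5 kernel with stride 1, "SAME" therefore adds 2 zeros on each side, which is exactly what keeps the output at the input size.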

The shrinking happens only in the pooling layers, going from 28 at the input to 14 after h_pool1, and finally to 7 after h_pool2.
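The sequence 28 → 14 → 7 can be traced with a short sketch, assuming (as in the chapter 5 network) two 5x5 convolutions with stride 1, each followed by 2x2 max pooling with stride 2. The helpers `conv_size` and `trace_sizes` are hypothetical; they contrast "SAME" and "VALID" convolutions side by side.

```python
def conv_size(m: int, k: int, padding: str) -> int:
    """Output size of a stride-1 convolution with a k x k kernel."""
    if padding == "SAME":
        return m            # zero-padding keeps the size unchanged
    if padding == "VALID":
        return m - k + 1    # no padding: the window shrinks the input
    raise ValueError(f"unknown padding: {padding}")


def trace_sizes(m: int, padding: str, k: int = 5) -> list[int]:
    """Spatial sizes: input, conv1, pool1, conv2, pool2."""
    sizes = [m]
    for _ in range(2):      # two conv + pool blocks
        m = conv_size(m, k, padding)
        sizes.append(m)
        m = m // 2          # 2x2 max pooling, stride 2 (sizes here are even)
        sizes.append(m)
    return sizes


print(trace_sizes(28, "SAME"))   # [28, 28, 14, 14, 7]
print(trace_sizes(28, "VALID"))  # [28, 24, 12, 8, 4]
```

Under "VALID" the second convolution would produce the 8x8 map computed in the original report, and 4x4 after pooling, so the 7x7 used in the code only follows from "SAME".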

The explanation and the drawings describe the padding="VALID" case, in which no zero-padding is added, so (with stride 1) the output size follows the formula m - k + 1, where m is the input size and k is the kernel size. This is what leads to the shrinking.

You can find a clearer explanation in "Chapter 9. Convolutional Networks" (specifically page 350) of the book http://www.deeplearningbook.org/.

Thanks for pointing it out; this will be stated more clearly in future editions of the book.

dexter4455 commented 8 years ago

Thank you very much, that makes sense now!

jorditorresBCN commented 8 years ago

From: ShuhaoWang (to@shuhao.wang): In Chapter 5, MULTI-LAYER NEURAL NETWORKS IN TENSORFLOW, the padding mode is specified as 'SAME', meaning the input and output sizes of CONV() should be the same. The padding for the first CONV works out to 2 on each side; therefore, the output size of the first CONV is 28x28x32 (not 24x24x32). After the MAX_POOL, the size is 14x14x32 (not 12x12x32). The input and output sizes of the second CONV are then 14x14x32 and 14x14x64, respectively (again with padding 2). Thus, after the second MAX_POOL, the output size is 7x7x64.
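To make "padding size = 2" concrete, here is a naive pure-Python sketch (not how TensorFlow implements convolution) on a single channel: it surrounds a dummy 28x28 input with 2 zeros on every side and then slides a 5x5 kernel with stride 1 and no further padding. The helpers `zero_pad` and `conv2d_valid` and the all-ones data are hypothetical. Like TensorFlow's conv2d, it computes a cross-correlation (the kernel is not flipped).

```python
def zero_pad(img: list[list[float]], p: int) -> list[list[float]]:
    """Surround a 2-D image with p rows/columns of zeros on every side."""
    width = len(img[0]) + 2 * p
    padded = [[0.0] * width for _ in range(p)]
    for row in img:
        padded.append([0.0] * p + list(row) + [0.0] * p)
    padded.extend([0.0] * width for _ in range(p))
    return padded


def conv2d_valid(img: list[list[float]], kernel: list[list[float]]) -> list[list[float]]:
    """Stride-1 cross-correlation without padding (i.e. "VALID")."""
    k = len(kernel)
    out_h = len(img) - k + 1
    out_w = len(img[0]) - k + 1
    return [
        [
            sum(img[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[1.0] * 28 for _ in range(28)]   # dummy 28x28 input
kernel = [[1.0] * 5 for _ in range(5)]    # dummy 5x5 filter

out = conv2d_valid(zero_pad(image, 2), kernel)
print(len(out), len(out[0]))              # 28 28  (same size as the input)
print(len(conv2d_valid(image, kernel)))   # 24     (no padding: 28 - 5 + 1)
```

The border outputs see some padded zeros (the top-left value is 9 rather than 25), which is the price "SAME" pays for keeping the spatial size.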

I think the padding mode in TensorFlow is pretty tricky. I hope you can explain more about it in your book.

Best regards, Yours sincerely, Shuhao

jorditorresBCN commented 8 years ago

From: Ricky Park:

In chapter 5, the first convolution layer goes 28x28 --> 28x28 -(pooling)-> 14x14, not 28x28 --> 24x24 --> 12x12, because the padding in the code is 'SAME'. So the second convolution layer goes 14x14 --> 14x14 -(pooling)-> 7x7. You can check the official TensorFlow tutorial and my Jupyter notebook (https://github.com/rickiepark/tfk-notebooks/blob/master/first-contact-with-tensorflow/chapter5_convolution_neural_network.ipynb).