omoindrot / papers

Summaries of the papers I read

Rethinking the Inception Architecture for Computer Vision #5

Open omoindrot opened 8 years ago

omoindrot commented 8 years ago

3rd paper on the Inception architecture from Google

Objective

Understand the principles that guided the creation of the Inception architecture and allowed the model to scale. Currently it is very difficult to change the model, because it is hard to tell what is important in it and what can be changed.

Insights

  1. The information flows from 299x299x3 dimensions at the input down to 1024 dimensions just before the output. We have to decrease this representation size gently through the network, to avoid any representational bottleneck.
  2. I think they say that deep in the network we should use activation maps with many features (say >1024) to disentangle the features and learn faster. We should also provide a variety of features to produce a high-dimensional sparse output. In Inception, this is used in the last 8x8 feature maps.
  3. When using 3x3 or 5x5 convolutions, it works just as well to add a 1x1 bottleneck layer before them to reduce the dimensionality. The result is almost the same because of the correlation between adjacent units.
  4. When we want to add parameters, we should increase both the depth and the width of the network.
  5. Factorize 5x5 or 7x7 convolutions into multiple 3x3 convolutions.

(figure: factorization of a large convolution into 3x3 convolutions)
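A back-of-the-envelope check of insight 5 (the channel count below is illustrative, not from the paper): two stacked 3x3 convolutions cover the same 5x5 receptive field with 18/25 = 0.72 of the parameters.

```python
def conv_params(k, c_in, c_out):
    # Parameters of a k x k convolution mapping c_in -> c_out channels,
    # ignoring biases: k * k * c_in * c_out.
    return k * k * c_in * c_out

ch = 192  # illustrative channel count (not from the paper)

five_by_five = conv_params(5, ch, ch)   # one 5x5 convolution
two_threes = 2 * conv_params(3, ch, ch) # two stacked 3x3 convolutions

print(two_threes / five_by_five)  # 18/25 = 0.72
```

The ratio is independent of the channel count, which is why the factorization pays off at every layer where it applies.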

  6. Factorize 3x3 convolutions into 1x3 and 3x1 convolutions. This works well on medium grid sizes (between 12x12 and 20x20 feature maps), especially with 1x7 followed by 7x1. In Inception, this is used on the 17x17 feature maps.

(figure: asymmetric factorization)
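The same arithmetic applies to the asymmetric factorization (channel count again illustrative): a 1xn convolution followed by an nx1 one replaces an nxn convolution, so 1x7 + 7x1 costs 14/49 of a full 7x7.

```python
def conv_params(k_h, k_w, c_in, c_out):
    # Parameters of a k_h x k_w convolution (biases ignored).
    return k_h * k_w * c_in * c_out

ch = 128  # illustrative channel count (not from the paper)

full = conv_params(7, 7, ch, ch)                              # 7x7: 49 * ch^2
asym = conv_params(1, 7, ch, ch) + conv_params(7, 1, ch, ch)  # 1x7 + 7x1: 14 * ch^2

print(asym / full)  # 14/49, roughly 0.286
```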

  7. When reducing the size of the feature map (with pooling), we have to be careful not to introduce a representational bottleneck. To avoid heavy computation, Inception uses two parallel branches, pooling (cheap, but loses representation) and a stride-2 convolution (costly, but keeps the representation intact), and concatenates their outputs.

(figure: grid-size reduction)
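A rough cost comparison for insight 7. The grid and channel sizes below follow the paper's d x d x ch to (d/2) x (d/2) x 2*ch example; I use 1x1 convolutions as a stand-in to keep the arithmetic simple, since only the relative costs matter here.

```python
def conv_madds(k, out_h, out_w, c_in, c_out):
    # Multiply-adds of a k x k convolution producing an out_h x out_w map.
    return k * k * out_h * out_w * c_in * c_out

d, ch = 35, 320  # reduce a d x d x ch grid to (d//2) x (d//2) x 2*ch

# (a) convolve to 2*ch filters on the full grid, then pool: no bottleneck,
#     but the convolution runs at full resolution and is expensive
conv_then_pool = conv_madds(1, d, d, ch, 2 * ch)
# (b) pool first, then convolve on the half-size grid: roughly 4x cheaper,
#     but pooling first squeezes the representation (bottleneck)
pool_then_conv = conv_madds(1, d // 2, d // 2, ch, 2 * ch)
# (c) Inception: a stride-2 convolution branch (ch filters) and a pooling
#     branch (ch channels) run in parallel, then concatenate to 2*ch channels
parallel = conv_madds(1, d // 2, d // 2, ch, ch)

print(conv_then_pool, pool_then_conv, parallel)
```

The parallel scheme gets close to the cheap option (b) while the convolution branch preserves representation, which is the point of the trick.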

Model Regularization via Label Smoothing

Idea for training the network: instead of using the cross entropy with the true distribution (1 for the true label, 0 elsewhere), use a combination of the true distribution and a uniform distribution:

q'(k) = (1 - ε) δ(k, y) + ε / K

K is the number of classes (1000 for ImageNet), epsilon is 0.1
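A minimal numpy sketch of the smoothed target distribution (the helper name is mine; epsilon = 0.1 and K = 1000 follow the paper):

```python
import numpy as np

def smooth_labels(true_label, num_classes, epsilon=0.1):
    # q'(k) = (1 - epsilon) * [k == true_label] + epsilon / num_classes
    q = np.full(num_classes, epsilon / num_classes)
    q[true_label] += 1.0 - epsilon
    return q

q = smooth_labels(true_label=3, num_classes=1000)
# true class gets 1 - epsilon + epsilon/K = 0.9001, others get epsilon/K = 0.0001
print(q[3], q[0], q.sum())
```

Training then minimizes the cross entropy against q' instead of the one-hot target, which keeps the model from becoming over-confident on the true label.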

Inception v2

(figure: Inception v2 architecture)

hamid54 commented 8 years ago

hi, In Inception v2 the input size is 229*229. Is it the same in V3?

Thank you!

omoindrot commented 8 years ago

Yes Inception v3 uses an input size of 299*299 (not 229).

You can refer to the TensorFlow implementation of the inception v3 model.

hamid54 commented 8 years ago

Thank you! 229 was a typo, sorry!

So, you may correct the first insight, which says the input size is 224x224x3.