keras-team / keras-applications

Reference implementations of popular deep learning models.

InceptionResNetV2 summary() seems to be a different network, as it does not look like the one in the paper. #196

Open hamddan4 opened 3 years ago

hamddan4 commented 3 years ago

Summary

Importing the InceptionResNetV2 model seems to load a different architecture instead.

Environment

Logs or source code for reproduction

If you do:

import keras
mod = keras.applications.InceptionResNetV2()
mod.summary()

The first lines of the model summary look like this:

Model: "inception_resnet_v2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 299, 299, 3)  0
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 149, 149, 32) 864         input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 149, 149, 32) 96          conv2d_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 149, 149, 32) 0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 147, 147, 32) 9216        activation_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 147, 147, 32) 96          conv2d_2[0][0]
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 147, 147, 32) 0           batch_normalization_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 147, 147, 64) 18432       activation_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 147, 147, 64) 192         conv2d_3[0][0]
__________________________________________________________________________________________________
activation_3 (Activation)       (None, 147, 147, 64) 0           batch_normalization_3[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 73, 73, 64)   0           activation_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 73, 73, 80)   5120        max_pooling2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 73, 73, 80)   240         conv2d_4[0][0]
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 73, 73, 80)   0           batch_normalization_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 71, 71, 192)  138240      activation_4[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 71, 71, 192)  576         conv2d_5[0][0]
__________________________________________________________________________________________________
activation_5 (Activation)       (None, 71, 71, 192)  0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 35, 35, 192)  0           activation_5[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 35, 35, 64)   12288       max_pooling2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 35, 35, 64)   192         conv2d_9[0][0]
__________________________________________________________________________________________________
activation_9 (Activation)       (None, 35, 35, 64)   0           batch_normalization_9[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 35, 35, 48)   9216        max_pooling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 35, 35, 96)   55296       activation_9[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 35, 35, 48)   144         conv2d_7[0][0]
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 35, 35, 96)   288         conv2d_10[0][0]
__________________________________________________________________________________________________
activation_7 (Activation)       (None, 35, 35, 48)   0           batch_normalization_7[0][0]
__________________________________________________________________________________________________
activation_10 (Activation)      (None, 35, 35, 96)   0           batch_normalization_10[0][0]
__________________________________________________________________________________________________
average_pooling2d_1 (AveragePoo (None, 35, 35, 192)  0           max_pooling2d_2[0][0]
__________________________________________________________________________________________________
.
.
.
and many more lines

If you take a look, you first see that these layers are stacked sequentially and there is no filter concatenation at all. As the original paper says, the stem block for InceptionResNetV2 does not look like the one above; the one above looks more like the stem block of InceptionResNetV1. Below are the two stem blocks for both architectures:

[Images from the paper: the stem block of Inception-ResNet-v1 and the stem block of Inception-v4 / Inception-ResNet-v2]
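For comparison, here is a minimal Keras sketch of the stem block as it is drawn in the paper for Inception-v4 / Inception-ResNet-v2 (the second diagram above). This is only an illustration: the conv_bn helper and the model name are mine, not taken from keras-applications; the filter sizes and paddings follow the figure in the paper.

from keras import layers, models

def conv_bn(x, filters, kernel, strides=1, padding='same'):
    # Conv2D + BatchNormalization + ReLU, the basic unit used throughout the stem
    x = layers.Conv2D(filters, kernel, strides=strides, padding=padding, use_bias=False)(x)
    x = layers.BatchNormalization(scale=False)(x)
    return layers.Activation('relu')(x)

inputs = layers.Input(shape=(299, 299, 3))

# 299x299x3 -> 147x147x64
x = conv_bn(inputs, 32, 3, strides=2, padding='valid')
x = conv_bn(x, 32, 3, padding='valid')
x = conv_bn(x, 64, 3)

# First filter concatenation: 3x3 max-pool branch + strided 3x3 conv branch -> 73x73x160
branch_pool = layers.MaxPooling2D(3, strides=2, padding='valid')(x)
branch_conv = conv_bn(x, 96, 3, strides=2, padding='valid')
x = layers.Concatenate()([branch_pool, branch_conv])

# Second filter concatenation: 1x1->3x3 branch + 1x1->7x1->1x7->3x3 branch -> 71x71x192
branch_a = conv_bn(x, 64, 1)
branch_a = conv_bn(branch_a, 96, 3, padding='valid')
branch_b = conv_bn(x, 64, 1)
branch_b = conv_bn(branch_b, 64, (7, 1))
branch_b = conv_bn(branch_b, 64, (1, 7))
branch_b = conv_bn(branch_b, 96, 3, padding='valid')
x = layers.Concatenate()([branch_a, branch_b])

# Third filter concatenation: strided 3x3 conv branch + max-pool branch -> 35x35x384
branch_conv = conv_bn(x, 192, 3, strides=2, padding='valid')
branch_pool = layers.MaxPooling2D(3, strides=2, padding='valid')(x)
x = layers.Concatenate()([branch_conv, branch_pool])

stem = models.Model(inputs, x, name='stem_from_paper')
stem.summary()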

I've found an implementation of InceptionV4, and it does specify the stem block shared by InceptionV4 and InceptionResNetV2 correctly. The first lines of its summary are:

Model: "inception_v4"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 512, 512, 3)  0
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 255, 255, 32) 864         input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 255, 255, 32) 96          conv2d_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 255, 255, 32) 0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 253, 253, 32) 9216        activation_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 253, 253, 32) 96          conv2d_2[0][0]
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 253, 253, 32) 0           batch_normalization_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 253, 253, 64) 18432       activation_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 253, 253, 64) 192         conv2d_3[0][0]
__________________________________________________________________________________________________
activation_3 (Activation)       (None, 253, 253, 64) 0           batch_normalization_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 126, 126, 96) 55296       activation_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 126, 126, 96) 288         conv2d_4[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 126, 126, 64) 0           activation_3[0][0]
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 126, 126, 96) 0           batch_normalization_4[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 126, 126, 160 0           max_pooling2d_1[0][0]
                                                                 activation_4[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 126, 126, 64) 10240       concatenate_1[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 126, 126, 64) 192         conv2d_7[0][0]
__________________________________________________________________________________________________
activation_7 (Activation)       (None, 126, 126, 64) 0           batch_normalization_7[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 126, 126, 64) 28672       activation_7[0][0]
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 126, 126, 64) 192         conv2d_8[0][0]
__________________________________________________________________________________________________
activation_8 (Activation)       (None, 126, 126, 64) 0           batch_normalization_8[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 126, 126, 64) 10240       concatenate_1[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 126, 126, 64) 28672       activation_8[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 126, 126, 64) 192         conv2d_5[0][0]
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 126, 126, 64) 192         conv2d_9[0][0]
__________________________________________________________________________________________________
activation_5 (Activation)       (None, 126, 126, 64) 0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
activation_9 (Activation)       (None, 126, 126, 64) 0           batch_normalization_9[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 124, 124, 96) 55296       activation_5[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 124, 124, 96) 55296       activation_9[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 124, 124, 96) 288         conv2d_6[0][0]
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 124, 124, 96) 288         conv2d_10[0][0]
__________________________________________________________________________________________________
activation_6 (Activation)       (None, 124, 124, 96) 0           batch_normalization_6[0][0]
__________________________________________________________________________________________________
activation_10 (Activation)      (None, 124, 124, 96) 0           batch_normalization_10[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 124, 124, 192 0           activation_6[0][0]
                                                                 activation_10[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 61, 61, 192)  331776      concatenate_2[0][0]
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 61, 61, 192)  576         conv2d_11[0][0]
__________________________________________________________________________________________________
activation_11 (Activation)      (None, 61, 61, 192)  0           batch_normalization_11[0][0]
__________________________________________________________________________________________________
.
.
.
and many more lines

Here there are concatenations and the stem block looks correct. The first lines of the Keras InceptionResNetV2, by contrast, look like the first lines of InceptionResNetV1. Why is this happening?
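As a quick sanity check, you can look for Concatenate layers among the first layers of the Keras model; with the summary above there are none before the first Inception-ResNet block. The 25-layer cutoff below is arbitrary, just enough to cover the stem.

import keras

model = keras.applications.InceptionResNetV2()

# Print the stem layers and count filter concatenations among them
for layer in model.layers[:25]:
    print(layer.name, type(layer).__name__)

num_concats = sum(isinstance(layer, keras.layers.Concatenate) for layer in model.layers[:25])
print('Concatenate layers in the first 25 layers:', num_concats)  # 0 with the summary above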

hamddan4 commented 3 years ago

Got an answer from Google: https://stackoverflow.com/questions/64488034/inceptionresnetv2-stem-block-keras-implementation-mismatch-the-one-in-the-origin

It seems they just changed it during internal experiments. Nonetheless, the authors say there is no difference in performance.