titu1994 / DenseNet

DenseNet implementation in Keras
MIT License

bn relu conv bottleneck #28

Closed ahundt closed 7 years ago

ahundt commented 7 years ago

It seems there may be another change needed for the bottleneck case based on the paper:

We find this design especially effective for DenseNet and we refer to our network with such a bottleneck layer, i.e., to the BN-ReLU-Conv(1×1)-BN-ReLU-Conv(3×3) version of Hℓ, as DenseNet-B.

It looks like the network here and the one in keras-contrib don't use that order. I think it should be:


def __conv_block(x, nb_filter, bottleneck=False, dropout_rate=None, weight_decay=1e-4):
    '''
    Adds a convolution layer (with batch normalization and relu),
    and optionally a bottleneck layer.

    # Arguments
        x: Input tensor
        nb_filter: integer, the dimensionality of the output space
            (i.e. the number of output filters in the convolution)
        bottleneck: if True, adds a bottleneck convolution block
        dropout_rate: dropout rate
        weight_decay: weight decay factor

    # Input shape
        4D tensor with shape:
        `(samples, channels, rows, cols)` if data_format='channels_first'
        or 4D tensor with shape:
        `(samples, rows, cols, channels)` if data_format='channels_last'.

    # Output shape
        4D tensor with shape:
        `(samples, filters, new_rows, new_cols)` if data_format='channels_first'
        or 4D tensor with shape:
        `(samples, new_rows, new_cols, filters)` if data_format='channels_last'.
        `rows` and `cols` values might have changed due to stride.

    # Returns
        output tensor of block
    '''
    with K.name_scope('ConvBlock'):
        concat_axis = 1 if K.image_data_format() == 'channels_first' else -1

        if bottleneck:
            inter_channel = nb_filter * 4

            x = BatchNormalization(axis=concat_axis, epsilon=1.1e-5)(x)
            x = Activation('relu')(x)
            x = Conv2D(inter_channel, (1, 1), kernel_initializer='he_normal', padding='same', use_bias=False,
                       kernel_regularizer=l2(weight_decay))(x)

        x = BatchNormalization(axis=concat_axis, epsilon=1.1e-5)(x)
        x = Activation('relu')(x)
        x = Conv2D(nb_filter, (3, 3), kernel_initializer='he_normal', padding='same', use_bias=False)(x)
        if dropout_rate:
            x = Dropout(dropout_rate)(x)

    return x
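
For context, here is a rough sketch of how a conv block like this gets chained inside a dense block; the names (dense_block, growth_rate) are mine for illustration, not from this repo:

from keras.layers import concatenate

def dense_block(x, nb_layers, growth_rate, bottleneck=True, dropout_rate=None):
    concat_axis = 1 if K.image_data_format() == 'channels_first' else -1
    for _ in range(nb_layers):
        # each layer produces growth_rate new feature maps...
        out = __conv_block(x, growth_rate, bottleneck, dropout_rate)
        # ...which are concatenated onto all previous feature maps along the channel axis
        x = concatenate([x, out], axis=concat_axis)
    return x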
titu1994 commented 7 years ago

That's what it already does, no? It's BN-ReLU-Conv(1x1)-BN-ReLU-Conv(3x3).

If the order weren't correct, the ported Caffe ImageNet weights would not load, nor would the model make correct predictions.

titu1994 commented 7 years ago

Your code does the same thing by wrapping the initial BN and ReLU inside the if block. It makes no actual difference. The code in the repo assumes there must be at least one BN-ReLU, then decides whether or not to add the bottleneck 1x1 conv. If it does, it then needs to add another BN-ReLU before the final conv.

Just different ways of representing stuff.
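
To make the equivalence concrete, here's a schematic (not code from either version) showing that both groupings expand to the same op sequence:

def repo_layout(bottleneck):
    ops = ['BN', 'ReLU']                      # unconditional first BN-ReLU
    if bottleneck:
        ops += ['Conv1x1', 'BN', 'ReLU']      # bottleneck adds its own BN-ReLU after the 1x1
    return ops + ['Conv3x3']

def proposed_layout(bottleneck):
    ops = []
    if bottleneck:
        ops += ['BN', 'ReLU', 'Conv1x1']      # first BN-ReLU lives inside the if block
    return ops + ['BN', 'ReLU', 'Conv3x3']

assert repo_layout(True) == proposed_layout(True)    # BN-ReLU-Conv1x1-BN-ReLU-Conv3x3
assert repo_layout(False) == proposed_layout(False)  # BN-ReLU-Conv3x3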

ahundt commented 7 years ago

Ah, you're right. :-) I guess the change might just appear less confusing?