pudae / tensorflow-densenet

Tensorflow-DenseNet with ImageNet Pretrained Models
Apache License 2.0

Does BN+ReLU+Conv work for you? #1

Closed hawkjk closed 6 years ago

hawkjk commented 7 years ago

Hello pudae:

I tried densenet121 based on your code for an FR project, but it seems that the conv structure (BN+ReLU+Conv) does not work for me.
When I modified the conv structure to Conv+BN+ReLU, training was OK but the accuracy was lower.
So I tried changing the conv structure to BN+ReLU+Conv+BN+ReLU; training is OK and the accuracy is better than with either of the two structures above.
I am confused by this. Do you have any suggestions? The relevant part of my densenet code is below:

# _conv, _dense_block and _transition_block are adapted from your densenet code
# and not shown here; images, weight_decay, reuse, phase_train and
# bottleneck_layer_size are defined elsewhere in my training script.
import tensorflow as tf

slim = tf.contrib.slim

# DenseNet-121 hyperparameters
reduction = 0.5
growth_rate = 32
num_filters = 64
compression = 1.0 - reduction
num_layers = [6, 12, 24, 16]
num_dense_blocks = len(num_layers)

with slim.arg_scope([slim.conv2d, slim.fully_connected],
                    weights_initializer=slim.xavier_initializer_conv2d(uniform=True),
                    weights_regularizer=slim.l2_regularizer(weight_decay),
                    activation_fn=None,
                    biases_initializer=None):

    with tf.variable_scope('densenet121', values=[images], reuse=reuse):
        with slim.arg_scope([slim.batch_norm], 
                            scale=True,
                            decay=0.99,
                            epsilon=1.1e-5), \
             slim.arg_scope([slim.batch_norm, slim.dropout], is_training=phase_train), \
             slim.arg_scope([_conv], dropout_rate=None):

            # initial convolution
            print ("input size: ", images.get_shape())
            net = slim.conv2d(images, num_filters, 7, stride=2, scope='conv1')
            print ("conv1 size: ", net.get_shape())
            net = slim.batch_norm(net)
            net = tf.nn.relu(net)
            net = slim.max_pool2d(net, 3, stride=2, padding='SAME')
            print ("max pool size: ", net.get_shape())

            # blocks
            for i in range(num_dense_blocks - 1):
                # dense blocks
                net, num_filters = _dense_block(net, num_layers[i], num_filters, growth_rate, scope='dense_block' + str(i+1))
                print ("dense block %d size: %s" % (i, net.get_shape()))

                # Add transition_block
                net, num_filters = _transition_block(net, num_filters, compression=compression, scope='transition_block' + str(i+1))
                print ("transition block %d size: %s" % (i, net.get_shape()))

            net, num_filters = _dense_block(
                    net, num_layers[-1], num_filters,
                    growth_rate,
                    scope='dense_block' + str(num_dense_blocks))
            print ("dense block %d size: %s" % (i+1, net.get_shape()))

            # final blocks
            with tf.variable_scope('final_block', values=[images]):
                net = slim.batch_norm(net)
                net = tf.nn.relu(net)
                net = tf.reduce_mean(net, [1,2], name='global_avg_pool', keep_dims=False)
                print ("global ave pooling size: %s" % (net.get_shape()))

            net = slim.batch_norm(net)
            net = tf.nn.relu(net)
            net = slim.fully_connected(net, bottleneck_layer_size,
                                       activation_fn=None, scope='logits',
                                       reuse=False)
            print ("fully connection size: %s" % (net.get_shape()))
            return net, None
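
The _conv, _dense_block and _transition_block helpers referenced above are not shown. For context, here is a minimal sketch of what pre-activation helpers of this kind can look like; the names, signatures and the simplified block structure (no 1x1 bottleneck convolutions) are assumptions for illustration, not the repo's exact code.

# Minimal sketch of pre-activation (BN+ReLU+Conv) helpers; assumed
# signatures and simplified structure, not the repo's actual implementation.
import tensorflow as tf

slim = tf.contrib.slim

@slim.add_arg_scope
def _conv(inputs, num_filters, kernel_size, dropout_rate=None, scope=None):
    with tf.variable_scope(scope, 'conv_blockx', [inputs]):
        net = slim.batch_norm(inputs)
        net = tf.nn.relu(net)
        net = slim.conv2d(net, num_filters, kernel_size,
                          activation_fn=None, biases_initializer=None)
        if dropout_rate:
            net = slim.dropout(net, keep_prob=1.0 - dropout_rate)
    return net

def _dense_block(inputs, num_layers, num_filters, growth_rate, scope=None):
    with tf.variable_scope(scope, 'dense_blockx', [inputs]):
        net = inputs
        for i in range(num_layers):
            # each layer adds growth_rate feature maps and concatenates them
            # with all previously produced features
            out = _conv(net, growth_rate, 3, scope='conv_block' + str(i + 1))
            net = tf.concat([net, out], axis=3)
            num_filters += growth_rate
    return net, num_filters

def _transition_block(inputs, num_filters, compression=1.0, scope=None):
    # BN+ReLU+1x1 Conv followed by 2x2 average pooling, with channel compression
    num_filters = int(num_filters * compression)
    with tf.variable_scope(scope, 'transition_blockx', [inputs]):
        net = _conv(inputs, num_filters, 1, scope='blk')
        net = slim.avg_pool2d(net, 2)
    return net, num_filters
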
pudae commented 7 years ago

I tried densenet121 based on your code for an FR project, but it seems that the conv structure (BN+ReLU+Conv) does not work for me.

Do you mean that the network does not converge? I didn't train from scratch, but with transfer learning it works fine in my case.
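
A rough sketch of restoring the ImageNet-pretrained weights for transfer learning with slim; the checkpoint filename and the excluded scope names are placeholders, not exact names from this repo.

# Rough sketch: restore pretrained weights, excluding the new task-specific head.
# Assumes the densenet121 graph has already been built as in the snippet above;
# 'tf-densenet121.ckpt' and the excluded scopes are placeholders.
import tensorflow as tf

slim = tf.contrib.slim

variables_to_restore = slim.get_variables_to_restore(
    exclude=['densenet121/logits', 'global_step'])

init_fn = slim.assign_from_checkpoint_fn(
    'tf-densenet121.ckpt', variables_to_restore, ignore_missing_vars=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    init_fn(sess)  # load pretrained weights, then fine-tune the new head
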

When I modified the conv structure to Conv+BN+ReLU, training was OK but the accuracy was lower.

In the DenseNet structure, pre-activation batch normalization (BN+ReLU+Conv) is important, because each layer can apply its own scale and bias to the features coming from previous layers. For more detail, see "Memory-Efficient Implementation of DenseNets", Figure 2.
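
To make the difference concrete, here is a hedged sketch of the two orderings in slim; these are illustrative helper functions, not code from this repository.

# Contrast of the two composite functions discussed above (illustrative only).
import tensorflow as tf

slim = tf.contrib.slim

def post_activation_layer(inputs, num_filters, kernel_size, scope):
    # Conv+BN+ReLU: the normalization is tied to this layer's own output,
    # so later layers that reuse the concatenated features cannot
    # re-normalize them with their own scale and bias.
    with tf.variable_scope(scope):
        net = slim.conv2d(inputs, num_filters, kernel_size, activation_fn=None)
        net = slim.batch_norm(net)
        return tf.nn.relu(net)

def pre_activation_layer(inputs, num_filters, kernel_size, scope):
    # BN+ReLU+Conv: every layer that consumes the shared concatenated
    # features first applies its own BN (scale and bias) to them, which is
    # the property referred to above.
    with tf.variable_scope(scope):
        net = slim.batch_norm(inputs)
        net = tf.nn.relu(net)
        return slim.conv2d(net, num_filters, kernel_size, activation_fn=None)
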

So I tried changing the conv structure to BN+ReLU+Conv+BN+ReLU; training is OK and the accuracy is better than with either of the two structures above.

Do you repeat BN+ReLU+Conv+BN+ReLU for every layer? I think the post-activation BN is redundant, and I also have no idea why the BN-ReLU-Conv-BN-ReLU structure would be better than BN-ReLU-Conv.
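
For clarity, the variant being discussed looks roughly like this (again a hypothetical helper, not code from this repo), with the part that appears redundant marked:

# The BN+ReLU+Conv+BN+ReLU variant described above (illustrative only).
import tensorflow as tf

slim = tf.contrib.slim

def pre_and_post_activation_layer(inputs, num_filters, kernel_size, scope):
    with tf.variable_scope(scope):
        net = slim.batch_norm(inputs)
        net = tf.nn.relu(net)
        net = slim.conv2d(net, num_filters, kernel_size, activation_fn=None)
        # Trailing BN+ReLU: if the next layer starts with its own BN+ReLU,
        # this pair normalizes and activates the same features twice.
        net = slim.batch_norm(net)
        return tf.nn.relu(net)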