keras-team / tf-keras

The TensorFlow-specific implementation of the Keras API, which was the default Keras from 2019 to 2023.

How are the shapes of convolutional layers calculated? #79

Open pokecheater opened 2 years ago

pokecheater commented 2 years ago

Hello Keras-Team :),

I use TF 2.8 with tf.keras. For a web app I need to precalculate the shapes of the layers a user is using, so I have derived formulas for nearly all layers so far, but I am stuck on the convolutional layers.

For plain convolutional layers it is not allowed to have dilation_rate > 1 and strides > 1 at the same time. Why is that, and why is it possible for other convolutional layers like SeparableConv or DepthwiseConv? From my understanding, a dilation_rate > 1 can be understood as a greater kernel size with gaps in it. So it should also be possible to move that "greater kernel" by a given step size (the strides), shouldn't it?
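
For example, under this reading a kernel of size k = 3 with dilation_rate d = 2 covers k + (k-1)*(d-1) = 3 + 2*1 = 5 input positions, so it seems natural that this 5-wide "kernel with gaps" could also be slid with an arbitrary stride.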

So far I came up with the following formula, which works for any convolutional layer (except the transpose variants, of course) as long as either dilation_rate or strides is 1. (You don't have to dig into the following formula; what would really help me out is just the correct formula, but for completeness' sake I paste it here.)


/*
    For a given dimension:
    p - the previous shape's dimension value
    k - kernel_size
    d - dilation_rate
    s - strides
*/
function outputDim(p, k, d, s, padding) {
    if (p < k) return "invalid";
    if (d > 1) k = k + (k - 1) * (d - 1);  // effective ("dilated") kernel size

    if (padding === "valid") {
        const kernel_poses = p - (k - 1) - s; // the theoretical number of positions the kernel can take if strides (s) === 1
        if (s === 1) return Math.ceil(kernel_poses);
        else return Math.ceil(kernel_poses / s); // here the returned value sometimes differs from what model.summary() reports
    } else if (padding === "same") {
        const kernel_poses = p; // the theoretical number of positions the kernel can take if strides (s) === 1
        if (s === 1) return Math.ceil(kernel_poses);
        else return Math.ceil(kernel_poses / s);
    }
}

To summarize my questions and needs:

  1. I would like to understand why dilation_rate > 1 combined with strides > 1 is not allowed.
  2. I would like to understand why other convolutional layers (SeparableConv, DepthwiseConv) do allow setting both parameters > 1, although the documentation states otherwise (a minimal sketch of this asymmetry follows below the list).
  3. My main need is a formula for the shape calculation of each convolutional layer.
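
For questions 1 and 2, here is a minimal repro sketch of that asymmetry (assuming TF 2.8 as above; the exact error text may differ):

import tensorflow as tf

x = tf.zeros((1, 128, 128, 3))

# Plain Conv2D: strides > 1 together with dilation_rate > 1 is rejected.
try:
    tf.keras.layers.Conv2D(8, 3, strides=3, dilation_rate=3, padding="valid")(x)
except ValueError as err:
    print("Conv2D raised:", err)

# SeparableConv2D accepts the same combination (cf. the tests further down).
y = tf.keras.layers.SeparableConv2D(8, 3, strides=3, dilation_rate=3, padding="valid")(x)
print("SeparableConv2D output shape:", y.shape)  # (1, 41, 41, 8); spatial size matches Test 1 below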

Thx in advance <3

Zhaopudark commented 2 years ago

Hi. To your third question, you can look at the three functions conv_output_length(), conv_input_length() and deconv_output_length() in \tensorflow\python\keras\utils\conv_utils.py of TensorFlow 2.8's source code. These three functions aren't exposed to users, but they are used in tf.keras.ConvTranspose.

The three functions above show the details of the shape calculation. In short, in TensorFlow and Keras the shape calculation of each convolution layer obeys the following formula:

See:
https://www.tensorflow.org/api_docs/python/tf/nn#notes_on_padding_2
If padding == "SAME": output_shape = ceil(input_length/stride)
If padding == "VALID": output_shape = ceil((input_length-(filter_size-1)*dilation_rate)/stride)
pokecheater commented 2 years ago

@Zhaopudark Thank you for your very fast reply. I will try it out on Monday. Crossing my fingers that the formula works out. :)

old-school-kid commented 2 years ago

@pokecheater take a look at this

pokecheater commented 2 years ago

Hey @Zhaopudark,

I tested it, and I still cannot verify that the given formula works for me. Since setting both dilation_rate and strides > 1 is not allowed on normal conv layers, I will show you what I mean for SeparableConv2D (the same applies to DepthwiseConv2D, except for the output channel dimension).

To make things clearer and to speed them up, I used your formula and played copy-cat: I pasted it into a shape-calculator function and wrote 4 simple test cases. The first 2 pass, i.e. the shape is calculated correctly; the last 2 fail. I assume that the shape calculation for DepthwiseConv2D does not differ from a normal convolution, since only the filter dimension is overwritten. For SeparableConv2D, as far as I understand it, the shape calculation should also follow the formula of a normal convolution.

The test cases indicate that the formula is only valid under certain circumstances. It is in fact equivalent to what I had already derived (the results are the same).

import math
import tensorflow as tf

def create_model(input_shape, filter_size, filters, strides, dilation_rate, padding):
    '''
        A simple creator function that builds a convolutional model with exactly one layer.
    '''
    model = tf.keras.Sequential()
    #model.add(tf.keras.layers.Conv2D(2, 3, dilation_rate=dilation_rate, strides=strides, activation='relu', input_shape=(128, 128, 3)))
    model.add(
        tf.keras.layers.SeparableConv2D(
            filters = filters, 
            kernel_size = filter_size, 
            dilation_rate = dilation_rate, 
            strides = strides, 
            activation = 'relu', 
            input_shape = input_shape,
            padding = padding
        )
    )
    sgd = tf.optimizers.SGD(learning_rate=0.1, momentum=0.9)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model

def calcShape(input_shape, filter_size, filters, strides, dilation_rate, padding):
    '''
        A simple calcShape function, based on the formula given at:
        https://www.tensorflow.org/api_docs/python/tf/nn#notes_on_padding_2
    '''
    result_shape = [None]  # batch dimension

    # iterate over the spatial dimensions only (the last entry of input_shape is the channel axis)
    for i in range(len(input_shape)-1):
        output_shape = None

        if padding == "SAME": 
            output_shape = math.ceil(input_shape[i]/strides)
        elif padding == "VALID": 
            output_shape = math.ceil((input_shape[i]-(filter_size-1)*dilation_rate)/strides)

        result_shape.append(output_shape)
    result_shape.append(filters)

    return tuple(result_shape)

def compareShapes(test_no, test_params):
    '''
        Our testing function. Creates the model, extracts the convolutional layers 
        output shape and compares it against our calculated shape.
    '''
    model = create_model(**test_params)
    conv_output_shape = model.get_layer(index=0).output_shape
    # model.summary()
    calculated_shape = calcShape(**test_params)
    passed = conv_output_shape == calculated_shape

    print("")
    print('Test #{0}: Passed: {1}'.format(test_no, passed))
    print("conv_output_shape", conv_output_shape)
    print("calculated_shape", calculated_shape)
# Test 1
test_params = {
    "input_shape": (128, 128, 3),
    "filter_size": 3,
    "dilation_rate": 3,
    "strides": 3,
    "padding": "VALID",
    "filters": 200
}
compareShapes(1, test_params)

# Test 2
test_params = {
    "input_shape": (256, 256, 3),
    "filter_size": 5,
    "dilation_rate": 3,
    "strides": 5,
    "padding": "VALID",
    "filters": 200
}
compareShapes(2, test_params)

# Test 3
test_params = {
    "input_shape": (256, 256, 3),
    "filter_size": 5,
    "dilation_rate": 5,
    "strides": 3,
    "padding": "VALID",
    "filters": 200
}
compareShapes(3, test_params)

# Test 4
test_params = {
    "input_shape": (256, 256, 3),
    "filter_size": 5,
    "dilation_rate": 10,
    "strides": 7,
    "padding": "VALID",
    "filters": 200
}
compareShapes(4, test_params)

pokecheater commented 2 years ago

Here is my output:

Test #1: Passed: True
conv_output_shape (None, 41, 41, 200)
calculated_shape (None, 41, 41, 200)

Test #2: Passed: True
conv_output_shape (None, 49, 49, 200)
calculated_shape (None, 49, 49, 200)

Test #3: Passed: False
conv_output_shape (None, 76, 76, 200)
calculated_shape (None, 79, 79, 200)

Test #4: Passed: False
conv_output_shape (None, 36, 36, 200)
calculated_shape (None, 31, 31, 200)
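
For reference, plugging the test parameters into the formula from above: Test #1 gives ceil((128 - (3-1)*3)/3) = ceil(122/3) = 41 and Test #2 gives ceil((256 - (5-1)*3)/5) = ceil(244/5) = 49, both matching the layer, while Test #3 gives ceil((256 - (5-1)*5)/3) = ceil(236/3) = 79 and Test #4 gives ceil((256 - (5-1)*10)/7) = ceil(216/7) = 31, which do not match the reported 76 and 36.
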
Zhaopudark commented 2 years ago

Hello @pokecheater. Since SeparableConvND consists of a depthwise_conv followed by a pointwise_conv, it is quite difficult to give a single closed-form formula for SeparableConv.

More detailed information can be found here:
https://github.com/keras-team/keras/blob/v2.8.0/keras/layers/convolutional.py#L2106-L2272
https://github.com/tensorflow/tensorflow/blob/3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/python/ops/nn_impl.py#L996-L1099
https://github.com/tensorflow/tensorflow/blob/v2.8.0/tensorflow/python/ops/nn_ops.py#L536-L694

In short, SeparableConv's depthwise_conv step is easy to understand and obeys the traditional conv formula, but it does not use dilation_rate directly. The dilation behaviour is handled in the pointwise_conv step by ops such as tf.nn.with_space_to_batch. Even in the documentation, the calculation procedure is quite cumbersome.
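
To make that decomposition concrete, here is a rough sketch (my own helper, not the actual Keras implementation) of a depthwise-then-pointwise stack:

import tensorflow as tf

def separable_block(filters, kernel_size, strides=1, dilation_rate=1, padding="valid"):
    # Rough two-step stand-in for SeparableConv2D: a depthwise conv followed by a
    # 1x1 pointwise conv. It only illustrates the decomposition; the real layer
    # applies dilation via ops like tf.nn.with_space_to_batch as described above,
    # so output shapes can differ once dilation_rate > 1 and strides > 1.
    return tf.keras.Sequential([
        tf.keras.layers.DepthwiseConv2D(
            kernel_size, strides=strides, dilation_rate=dilation_rate, padding=padding),
        tf.keras.layers.Conv2D(filters, kernel_size=1, padding="valid"),
    ])

block = separable_block(filters=200, kernel_size=5)
print(block(tf.zeros((1, 256, 256, 3))).shape)  # (1, 252, 252, 200) with the defaults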

I am sorry that I am presently unable to give you a more detailed formula for Keras's SeparableConv behaviour. I have been working on a PR for 3D conv ops for several days, and SeparableConv is also on the list, but during that work another problem came up that affects my investigation experiments, so I have to fix that unexpected problem first.

pokecheater commented 2 years ago

Thanks for your answer, you have my full understanding. Development always means fixing issues and discovering new ones while doing so.

Good to know that the formula is different for these convolutional layers. I was just confused by the fact that the formula sometimes applied perfectly (especially in the cases where one of the parameters, dilation_rate or strides, was equal to 1), so I thought it was just a small detail I was missing.

It would be great to hear from you in the near future once you have an answer. Greetings & thanks, @Zhaopudark

old-school-kid commented 2 years ago

Can you close this? @pokecheater TIA

sushreebarsa commented 2 years ago

@pokecheater Is this still an issue? Please move this issue to closed status if it is resolved for you. Thanks!

pokecheater commented 2 years ago

Hey, sorry I haven't answered for a while. (Since our current release shipped with this bug, my attention moved to other features and issues.)

No, it is still an issue. I was not able to derive the exact formulas for those convolutional layers from the code.