Serious implementation gap of `ConvLSTM2D` across 3 backends causes obvious output differences, no matter what the `padding` scheme is. There may be more than one inconsistent implementation of `ConvLSTM2D` across the 3 backends. #13859
**System information**
- Have I written custom code (as opposed to using example directory):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 & Linux Ubuntu 18.04
- Tensorflow backend (yes / no): yes
- Tensorflow version: 1.15.0
- Cntk version: 2.7
- Theano version: 1.0.4
- Keras version: 2.3.1
- Python version: 3.6.9
- CUDA/cuDNN version: -
- GPU model and memory: -
**Describe the current behavior**

When I build the `ConvLSTM2D` layer with different `kernel_size` and `strides` values, the outputs of the different backends always differ, even when the padding scheme is `'valid'`. For the detailed `ConvLSTM2D` parameters, refer to the code snippet below.

**`padding = 'same'`**

It seems that the Keras backends implement the padding step of the convolutional layer inconsistently, which leads to differences in output. When I use `ConvLSTM2D(padding='same')`, the situation is similar to the problem in keras issue 13842. We use the sum of the absolute differences (SoAD) between the outputs of two backends to quantify the inconsistency, as shown in the following tables.
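As a minimal sketch of the metric (using a hypothetical helper name, not code from this report), SoAD is simply the element-wise absolute difference summed over the whole output tensor; identical outputs give 0.0:

```python
import numpy as np

def sum_abs_diff(a, b):
    # SoAD: sum over all elements of |a - b|
    assert a.shape == b.shape
    return float(np.sum(np.abs(a - b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.5, 2.0, 2.0])
print(sum_abs_diff(a, b))  # 0.5 + 0.0 + 1.0 = 1.5
```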
1) `kernel_size=1`

| strides | Tensorflow-CNTK | Tensorflow-Theano | CNTK-Theano |
| --- | --- | --- | --- |
| 1 | 1.8702486e+04 | 5.2728795e-04 | 1.8702486e+04 |
| 2 | 5.7691240e+03 | 8.8768400e-05 | 5.7691240e+03 |
| 3 | 1.2434736e+03 | 5.5084005e-05 | 1.2434736e+03 |
| 4 | 5.7755975e+02 | 4.0959025e-05 | 5.7755975e+02 |
| 5 | 1.0693531e+03 | 4.0120474e-05 | 1.0693531e+03 |
The above table shows that when `kernel_size=1`, CNTK usually returns results that differ from both Tensorflow and Theano.
2) `kernel_size=2`

| strides | Tensorflow-CNTK | Tensorflow-Theano | CNTK-Theano |
| --- | --- | --- | --- |
| 1 | 28970.742 | 4955.8623 | 31461.982 |
| 2 | 5160.826 | 980.33386 | 5596.5728 |
| 3 | 1432.2828 | 435.69504 | 1648.696 |
| 4 | 1038.2474 | 362.09113 | 1107.2095 |
| 5 | 548.93085 | 267.74567 | 590.6553 |
The above table shows that when `kernel_size=2`, every pair of backends generates different outputs, which is an obvious behavioral gap.
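One plausible contributor to the even-kernel gap (an illustration, not a confirmed diagnosis of these backends): with an even `kernel_size`, `'same'` padding requires an odd total amount of padding, and an implementation may place the extra element on either side of the input. A 1-D NumPy sketch shows that the two choices produce different outputs:

```python
import numpy as np

x = np.arange(4.0)            # toy 1-D signal: [0., 1., 2., 3.]
k = np.array([1.0, 1.0])      # even-sized kernel, like kernel_size=2

# A 'same'-length output needs exactly one padded zero; a backend
# may put it on the left or on the right of the signal:
pad_left  = np.convolve(np.concatenate(([0.0], x)), k, mode='valid')
pad_right = np.convolve(np.concatenate((x, [0.0])), k, mode='valid')

print(pad_left)   # [0. 1. 3. 5.]
print(pad_right)  # [1. 3. 5. 3.]
```

Both results are valid "same" convolutions of the same signal, yet they disagree element-wise, which is exactly the kind of divergence SoAD would pick up.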
**`padding = 'valid'`**
In keras issue 13842, when I build `Conv2D` or other convolutional layers with `padding='valid'`, the outputs of the different backends are almost the same. However, we still found significant inconsistencies in `ConvLSTM2D` even when `padding='valid'`. In other words, `padding='valid'` does not remove the inconsistent output of `ConvLSTM2D`, so there may be more than one inconsistent implementation across the 3 backends. The output differences are shown in the following tables.
1) `kernel_size=1`

| strides | Tensorflow-CNTK | CNTK-Theano | Tensorflow-Theano |
| --- | --- | --- | --- |
| 1 | 1.6891293e+04 | 1.6891293e+04 | 3.3424940e-04 |
| 2 | 3.7266189e+03 | 3.7266187e+03 | 1.1239077e-04 |
| 3 | 1.5932040e+03 | 1.5932040e+03 | 6.2714811e-05 |
| 4 | 4.5478448e+02 | 4.5478445e+02 | 2.7829290e-05 |
| 5 | 1.3272142e+03 | 1.3272142e+03 | 4.5338180e-05 |
2) `kernel_size=2`

| strides | Tensorflow-CNTK | CNTK-Theano | Tensorflow-Theano |
| --- | --- | --- | --- |
| 1 | 4256.633 | 4541.1816 | 601.3959 |
| 2 | 6015.3574 | 6055.271 | 112.198975 |
| 3 | 530.9352 | 572.4419 | 76.29698 |
| 4 | 696.4586 | 732.2359 | 60.75669 |
| 5 | 945.03467 | 950.7143 | 14.999218 |
From the above 2 tables, we can see that the value of `padding` does not affect the problem of inconsistent `ConvLSTM2D` outputs on different backends.
**Key insights**
Compared to keras issue 13842, the `ConvLSTM2D` problem is more serious: the outputs are always inconsistent across the different backends, whatever the `padding` is. There may be more than one inconsistent implementation of `ConvLSTM2D` across the 3 backends.
In addition, the Keras documentation contains no warning or description of this inconsistency, which may cause a model with this layer to produce unexpected results on different backends and confuse users.
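Until the gap is fixed, a defensive check like the following (a hypothetical helper, not an existing Keras API) can flag cross-backend divergence before deployment: the SoAD values in the tables above are far outside ordinary float32 tolerances, so an element-wise `np.allclose` comparison would reject them.

```python
import numpy as np

def backends_agree(out_a, out_b, rtol=1e-4, atol=1e-4):
    # Treat two backend outputs as consistent only if every element
    # matches within float32-scale tolerances.
    return out_a.shape == out_b.shape and bool(
        np.allclose(out_a, out_b, rtol=rtol, atol=atol))

tf_out   = np.array([0.10000, 0.20000])
cntk_out = np.array([0.10001, 0.75000])   # diverged, like the CNTK rows above
print(backends_agree(tf_out, cntk_out))   # False
```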
**Code to reproduce the issue**
```python
import os
import importlib

import numpy as np
import keras.layers as L
import keras.backend as K
from keras.models import load_model
from keras.engine import Model, Input

backends = ['cntk', 'tensorflow', 'theano']


def acc_abs_diff(output1, output2):
    # Sum of the absolute differences (SoAD) between two outputs.
    assert output1.shape == output2.shape
    return np.sum(np.abs(output1 - output2))


def set_keras_backend(backend='tensorflow'):
    # Swap the Keras backend at runtime by reloading keras.backend.
    if K.backend() != backend:
        os.environ['KERAS_BACKEND'] = backend
        importlib.reload(K.load_backend)
        importlib.reload(K)
        assert K.backend() == backend


listdiff = []
for i in range(5):
    kwargs = {
        'filters': 8,
        'kernel_size': 1,      # you can change 'kernel_size' here
        'strides': i + 1,
        'padding': 'valid',    # you can change 'padding' here
        'data_format': 'channels_first',
        'dilation_rate': 1,
        'use_bias': True,
        'unit_forget_bias': False,
        'return_sequences': True,
        'go_backwards': False,
        'stateful': False,
        'dropout': 0.7232577469807254,
        'recurrent_dropout': 0.7507926892266159
    }
    # (samples, time, channels, rows, cols) for data_format='channels_first'
    input_data = 10 * np.random.random((1, 16, 16, 16, 8))
    input = input_data.astype('float32')

    # Build and save the model under the Tensorflow backend.
    set_keras_backend('tensorflow')
    layer = L.convolutional_recurrent.ConvLSTM2D(**kwargs)
    x = Input(batch_shape=input.shape)
    y = layer(x)
    bk_model = Model(x, y)
    model_path = os.path.join('./', 'model.h5')
    bk_model.save(model_path)

    # Reload the same model under each backend and predict.
    output = {}
    for bk in backends:
        try:
            set_keras_backend(backend=bk)
            model = load_model(model_path)
            output[bk] = model.predict(input)
        except Exception:
            print('error result')

    try:
        diff1 = acc_abs_diff(output['tensorflow'], output['cntk'])
    except Exception:
        diff1 = None
    try:
        diff2 = acc_abs_diff(output['theano'], output['cntk'])
    except Exception:
        diff2 = None
    try:
        diff3 = acc_abs_diff(output['theano'], output['tensorflow'])
    except Exception:
        diff3 = None
    listdiff.append([diff1, diff2, diff3])

arraydiff = np.array(listdiff)
print('finish')
```