Serious implementation gap of `ConvLSTM2D` across 3 backends causes obvious output differences, no matter what the `padding` scheme is. There may be more than one inconsistent implementation of `ConvLSTM2D` across the 3 backends. #13859
**System information**
- Have I written custom code (as opposed to using example directory):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 & Linux Ubuntu 18.04
- Tensorflow backend (yes / no): yes
- Tensorflow version: 1.15.0
- Cntk version: 2.7
- Theano version: 1.0.4
- Keras version: 2.3.1
- Python version: 3.6.9
- CUDA/cuDNN version: -
- GPU model and memory: -
**Describe the current behavior**

When I build the `ConvLSTM2D` layer with different `kernel_size` and `strides` values, the outputs of the different backends always differ, even when the padding scheme is `'valid'`. For the detailed `ConvLSTM2D` parameters, refer to the code snippet below.

**`padding = 'same'`**

It seems that the Keras backends implement the padding step of the convolutional layer inconsistently, which leads to differences in output. When I use `ConvLSTM2D(padding='same')`, the situation is similar to the problem in keras issue 13842. We use the sum of the absolute differences (SoAD) between the outputs of two backends to quantify the inconsistency, as shown in the following tables.
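As a minimal sketch of the metric (using a hypothetical helper name, not code from this report), SoAD is simply the element-wise absolute difference summed over the whole output tensor; identical outputs give 0.0:

```python
import numpy as np

def sum_abs_diff(a, b):
    # SoAD: sum over all elements of |a - b|
    assert a.shape == b.shape
    return float(np.sum(np.abs(a - b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.5, 2.0, 2.0])
print(sum_abs_diff(a, b))  # 0.5 + 0.0 + 1.0 = 1.5
```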
1) `kernel_size=1`

| strides | Tensorflow-CNTK | Tensorflow-Theano | CNTK-Theano |
| --- | --- | --- | --- |
| 1 | 1.8702486e+04 | 5.2728795e-04 | 1.8702486e+04 |
| 2 | 5.7691240e+03 | 8.8768400e-05 | 5.7691240e+03 |
| 3 | 1.2434736e+03 | 5.5084005e-05 | 1.2434736e+03 |
| 4 | 5.7755975e+02 | 4.0959025e-05 | 5.7755975e+02 |
| 5 | 1.0693531e+03 | 4.0120474e-05 | 1.0693531e+03 |
The above table shows that when `kernel_size=1`, CNTK usually returns results that differ from both Tensorflow and Theano.
2) `kernel_size=2`

| strides | Tensorflow-CNTK | Tensorflow-Theano | CNTK-Theano |
| --- | --- | --- | --- |
| 1 | 28970.742 | 4955.8623 | 31461.982 |
| 2 | 5160.826 | 980.33386 | 5596.5728 |
| 3 | 1432.2828 | 435.69504 | 1648.696 |
| 4 | 1038.2474 | 362.09113 | 1107.2095 |
| 5 | 548.93085 | 267.74567 | 590.6553 |
The above table shows that when `kernel_size=2`, every pair of backends generates different outputs, which is an obvious behavioral gap.
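One plausible contributor to the even-kernel gap (an illustration, not a confirmed diagnosis of these backends): with an even `kernel_size`, `'same'` padding requires an odd total amount of padding, and an implementation may place the extra element on either side of the input. A 1-D NumPy sketch shows that the two choices produce different outputs:

```python
import numpy as np

x = np.arange(4.0)            # toy 1-D signal: [0., 1., 2., 3.]
k = np.array([1.0, 1.0])      # even-sized kernel, like kernel_size=2

# A 'same'-length output needs exactly one padded zero; a backend
# may put it on the left or on the right of the signal:
pad_left  = np.convolve(np.concatenate(([0.0], x)), k, mode='valid')
pad_right = np.convolve(np.concatenate((x, [0.0])), k, mode='valid')

print(pad_left)   # [0. 1. 3. 5.]
print(pad_right)  # [1. 3. 5. 3.]
```

Both results are valid "same" convolutions of the same signal, yet they disagree element-wise, which is exactly the kind of divergence SoAD would pick up.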
**`padding = 'valid'`**
In keras issue 13842, when I build `Conv2D` or other convolutional layers with `padding='valid'`, the outputs of the different backends are almost the same. However, we still found significant inconsistencies in `ConvLSTM2D` even when `padding='valid'`. In other words, `padding='valid'` does not remove the inconsistent output of `ConvLSTM2D`, so there may be more than one inconsistent implementation across the 3 backends. The output differences are shown in the following tables.
1) `kernel_size=1`

| strides | Tensorflow-CNTK | CNTK-Theano | Tensorflow-Theano |
| --- | --- | --- | --- |
| 1 | 1.6891293e+04 | 1.6891293e+04 | 3.3424940e-04 |
| 2 | 3.7266189e+03 | 3.7266187e+03 | 1.1239077e-04 |
| 3 | 1.5932040e+03 | 1.5932040e+03 | 6.2714811e-05 |
| 4 | 4.5478448e+02 | 4.5478445e+02 | 2.7829290e-05 |
| 5 | 1.3272142e+03 | 1.3272142e+03 | 4.5338180e-05 |
2) `kernel_size=2`

| strides | Tensorflow-CNTK | CNTK-Theano | Tensorflow-Theano |
| --- | --- | --- | --- |
| 1 | 4256.633 | 4541.1816 | 601.3959 |
| 2 | 6015.3574 | 6055.271 | 112.198975 |
| 3 | 530.9352 | 572.4419 | 76.29698 |
| 4 | 696.4586 | 732.2359 | 60.75669 |
| 5 | 945.03467 | 950.7143 | 14.999218 |
From the above 2 tables, we can see that the value of `padding` does not affect the problem of inconsistent `ConvLSTM2D` outputs on different backends.
**Key insights**
Compared to keras issue 13842, the `ConvLSTM2D` problem is more serious: the outputs are always inconsistent across the different backends, whatever the `padding` is. There may be more than one inconsistent implementation of `ConvLSTM2D` across the 3 backends.
In addition, the Keras documentation contains no warning or description of this inconsistency, which may cause a model with this layer to produce unexpected results on different backends and confuse users.
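Until the gap is fixed, a defensive check like the following (a hypothetical helper, not an existing Keras API) can flag cross-backend divergence before deployment: the SoAD values in the tables above are far outside ordinary float32 tolerances, so an element-wise `np.allclose` comparison would reject them.

```python
import numpy as np

def backends_agree(out_a, out_b, rtol=1e-4, atol=1e-4):
    # Treat two backend outputs as consistent only if every element
    # matches within float32-scale tolerances.
    return out_a.shape == out_b.shape and bool(
        np.allclose(out_a, out_b, rtol=rtol, atol=atol))

tf_out   = np.array([0.10000, 0.20000])
cntk_out = np.array([0.10001, 0.75000])   # diverged, like the CNTK rows above
print(backends_agree(tf_out, cntk_out))   # False
```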
**Code to reproduce the issue**
```python
import os
import importlib

import numpy as np
import keras.layers as L
import keras.backend as K
from keras.models import load_model
from keras.engine import Model, Input

backends = ['cntk', 'tensorflow', 'theano']


def acc_abs_diff(output1, output2):
    # Sum of the absolute differences (SoAD) between two outputs.
    assert output1.shape == output2.shape
    return np.sum(np.abs(output1 - output2))


def set_keras_backend(backend='tensorflow'):
    # Swap the Keras backend at runtime by reloading keras.backend.
    if K.backend() != backend:
        os.environ['KERAS_BACKEND'] = backend
        importlib.reload(K.load_backend)
        importlib.reload(K)
        assert K.backend() == backend


listdiff = []
for i in range(5):
    kwargs = {
        'filters': 8,
        'kernel_size': 1,      # you can change 'kernel_size' here
        'strides': i + 1,
        'padding': 'valid',    # you can change 'padding' here
        'data_format': 'channels_first',
        'dilation_rate': 1,
        'use_bias': True,
        'unit_forget_bias': False,
        'return_sequences': True,
        'go_backwards': False,
        'stateful': False,
        'dropout': 0.7232577469807254,
        'recurrent_dropout': 0.7507926892266159
    }
    # (samples, time, channels, rows, cols) for data_format='channels_first'
    input_data = 10 * np.random.random((1, 16, 16, 16, 8))
    input = input_data.astype('float32')

    # Build and save the model under the Tensorflow backend.
    set_keras_backend('tensorflow')
    layer = L.convolutional_recurrent.ConvLSTM2D(**kwargs)
    x = Input(batch_shape=input.shape)
    y = layer(x)
    bk_model = Model(x, y)
    model_path = os.path.join('./', 'model.h5')
    bk_model.save(model_path)

    # Reload the same model under each backend and predict.
    output = {}
    for bk in backends:
        try:
            set_keras_backend(backend=bk)
            model = load_model(model_path)
            output[bk] = model.predict(input)
        except Exception:
            print('error result')

    try:
        diff1 = acc_abs_diff(output['tensorflow'], output['cntk'])
    except Exception:
        diff1 = None
    try:
        diff2 = acc_abs_diff(output['theano'], output['cntk'])
    except Exception:
        diff2 = None
    try:
        diff3 = acc_abs_diff(output['theano'], output['tensorflow'])
    except Exception:
        diff3 = None
    listdiff.append([diff1, diff2, diff3])

arraydiff = np.array(listdiff)
print('finish')
```