keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

When I set `padding='same'` in MaxPooling2D/AveragePooling2D, different parameter configurations produce inconsistent results on different backends, especially with different `strides` values. #13841

Closed Jup11ter closed 3 years ago

Jup11ter commented 4 years ago

System information

Describe the current behavior

When setting padding='same' in MaxPooling2D and AveragePooling2D, we observe clear output inconsistencies between backends for various combinations of pool_size and strides, given the same layer input. For example, with pool_size=2 and strides=1 in AveragePooling2D, Theano's output differs substantially from both CNTK's and TensorFlow's.

To reduce the effect of randomness, we repeated the test several times under each configuration and got similar results. Using AveragePooling2D as an example, the tables below show the output inconsistencies from one of these trials. The metric is the sum of absolute differences (SoAD) between two backends' outputs (see the SoAD implementation in the code snippet below).
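Written out, the metric computed by `acc_abs_diff` is the following. Here $Y^{A}$ and $Y^{B}$ are notation introduced for the outputs of the same layer on the same input under backends A and B (shape N x H x W x C with `channels_last`); a value of 0 means the two outputs are identical.

```latex
\mathrm{SoAD}\left(Y^{A}, Y^{B}\right) = \sum_{n,h,w,c} \left| Y^{A}_{n,h,w,c} - Y^{B}_{n,h,w,c} \right|
```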

1) pool_size=1 on AveragePooling2D

strides Tensorflow-CNTK Tensorflow-Theano CNTK-Theano
1 0 0 0
2 0 0 0
3 0 0 0
4 3486.9348074873396 0 3486.9348074873396
5 0 0 0
6 0 0 0
7 1352.1914413716454 0 1352.1914413716454
8 894.2556553477407 0 894.2556553477407
9 872.0831933177874 0 872.0831933177874
10 0 0 0

2) pool_size=2 on AveragePooling2D

strides Tensorflow-CNTK Tensorflow-Theano CNTK-Theano
1 0 23810.60574585 23810.60574585
2 0 6086.89803055 6086.89803055
3 0 2845.84598675 2845.84598675
4 1406.43844484 1562.48328647 1818.60488969
5 0 1177.9917318 1177.9917318
6 0 881.51344889 881.51344889
7 575.09660981 594.96606259 747.960241
8 426.97600164 413.47323987 490.51357441
9 408.0700912 353.56068743 433.0762342
10 0 404.73061594 404.73061594

3) pool_size=3 on AveragePooling2D

strides Tensorflow-CNTK Tensorflow-Theano CNTK-Theano
1 0 4.28967972254668e-12 4.28967972254668e-12
2 3341.4175014497478 3341.417501449748 1.0436096431476471e-12
3 1591.7682246308545 1591.7682246308545 4.551914400963142e-13
4 0 846.7093040097947 846.7093040097947
5 659.33280348679 659.3328034867901 1.8252066524837574e-13
6 475.1066414369499 475.1066414369499 1.021405182655144e-13
7 0 350.878200849514 350.878200849514
8 277.9826966410767 211.56357127315164 304.70752801483616
9 214.17734053203276 201.9044476454168 278.40342817756334
10 219.61688923217574 219.61688923217577 5.062616992290714e-14

From the results above: with pool_size=1, CNTK differs from TensorFlow and Theano (which agree with each other) at strides 4, 7, 8 and 9. With pool_size=2 or pool_size=3, more diverse inconsistencies appear across all three backends. Entries on the order of 1e-12 to 1e-14 are floating-point noise and amount to agreement.
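A one-element difference in where the odd padding element goes is enough to change every output of a pool_size=2, strides=1 average pool. The sketch below is a 1-D NumPy illustration; `same_avg_pool_1d` and `extra_pad_in_front` are names introduced here. The default follows TensorFlow's documented 'SAME' rule (output length ceil(n / stride), the smaller half of the padding in front, padded positions excluded from the average). The `extra_pad_in_front=True` variant is a hypothetical alternative convention for illustration, not a confirmed description of Theano's or CNTK's implementation.

```python
import math
import numpy as np

def same_avg_pool_1d(x, pool_size, stride, extra_pad_in_front=False):
    """1-D average pooling with 'same' padding.

    Default: TensorFlow's documented SAME rule (the smaller half of the
    total padding goes in front). extra_pad_in_front=True puts the odd
    padding element in front instead (hypothetical, for illustration).
    Padded positions are excluded from each window's average.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    out_len = math.ceil(n / stride)
    pad_total = max((out_len - 1) * stride + pool_size - n, 0)
    if extra_pad_in_front:
        pad_before = pad_total - pad_total // 2
    else:
        pad_before = pad_total // 2
    out = np.empty(out_len)
    for i in range(out_len):
        start = i * stride - pad_before              # window start in unpadded coordinates
        lo, hi = max(start, 0), min(start + pool_size, n)
        out[i] = x[lo:hi].mean()                     # average over real elements only
    return out

x = [1.0, 2.0, 3.0, 4.0]
print(same_avg_pool_1d(x, 2, 1).tolist())                           # [1.5, 2.5, 3.5, 4.0]
print(same_avg_pool_1d(x, 2, 1, extra_pad_in_front=True).tolist())  # [1.0, 1.5, 2.5, 3.5]
```

Both conventions produce an output of the same shape, so a mismatch like this would not raise an error when a saved model is loaded under another backend; it only shows up in the values.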

We ran the same test on MaxPooling2D and obtained similar results. The following table shows only the pool_size=1 case.

4) pool_size=1 on MaxPooling2D

strides Tensorflow-CNTK CNTK-Theano Tensorflow-Theano
1 0 0 0
2 0 0 0
3 0 0 0
4 3409 3409 0
5 0 0 0
6 0 0 0
7 1199 1199 0
8 862 862 0
9 841 841 0
10 0 0 0

Key insights

Note that the above issues occur only with padding='same'. With padding='valid', all backends produce the same outputs. We therefore believe the three backends implement padding='same' differently, particularly in how they handle the strides option.
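With pool_size=1 every window holds a single element, so the padding value never enters the result and any mismatch must come from where the windows are placed. The sketch below computes TensorFlow's documented 'SAME' placement for the 32x32 inputs used above; `tf_same_alignment` and "slack" (input elements after the last window that are never read) are names introduced here. The strides it flags, 4, 7, 8 and 9, are exactly the nonzero rows of the TensorFlow-CNTK column for pool_size=1 and pool_size=2. That is consistent with a backend that centers the windows in the leftover space instead of starting them at index 0, but it is only a hypothesis; the sketch does not model CNTK itself.

```python
import math

def tf_same_alignment(n, pool_size, stride):
    """Window placement under TensorFlow's documented 'SAME' rule (1-D).

    Returns (out_len, pad_before, slack):
      out_len    = ceil(n / stride)
      pad_before = padding added in front (the smaller half of the total)
      slack      = input elements after the last window that are never read
    """
    out_len = math.ceil(n / stride)
    span = (out_len - 1) * stride + pool_size  # extent covered by all windows
    pad_total = max(span - n, 0)
    slack = max(n - span, 0)
    return out_len, pad_total // 2, slack

# Strides 1..10 on a 32-wide input where the windows stop >= 2 elements short
for k in (1, 2):
    flagged = [s for s in range(1, 11) if tf_same_alignment(32, k, s)[2] >= 2]
    print('pool_size=%d:' % k, flagged)  # both lines print [4, 7, 8, 9]
```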

Performance issue in applications

Even worse, this implementation gap can cause a severe accuracy drop in practice. For example, we trained a simple model (LeNet-5) on MNIST with the TensorFlow backend, then switched across the three backends to run prediction. Changing only the strides (and padding) of the second MaxPooling2D layer produces an obvious gap in prediction accuracy. Below are the configurations, the prediction results, and the model architecture for this issue.

1) Configurations

Case Configurations
Case0 pool_size=(2,2), strides=(2,2), padding='valid'
Case1 pool_size=(2,2), strides=(2,2), padding='same'
Case2 pool_size=(2,2), strides=(3,3), padding='same'
Case3 pool_size=(2,2), strides=(4,4), padding='same'
Case4 pool_size=(2,2), strides=(3,3), padding='valid'
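To see how the cases compare, the sketch below traces the spatial size into the second MaxPooling2D layer, assuming 28x28 MNIST inputs and the LeNet-5 layers listed in this issue; `out_len` is a helper introduced here. Every case feeds an 8x8 map into that layer, and Case3 is the only one whose windows stop short of the edge (slack of 2). That matches the only case where CNTK's accuracy diverges from TensorFlow's, in line with the stride pattern in the pooling tables; it does not explain the Theano drops, which occur in every 'same' case.

```python
import math

def out_len(n, k, s, padding):
    """1-D output length for Keras 'valid'/'same' pooling or convolution."""
    if padding == 'valid':
        return (n - k) // s + 1
    return math.ceil(n / s)  # 'same'

# Assumption: 28x28 MNIST input and the LeNet-5 layers listed in this issue.
n = out_len(28, 5, 1, 'valid')   # first Conv2D       -> 24
n = out_len(n, 2, 2, 'valid')    # first MaxPooling2D -> 12
n = out_len(n, 5, 1, 'valid')    # second Conv2D      -> 8

cases = {'Case0': (2, 2, 'valid'), 'Case1': (2, 2, 'same'),
         'Case2': (2, 3, 'same'), 'Case3': (2, 4, 'same'),
         'Case4': (2, 3, 'valid')}
results = {}
for name, (k, s, padding) in cases.items():
    m = out_len(n, k, s, padding)
    slack = n - ((m - 1) * s + k)  # > 0: trailing rows/columns never read
    results[name] = (m, slack)
    print(name, results[name])
```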

2) Prediction accuracy (%)

Case Tensorflow CNTK Theano
Case0 99.18 99.18 99.18
Case1 99.02 99.02 85.17
Case2 99.04 99.04 83.57
Case3 98.83 70.23 82.83
Case4 99.00 99.00 99.00

3) Model architecture

from keras.layers import Input, Conv2D, Dense, Flatten, MaxPooling2D, Activation
from keras.regularizers import l2
from keras.models import Model

class model:
    def __init__(self, input_shape, cls_num=10):
        self.name = 'LeNet-5'
        self.input_shape = input_shape
        self.cls_num = cls_num

        self.kernel_size = (5,5)
        self.weight_decay = 0.0001

    def build_model(self):
        input = Input(shape=self.input_shape)
        x = Conv2D(6, self.kernel_size, padding='valid', activation='relu', kernel_initializer='he_normal',
                         kernel_regularizer=l2(self.weight_decay))(input)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Conv2D(16, self.kernel_size, padding='valid', activation='relu', kernel_initializer='he_normal',
                         kernel_regularizer=l2(self.weight_decay))(x)

        x = MaxPooling2D((2, 2), strides=(2, 2),padding='valid')(x) # Case0
        #x = MaxPooling2D((2, 2), strides=(2, 2), padding='same')(x) # Case1
        #x = MaxPooling2D((2, 2), strides=(3, 3), padding='same')(x) # Case2
        #x = MaxPooling2D((2, 2), strides=(4, 4), padding='same')(x) # Case3
        #x = MaxPooling2D((2, 2), strides=(3, 3), padding='valid')(x) # Case4

        x = Flatten()(x)
        x = Dense(120, activation='relu', kernel_initializer='he_normal', kernel_regularizer=l2(self.weight_decay))(x)
        x = Dense(84, activation='relu', kernel_initializer='he_normal', kernel_regularizer=l2(self.weight_decay))(x)
        x = Dense(self.cls_num, kernel_initializer='he_normal', kernel_regularizer=l2(self.weight_decay))(x)
        x = Activation('softmax')(x)
        lenet5 = Model(input, x)
        return lenet5

Code to reproduce the issue

import os
import importlib
import numpy as np
import keras.layers as L
import keras.backend as K
from keras.models import load_model
from keras.engine import Model, Input

backends = ['cntk', 'tensorflow', 'theano']
yourpath = '.'  # replace with a writable directory for the temporary model file

# SoAD, Sum of Absolute Difference
def acc_abs_diff(output1, output2):
    assert output1.shape == output2.shape
    abs_diff = np.abs(output1 - output2)
    return np.sum(abs_diff)

# SoAD between two backends' outputs; None if either backend failed
def pairwise_soad(output, bk1, bk2):
    try:
        return acc_abs_diff(output[bk1], output[bk2])
    except (KeyError, AssertionError):
        return None

# Switch the Keras backend at runtime by reloading the backend modules
def set_keras_backend(backend='tensorflow'):
    if K.backend() != backend:
        os.environ['KERAS_BACKEND'] = backend
        importlib.reload(K.load_backend)
        importlib.reload(K)
        assert K.backend() == backend

kwargs = {'pool_size': 1, 'padding': 'same', 'strides': 1, 'data_format': 'channels_last'}
listdiff = []
for stride_value in range(1, 11):
    kwargs['strides'] = stride_value
    input_data = (10 * np.random.random((1, 32, 32, 16))).astype('float32')
    # Build and save a single-layer model with the TensorFlow backend
    set_keras_backend('tensorflow')
    layer = L.pooling.MaxPool2D(**kwargs)  # you can also use AveragePooling2D
    x = Input(batch_shape=input_data.shape)
    y = layer(x)
    bk_model = Model(x, y)
    model_path = os.path.join(yourpath, 'model.h5')
    bk_model.save(model_path)
    # Reload the same model under each backend and run it on the same input
    output = {}
    for bk in backends:
        try:
            set_keras_backend(backend=bk)
            model = load_model(model_path)
            output[bk] = model.predict(input_data)
        except Exception as e:
            print('error result on backend %s: %s' % (bk, e))
    # Columns: Tensorflow-CNTK, CNTK-Theano, Tensorflow-Theano
    listdiff.append([pairwise_soad(output, 'tensorflow', 'cntk'),
                     pairwise_soad(output, 'theano', 'cntk'),
                     pairwise_soad(output, 'theano', 'tensorflow')])
arraydiff = np.array(listdiff)
print(arraydiff)
print('finish')

Saduf2019 commented 3 years ago

@Jup11ter Moving this issue to closed status as there has been no activity, in case you still face the error please create a new issue.