Hi, I found that the gradient calculation of reduce_max in CNTK differs from that of other deep learning libraries such as TensorFlow, and I want to know whether this is a bug.
Here is example code using CNTK 2.7:
import numpy as np
import cntk as C

# CNTK's input_variable defaults to float32, so the data is created as float32
x = C.input_variable(shape=(1, 3, 1), needs_gradient=True)
x_val = np.array([[[0.6],
                   [0.6],
                   [0.3]]], dtype=np.float32)
y = C.reduce_max(x)
g = y.grad({x: x_val})
print("gradients of max: ", g)
The result is:
gradients of max:  [[[[1.]
   [1.]
   [0.]]]]
And this is the code using TensorFlow 2.6.0:
import numpy as np
import tensorflow as tf

with tf.GradientTape() as tape:
    x = tf.Variable([[[0.6],
                      [0.6],
                      [0.3]]])
    y = tf.reduce_max(x)
g = tape.gradient(y, x)
print("gradients of max: ", g.numpy())
The result is:
gradients of max:  [[[0.5]
  [0.5]
  [0. ]]]
The inconsistency appears only when the maximum is attained by more than one element: CNTK gives each max element a gradient of 1, while TensorFlow splits the gradient evenly among the tied elements, as the sketch below illustrates.
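For what it's worth, both results are valid subgradients of max, and the difference seems to be just the tie-handling policy. Here is a minimal NumPy sketch of the two conventions (the names grad_cntk_style and grad_tf_style are my own, for illustration only):

import numpy as np

x = np.array([[0.6],
              [0.6],
              [0.3]])
# 1.0 wherever the maximum is attained, 0.0 elsewhere
mask = (x == x.max()).astype(x.dtype)

grad_cntk_style = mask             # every tied max element gets the full gradient of 1
grad_tf_style = mask / mask.sum()  # the gradient of 1 is split evenly among the ties

print(grad_cntk_style.ravel())  # [1. 1. 0.]
print(grad_tf_style.ravel())    # [0.5 0.5 0. ]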
Any replies will be appreciated.