mkocabas / CoordConv-pytorch

PyTorch implementation of CoordConv, introduced in the paper 'An intriguing failing of convolutional neural networks and the CoordConv solution' (https://arxiv.org/pdf/1807.03247.pdf).

Update CoordConv.py #9

Closed · vfdev-5 closed this 6 years ago

vfdev-5 commented 6 years ago

Fix typo: x_dim -> y_dim and exchange xx_channel and yy_channel.

Comparing with the original TF implementation from the paper:

import tensorflow as tf

x_dim = 5
y_dim = 5
batch_size = 2

input_tensor = tf.placeholder(dtype=tf.float32, shape=(batch_size, 1, y_dim, x_dim))

session = tf.InteractiveSession()

batch_size_tensor = tf.shape(input_tensor)[0] 

# xx_channel: x (column) index at every position, tiled over the batch
xx_ones = tf.ones([batch_size_tensor, x_dim], dtype=tf.int32)
xx_ones = tf.expand_dims(xx_ones, -1)
xx_range = tf.tile(tf.expand_dims(tf.range(x_dim), 0),
                  [batch_size_tensor, 1])
xx_range = tf.expand_dims(xx_range, 1)
xx_channel = tf.matmul(xx_ones, xx_range)
xx_channel = tf.expand_dims(xx_channel, -1)

# yy_channel: y (row) index at every position, tiled over the batch
yy_ones = tf.ones([batch_size_tensor, y_dim], dtype=tf.int32)
yy_ones = tf.expand_dims(yy_ones, 1)
yy_range = tf.tile(tf.expand_dims(tf.range(y_dim), 0),
                  [batch_size_tensor, 1])
yy_range = tf.expand_dims(yy_range, -1)
yy_channel = tf.matmul(yy_range, yy_ones)
yy_channel = tf.expand_dims(yy_channel, -1)

# normalise both channels to [-1, 1]
xx_channel = tf.cast(xx_channel, "float32") / (x_dim - 1)
yy_channel = tf.cast(yy_channel, "float32") / (y_dim - 1)

xx_channel = xx_channel*2 - 1
yy_channel = yy_channel*2 - 1

np_xx_channel = xx_channel.eval()
print("xx_channel", np_xx_channel.shape)

np_yy_channel = yy_channel.eval()
print("yy_channel", np_yy_channel.shape)

session.close()

gives

np_xx_channel[..., 0]
array([[[-1. , -0.5,  0. ,  0.5,  1. ],
        [-1. , -0.5,  0. ,  0.5,  1. ],
        [-1. , -0.5,  0. ,  0.5,  1. ],
        [-1. , -0.5,  0. ,  0.5,  1. ],
        [-1. , -0.5,  0. ,  0.5,  1. ]],

       [[-1. , -0.5,  0. ,  0.5,  1. ],
        [-1. , -0.5,  0. ,  0.5,  1. ],
        [-1. , -0.5,  0. ,  0.5,  1. ],
        [-1. , -0.5,  0. ,  0.5,  1. ],
        [-1. , -0.5,  0. ,  0.5,  1. ]]], dtype=float32)

and

np_yy_channel[..., 0]
array([[[-1. , -1. , -1. , -1. , -1. ],
        [-0.5, -0.5, -0.5, -0.5, -0.5],
        [ 0. ,  0. ,  0. ,  0. ,  0. ],
        [ 0.5,  0.5,  0.5,  0.5,  0.5],
        [ 1. ,  1. ,  1. ,  1. ,  1. ]],

       [[-1. , -1. , -1. , -1. , -1. ],
        [-0.5, -0.5, -0.5, -0.5, -0.5],
        [ 0. ,  0. ,  0. ,  0. ,  0. ],
        [ 0.5,  0.5,  0.5,  0.5,  0.5],
        [ 1. ,  1. ,  1. ,  1. ,  1. ]]], dtype=float32)

However, the current implementation gives inverted results.
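
For reference, a minimal PyTorch sketch (an illustration only, not the repository's CoordConv.py) that reproduces the channels above, assuming an input laid out as (batch, channels, y_dim, x_dim):

import torch

def coord_channels(batch_size, y_dim, x_dim):
    # xx_channel: varies along the width (last) axis, constant along the height axis
    xx_channel = torch.arange(x_dim, dtype=torch.float32).repeat(batch_size, y_dim, 1)
    # yy_channel: varies along the height axis, constant along the width axis
    yy_channel = torch.arange(y_dim, dtype=torch.float32).repeat(batch_size, x_dim, 1).transpose(1, 2)

    # normalise to [-1, 1], as in the TF reference above
    xx_channel = xx_channel / (x_dim - 1) * 2 - 1
    yy_channel = yy_channel / (y_dim - 1) * 2 - 1

    # add the channel dimension: (batch, 1, y_dim, x_dim)
    return xx_channel.unsqueeze(1), yy_channel.unsqueeze(1)

xx, yy = coord_channels(batch_size=2, y_dim=5, x_dim=5)
print(xx[0, 0])  # rows of [-1, -0.5, 0, 0.5, 1]
print(yy[0, 0])  # columns of [-1, -0.5, 0, 0.5, 1]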

mjDelta commented 6 years ago

Hi, vfdev-5. The current implementation's dim order is [batch_size, ch, x_dim, y_dim], while your example uses [batch_size, ch, y_dim, x_dim]; maybe the swapped order of x_dim and y_dim is what causes the inverted results.
Besides, your code only works when x_dim and y_dim have the same value; it doesn't work when x_dim and y_dim differ. Maybe you should take that into account.
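
One way to build channels that also work when x_dim and y_dim differ is to broadcast a normalised coordinate vector per axis (again just a sketch, assuming the (batch, channels, y_dim, x_dim) layout, not the repository's code):

import torch

def coord_channels_any_size(batch_size, y_dim, x_dim):
    # one normalised coordinate vector per axis, each spanning [-1, 1]
    ys = torch.linspace(-1, 1, steps=y_dim)   # (y_dim,)
    xs = torch.linspace(-1, 1, steps=x_dim)   # (x_dim,)

    # broadcast to the full (batch, 1, y_dim, x_dim) grid
    yy_channel = ys.view(1, 1, y_dim, 1).expand(batch_size, 1, y_dim, x_dim)
    xx_channel = xs.view(1, 1, 1, x_dim).expand(batch_size, 1, y_dim, x_dim)
    return xx_channel, yy_channel

# rectangular example: y_dim and x_dim differ
xx, yy = coord_channels_any_size(batch_size=2, y_dim=3, x_dim=5)
print(xx.shape, yy.shape)   # torch.Size([2, 1, 3, 5]) for both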

vfdev-5 commented 6 years ago

@mjDelta I see what you mean. Thanks for the comment!