
Unpooling layer in tensorflow #632

Closed ziky90 closed 3 years ago

ziky90 commented 8 years ago

It would be nice to have an unpooling layer in TensorFlow as well, as described in the paper on deconvolution networks: http://cvlab.postech.ac.kr/research/deconvnet/

I was googling a bit and found that an unpooling layer would also be useful for others: http://stackoverflow.com/questions/36548736/tensorflow-unpooling

zheng-xq commented 8 years ago

For deconv, you can use "conv2d_backprop_input" with stride to achieve a similar effect. It is the gradient of a strided convolution.
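
For reference, a minimal sketch of that approach using the public wrapper tf.nn.conv2d_transpose (which is built on conv2d_backprop_input). This is illustrative only, assuming the TF 1.x graph API used elsewhere in this thread; the filter variable and shapes are made up:

import tensorflow as tf

# Hypothetical example: upsample a [1, 8, 8, 16] feature map to [1, 16, 16, 16]
# with a stride-2 transposed convolution (i.e. the gradient of a strided conv).
features = tf.placeholder(tf.float32, [1, 8, 8, 16])
filters = tf.Variable(tf.random_normal([3, 3, 16, 16]))  # [h, w, out_ch, in_ch]

upsampled = tf.nn.conv2d_transpose(features, filters,
                                   output_shape=[1, 16, 16, 16],
                                   strides=[1, 2, 2, 1],
                                   padding='SAME')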

daeyun commented 8 years ago

My implementation using tf.reshape and tf.concat:

import math

import numpy as np
import tensorflow as tf


def unpool(value, name='unpool'):
    """N-dimensional version of the unpooling operation from
    https://www.robots.ox.ac.uk/~vgg/rg/papers/Dosovitskiy_Learning_to_Generate_2015_CVPR_paper.pdf

    :param value: A Tensor of shape [b, d0, d1, ..., dn, ch]
    :return: A Tensor of shape [b, 2*d0, 2*d1, ..., 2*dn, ch]
    """
    with tf.name_scope(name) as scope:
        sh = value.get_shape().as_list()
        dim = len(sh[1:-1])
        out = tf.reshape(value, [-1] + sh[-dim:])
        for i in range(dim, 0, -1):
            out = tf.concat([out, tf.zeros_like(out)], i)
        out_size = [-1] + [s * 2 for s in sh[1:-1]] + [sh[-1]]
        out = tf.reshape(out, out_size, name=scope)
    return out

def pool(value, name='pool'):
    """Downsampling operation.
    :param value: A Tensor of shape [b, d0, d1, ..., dn, ch]
    :return: A Tensor of shape [b, d0/2, d1/2, ..., dn/2, ch]
    """
    with tf.name_scope(name) as scope:
        sh = value.get_shape().as_list()
        out = value
        for sh_i in sh[1:-1]:
            assert sh_i % 2 == 0
        for i in range(len(sh[1:-1])):
            out = tf.reshape(out, (-1, 2, np.prod(sh[i + 2:])))
            out = out[:, 0, :]
        out_size = [-1] + [math.ceil(s / 2) for s in sh[1:-1]] + [sh[-1]]
        out = tf.reshape(out, out_size, name=scope)
    return out
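
A small usage sketch of the two helpers above (shapes are purely illustrative, using the TF 1.x placeholder API): pool halves each spatial dimension and unpool doubles it, filling the new positions with zeros.

x = tf.placeholder(tf.float32, [4, 8, 8, 3])   # batch of 4, 8x8, 3 channels
pooled = pool(x)           # -> [4, 4, 4, 3]
restored = unpool(pooled)  # -> [4, 8, 8, 3], zeros in the newly created positions
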
mwalton commented 8 years ago

I've been interested in this as well; I'm currently working on 'what-where' / convolutional autoencoders (à la Zhao et al.).

Thanks @daeyun for the code, I've been trying to figure this out myself. Dosovitskiy uses a Kronecker product with a block mask (same shape as the pooling kernel, all zeros with a 1 in the upper left) to unpool. However, as observed in the paper (fig. 9), this fails to reconstruct meaningful structure in deeper feature maps. An alternative proposed by Zeiler uses 'switches' (essentially the argmax of the max-pooling operation) to reconstruct using the exact location of the maxima.

I've been playing around with tf.nn.max_pool_with_argmax in an attempt to reproduce the 'switched' unpooling experiments first explored by Zeiler and extended by Zhao.

Any thoughts on how this could be implemented?

girving commented 8 years ago

What's the mathematical definition of unpooling?

ziky90 commented 8 years ago

The unpooling that I had in mind is described here: http://www.matthewzeiler.com/pubs/iccv2011/iccv2011.pdf, and the corresponding Caffe implementation can be found here: https://github.com/HyeonwooNoh/caffe/blob/master/src/caffe/layers/unpooling_layer.cpp. A more formal description is also available in the Torch documentation: https://github.com/torch/nn/blob/master/doc/convolution.md#spatialmaxunpooling

girving commented 8 years ago

@ziky90 That's the gradient of max pooling, which we already have as an op.

ziky90 commented 8 years ago

@girving Thank you for pointing me to the gradient of max pooling. It's quite difficult to discover it as the gradient of max pooling, though, and it's also not well documented. Is there a plan to create a separate "layer", for example tf.nn.max_unpool? From my point of view that would be much more intuitive, and together with proper documentation it would be super easy to use.

By the way, it seems this confusion leads other people to build custom solutions instead of simply using something like tf.nn.max_unpool. @ppwwyyxx https://github.com/ppwwyyxx/tensorpack/blob/master/tensorpack/models/pool.py#L66

girving commented 8 years ago

Yes, giving it a name like tf.nn.max_unpool with good documentation might be good, and we'd be happy to accept PRs.

As a tip for the future, though: this is one advantage of trying to understand the mathematical relationship between different operations. Once you know that unpooling is just the gradient of pooling, it's clear that TensorFlow already implements it, even if the name is different from what one might expect.

dbbert commented 8 years ago

Could you share a code example of how to implement unpooling using the gradient of max pooling?

girving commented 8 years ago

It's currently hidden as gen_nn_ops._max_pool_grad, and is used only from the gradient of max_pool:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn_grad.py#L353

There's also gen_nn_ops._max_pool_with_argmax_grad. Unfortunately, both of them take the original input, which means they'd have to be tweaked to serve as unpooling.
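
To make that awkwardness concrete, here is a rough sketch of "unpooling via the gradient op" (not an endorsed API: gen_nn_ops is private and the exact signature may differ between versions). Note that the op has to be handed the original pooling input even though only its shape is actually needed:

from tensorflow.python.ops import gen_nn_ops

def unpool_via_max_pool_grad(orig_input, pooled, to_unpool,
                             ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                             padding='SAME'):
    # orig_input is only used for its shape, but the op still requires the full tensor.
    return gen_nn_ops._max_pool_grad(orig_input, pooled, to_unpool,
                                     ksize, strides, padding)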

NickShahML commented 8 years ago

Any plans to add an unpool layer to TensorFlow? @girving, as you point out, if the gradient operation already exists, then it doesn't seem like much work to get it working.

girving commented 8 years ago

@LeavesBreathe I was wrong initially about how easy it would be, since the gradient operators as written take the original input. Thus, we probably do need a new exposed op, though it may be able to use the same underlying compute kernels (I'm not sure).

syed-ahmed commented 8 years ago

Is there any performance gain or loss if one uses the second output of tf.nn.max_pool_with_argmax (the indices of the max pooling) along with a tf.map_fn to achieve max unpooling?

girving commented 8 years ago

@syed-ahmed That doesn't work: if you are doing unpooling, you don't start out with an input that you could pass to tf.nn.max_pool_with_argmax.

syed-ahmed commented 8 years ago

@girving Can we not just save the indices from tf.nn.max_pool_with_argmax during downsampling for reuse during upsampling? We would use the saved argmax indices to inform us where we want the input to the corresponding upsample layer to go.

girving commented 8 years ago

@syed-ahmed To clarify, it will work but it's a bit awkward. You can certainly store the indices, but the current MaxPoolGradWithArgmax op also wants the values that you originally passed to max pooling. It should use only the shape from these values, but you still need to pass them in. That's not too horrible when it's used as a gradient (though it's still a memory usage bug), but it is not clean enough to give it a nice name.

The same bug occurred in the initial version of conv_3d, so if someone wants to fix this they can look at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/conv_grad_ops_3d.cc. The code defines a new op that takes the original shape as input rather than the whole original input, and uses the same C++ kernel to implement both of them (with a conditional based on the op name).

If anyone does this, the new op can be given a nicer name like max_unpool.

syed-ahmed commented 8 years ago

@girving Thanks for clarifying! I totally forgot the case about the gradient. I'll try to fix this issue.

syed-ahmed commented 8 years ago

Hi @girving, could you please tell me what error would result from the memory usage bug? Just to clarify: is it a bug because it's not best practice, or did you encounter an actual error in that initial version of conv_3d? I get the following error for the implementation described above with MaxPoolWithArgmax and was wondering if anybody has encountered it before:

E tensorflow/stream_executor/cuda/cuda_driver.cc:1110] failed to synchronize the stop event: CUDA_ERROR_ILLEGAL_ADDRESS
E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x69951c0: CUDA_ERROR_ILLEGAL_ADDRESS
E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x69951c0: CUDA_ERROR_ILLEGAL_ADDRESS
F tensorflow/stream_executor/cuda/cuda_timer.cc:64] Check failed: start_event_ != nullptr && stop_event_ != nullptr

girving commented 8 years ago

@syed-ahmed It's not an actual error unless you run out of memory. The issue is that if the gradient takes the original input tensor rather than the shape, the original input must be stored for the remainder of the forward pass and the backward pass up to that point. If only the shape is needed, that's a long time to hold onto otherwise unneeded memory.

syed-ahmed commented 8 years ago

@girving Thanks for your reply. I am defining a MaxUnpoolGrad for the corresponding MaxUnpool operation that I have implemented. Following is what I declare as top_offset and bottom_offset for MaxUnpoolGrad:

const int top_offset = params.tensor_in_rows * params.tensor_in_cols * params.depth; 
const int bottom_offset = params.out_height * params.out_width * params.depth;

The corresponding CUDA kernel declared in maxpooling_op_gpu.cu.cc is:

template <typename dtype>
__global__ void MaxUnpoolBackward(const int nthreads, const dtype* top_diff,
                                  const int64* mask, const int top_offset,
                                  const int bottom_offset, dtype* bottom_diff) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    int image_id = (index / bottom_offset);
    CudaAtomicAdd(bottom_diff + index, top_diff[mask[index] + image_id * top_offset]);
  }
}

My graph builds, but when the session runs I get the following error:

E tensorflow/stream_executor/cuda/cuda_driver.cc:1110] failed to synchronize the stop event: CUDA_ERROR_ILLEGAL_ADDRESS
E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x69951c0: CUDA_ERROR_ILLEGAL_ADDRESS
E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x69951c0: CUDA_ERROR_ILLEGAL_ADDRESS
F tensorflow/stream_executor/cuda/cuda_timer.cc:64] Check failed: start_event_ != nullptr && stop_event_ != nullptr 

I am also returning the gradient in nn_grad.py like this:

[None, gen_nn_ops._max_unpool_grad(array_ops.shape(op.inputs[1]),
                                     grad,
                                     op.inputs[2],
                                     op.get_attr("ksize"),
                                     op.get_attr("strides"),
                                     padding=op.get_attr("padding")), None]

where:

MaxUnpool
-input0: input_shape
-input1: grad_in
-input2: argmax

I have made sure the max unpooling and its grad operation take an input shape rather than an input 4D tensor. Do you know how to debug these CUDA errors, or any tool that can help find their origin? What do these errors indicate? I read a comment in maxpooling_op_gpu.cu.cc about race conditions. Is this related to that?

girving commented 8 years ago

@syed-ahmed Is it possible to use cuDNN for these operations? Writing them yourself will result in very slow code. The same goes for CPU: it would be better to use existing Eigen code if possible.

syed-ahmed commented 8 years ago

@girving Thank you for your reply. I will try implementing the cuDNN version once I get this CUDA one running. I was able to use cuda-gdb to get some sort of trace of where my error originates. Here's the output from cuda-gdb:

CUDA Exception: Warp Out-of-range Address
The exception was triggered at PC 0x7ffe9976c1d0

Program received signal CUDA_EXCEPTION_5, Warp Out-of-range Address.
[Switching focus to CUDA kernel 0, grid 4660, block (172,0,0), thread (256,0,0), device 0, sm 0, warp 40, lane 0]
0x00007ffe9976c218 in void tensorflow::(anonymous namespace)::MaxUnpoolForward<float>(int, float const*, long long const*, int, int, float*)<<<(662,1,1),(1024,1,1)>>> ()

Here's how it is defined in the cu.cc file:

...
template <typename dtype>
__global__ void MaxUnpoolForward(const int nthreads, const dtype* top_diff,
                                const int64* mask, const int top_offset,
                                const int bottom_offset, dtype* bottom_diff) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    int image_id = (index / top_offset);
    CudaAtomicAdd(bottom_diff + image_id * bottom_offset + mask[index],
                  top_diff[index]);
  }
}

template <typename dtype>
__global__ void MaxUnpoolBackward(const int nthreads, const dtype* top_diff,
                                  const int64* mask, const int top_offset,
                                  const int bottom_offset, dtype* bottom_diff) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    int image_id = (index / bottom_offset);
    CudaAtomicAdd(bottom_diff, top_diff[mask[index] + image_id * top_offset]);
  }
}

#undef CUDA_1D_KERNEL_LOOP
...

I am kind of lost since I'm a beginner with CUDA. Does anybody have an idea what might be going wrong?

girving commented 8 years ago

It's impossible to debug this without seeing your code. As a wild guess: maybe you are running GPU kernels on Tensor objects stored on the CPU?

syed-ahmed commented 8 years ago

Hi @girving. Sorry for not posting the full code. I didn't want to lengthen this issue by posting all the code. You can review the changes in this link.

I am calling the max unpool like this:

return gen_nn_ops._max_unpool(array_ops.shape(origin_input_tensor), grad,
                              argmax_tensor,
                              ksize=[1, 2, 2, 1], strides=[1, 1, 1, 1],
                              padding="VALID", name=name)

I am not sure whether the origin_input_tensor and argmax_tensor objects are on the CPU or GPU. The cuda-gdb output for MaxUnpoolForward suggests that "this occurs when any thread within a warp accesses an address that is outside the valid range of local or shared memory regions" (GPU error reporting).

syed-ahmed commented 8 years ago

Also, there is a lot of code duplication in my changes. I can make the unpool op use the same compute kernel. I was just checking whether using the same compute kernel was causing the CUDA error in the version I posted here.

wenouyang commented 8 years ago

In the TensorFlow implementation (https://github.com/MarvinTeichmann/tensorflow-fcn/blob/master/fcn32_vgg.py) of the fully convolutional model (https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf), the author defines the following function:

def _upscore_layer(self, bottom, shape,
                   num_classes, name, debug,
                   ksize=4, stride=2):
    strides = [1, stride, stride, 1]
    with tf.variable_scope(name):
        in_features = bottom.get_shape()[3].value

        if shape is None:
            # Compute shape out of Bottom
            in_shape = tf.shape(bottom)

            h = ((in_shape[1] - 1) * stride) + 1
            w = ((in_shape[2] - 1) * stride) + 1
            new_shape = [in_shape[0], h, w, num_classes]
        else:
            new_shape = [shape[0], shape[1], shape[2], num_classes]
        output_shape = tf.pack(new_shape)

        logging.debug("Layer: %s, Fan-in: %d" % (name, in_features))
        f_shape = [ksize, ksize, num_classes, in_features]

        # create
        num_input = ksize * ksize * in_features / stride
        stddev = (2 / num_input)**0.5

        weights = self.get_deconv_filter(f_shape)
        deconv = tf.nn.conv2d_transpose(bottom, weights, output_shape,
                                        strides=strides, padding='SAME')

        if debug:
            deconv = tf.Print(deconv, [tf.shape(deconv)],
                              message='Shape of %s' % name,
                              summarize=4, first_n=1)

    _activation_summary(deconv)
    return deconv

Looks like the author just uses tf.nn.conv2d_transpose to do the upsampling. Is my understanding correct?

ziky90 commented 8 years ago

@wenouyang Yes, in the FCN paper (https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf) they use only tf.nn.conv2d_transpose() to perform the upsampling, but there are also other models, mainly for semantic segmentation, that additionally use max unpooling, for example http://arxiv.org/abs/1505.04366.

girving commented 8 years ago

Sorry for the delay, taking a look at your code now.

girving commented 8 years ago

I must not understand your code. How are you doing an effectively 3D unpooling operation (batch, height, width) with a 1D loop that does only one integer division? One integer division is only powerful enough to express a 2D loop.

syed-ahmed commented 8 years ago

@girving I followed the MaxPoolBackward code in maxpooling_op_gpu.cu.cc. I thought the n dimensions of the tensor were taken care of by the following in maxpooling_op.cc, in the LaunchMaxUnpooling function I defined (like LaunchMaxPoolingGradWithArgmax):

const int input_size = params.tensor_in_batch * params.tensor_in_rows *
                           params.tensor_in_cols * params.depth;
const int output_size = params.tensor_in_batch * params.out_height *
                            params.out_width * params.depth;
const int top_offset = params.out_height * params.out_width * params.depth;
const int bottom_offset = params.tensor_in_rows * params.tensor_in_cols * params.depth;

girving commented 8 years ago

@syed-ahmed Ah, got it: the indices are already flattened, so it only needs to be 2D. Unfortunately I don't know why your code is failing; I would try to replicate the behavior with the existing routine and then add print statements until you know what differs.

syed-ahmed commented 8 years ago

@girving Thank you for your reply.

wenouyang commented 8 years ago

@ziky90, thank you for your response. Somewhat related to my current question, I have some confusion about the kernel size specification for an upsampling layer implemented using tf.nn.conv2d_transpose(): http://stats.stackexchange.com/questions/226047/kernel-size-and-stride-value-for-fully-convolutional-network-for-semantic-segmen

I noticed that you have been involved in the FCN discussions on Stack Overflow. If you do not mind, would you share some thoughts on my question? Thank you very much.

andykitchen commented 8 years ago

+1

fabianbormann commented 8 years ago

I am also trying to implement the DeconvNet described in Learning Deconvolution Network for Semantic Segmentation, and I'm very interested in a native method like tf.max_unpool_with_argmax too. For now, I want to share my Python TF implementation (example):

def unravel_argmax(argmax, shape):
    output_list = []
    output_list.append(argmax // (shape[2] * shape[3]))
    output_list.append(argmax % (shape[2] * shape[3]) // shape[3])
    return tf.pack(output_list)

def unpool_layer2x2(x, argmax):
    x_shape = tf.shape(x)
    output = tf.zeros([x_shape[1] * 2, x_shape[2] * 2, x_shape[3]])

    height = tf.shape(output)[0]
    width = tf.shape(output)[1]
    channels = tf.shape(output)[2]
    # build the indices for a SparseTensor addition like http://stackoverflow.com/a/34686952/3524844
    t1 = tf.to_int64(tf.range(channels))
    t1 = tf.tile(t1, [(width // 2) * (height // 2)])
    t1 = tf.reshape(t1, [-1, channels])
    t1 = tf.transpose(t1, perm=[1, 0])
    t1 = tf.reshape(t1, [channels, height // 2, width // 2, 1])

    t2 = tf.squeeze(argmax)
    t2 = tf.pack((t2[0], t2[1]), axis=0)
    t2 = tf.transpose(t2, perm=[3, 1, 2, 0])

    t = tf.concat(3, [t2, t1])
    indices = tf.reshape(t, [(height // 2) * (width // 2) * channels, 3])
    # Get the values for max_unpooling (used in addition with argmax location)
    x1 = tf.squeeze(x)
    x1 = tf.reshape(x1, [-1, channels])
    x1 = tf.transpose(x1, perm=[1, 0])
    values = tf.reshape(x1, [-1])
    # perform addition
    delta = tf.SparseTensor(indices, values, tf.to_int64(tf.shape(output)))
    return tf.expand_dims(tf.sparse_tensor_to_dense(tf.sparse_reorder(delta)), 0)

This is an unpooling that uses the unraveled argmax of tf.nn.max_pool_with_argmax, for everybody searching for a similar method. Replacing all loops with tensor transformations was a little bit tricky and maybe there is a better (more readable) way; I first tried nested tf.while_loop, but that was very slow. My implementation assumes batch_size == 1, but it could easily be rewritten for other use cases.

hermitman commented 8 years ago

@fabianbormann Great solution. I am implementing the deconv net and am also stuck on this. Since I am new to TensorFlow, could you give me a hint on converting your code to something that works with any batch size? Thanks.

fabianbormann commented 8 years ago

@hermitman You would need to expand the indices so that you can address a new 4D output (output = tf.zeros([x_shape[0], x_shape[1] * 2, x_shape[2] * 2, x_shape[3]])). indices is currently a tensor of coordinates [h, w, c], and values is a list of values matching those coordinates.

You would need to change the transformations so that indices also respects [b, h, w, c] and add all the corresponding batch values to the values list. I opened issue #3 in my project and I will fix it soon. If you are implementing the deconv net too, it would be great if you forked my project (or I could give you write access) so that we could share some knowledge during the implementation! (The same applies to everyone else.)

hermitman commented 8 years ago

@fabianbormann Thanks for the detailed explanation. For me, I am using the deconv net as a part of the reconstruction network in my project. I will try to implement your solution first, and if I figure it out, I will clean up this part and share it.

hermitman commented 8 years ago

@fabianbormann Hi, I am reading the code and referring to the previous discussion in this thread. I have a question: how does the current implementation backprop the gradients through the unpooling layer? Is that taken care of by the tf ops?

hermitman commented 8 years ago

@fabianbormann @girving Can your unpooling operation backpropagate gradients? I managed to get one version working on the forward pass, but TensorFlow could not backpropagate the gradients. My code currently looks like this:

def unpool_layer2x2(inputs, argmax, name):

    with tf.variable_scope(name) as scope:

        x_shape = tf.shape(inputs)
        batches = x_shape[0]
        height = x_shape[1]
        width = x_shape[2]
        channels = x_shape[3]

        height_ori = height * 2
        width_ori = width*2

        argmax_offset = tf.range(batches)
        argmax_offset = tf.reshape(argmax_offset, [-1, 1, 1, 1])
        with tf.device('/cpu:0'):
          argmax_offset = tf.tile(argmax_offset, [1, height, width, channels]) * height_ori * width_ori * channels
        argmax = argmax + tf.to_int64(argmax_offset)

        list_x = tf.reshape(inputs, [batches*height*width*channels, 1])
        list_argmax = tf.reshape(argmax, [batches*height*width*channels, 1])
        list_indices_batches = list_argmax//tf.to_int64(height_ori*width_ori*channels)
        with tf.device('/cpu:0'):
            list_indices_height = list_argmax%tf.to_int64(height_ori*width_ori*channels) // tf.to_int64(width_ori*channels)
            list_indices_width = list_argmax%tf.to_int64(width_ori*channels) // tf.to_int64(channels)
            list_indices_channels = list_argmax%tf.to_int64(channels)
            list_indices = tf.concat(1, [list_indices_batches, list_indices_height, list_indices_width, list_indices_channels])
        output = tf.SparseTensor(list_indices, tf.squeeze(list_x), tf.to_int64([batches, height_ori, width_ori, channels]))
        with tf.device('/cpu:0'):
          return tf.sparse_tensor_to_dense(tf.sparse_reorder(output))

I am not familiar with how TF determines whether an op is differentiable, so I do not know how what I did affects the backprop. Could you direct me to some related reading?

NickShahML commented 8 years ago

I don't know if you're feeding in a tf.argmax to your argmax argument, but I'm pretty sure tf.argmax is non-differentiable.

hermitman commented 8 years ago

@LeavesBreathe Hi, I am not trying to backprop gradients to the argmax. I am using the argmax to create an unpooling path; the gradients will be directed according to such paths. For example:

If my bottom for this layer is 2x4x4x1 (batch x height x width x channels), then the desired output is 2x8x8x1, where in each 2x2 neighborhood there is only one active pixel. The exact location of that pixel is determined by argmax, which comes from the maxpool_with_argmax() op.

During backprop, the incoming gradient map is 2x8x8x1, and the corresponding gradient should be sent directly to the matching location in the 2x4x4x1 bottom output. There is no computation in the process, merely directing the gradients to the correct location. It is the reverse of max pooling, I think.

Could you help me find a way to implement the aforementioned op?

cjspoerer commented 7 years ago

@hermitman This is my first post, so apologies if I get it wrong, but I think that this post addresses the issues with the missing gradients https://github.com/tensorflow/tensorflow/issues/1793#issuecomment-234070576

hermitman commented 7 years ago

@cjspoerer Hi, thanks for your response. The gradient for max_pool_with_argmax can be retrieved as in the issue you mentioned. However, since we are implementing the unpooling operation, which does not have an available TensorFlow op, we are struggling to get gradients out of this new op. The operation we want is the same as described earlier in this thread:

The unpooling that I had in mind is described here: http://www.matthewzeiler.com/pubs/iccv2011/iccv2011.pdf, and the corresponding Caffe implementation can be found here: https://github.com/HyeonwooNoh/caffe/blob/master/src/caffe/layers/unpooling_layer.cpp. A more formal description is also available in the Torch documentation: https://github.com/torch/nn/blob/master/doc/convolution.md#spatialmaxunpooling

EmmaBYPeng commented 7 years ago

Hi, I implemented the batch version (i.e. batch_size >= 1) of @fabianbormann 's unpool layer and it's been working well for me:


  def unravel_argmax(argmax, shape):
    output_list = [argmax // (shape[2]*shape[3]),
                   argmax % (shape[2]*shape[3]) // shape[3]]
    return tf.pack(output_list)

  def unpool_layer2x2_batch(bottom, argmax):
    bottom_shape = tf.shape(bottom)
    top_shape = [bottom_shape[0], bottom_shape[1]*2, bottom_shape[2]*2, bottom_shape[3]]

    batch_size = top_shape[0]
    height = top_shape[1]
    width = top_shape[2]
    channels = top_shape[3]

    argmax_shape = tf.to_int64([batch_size, height, width, channels])
    argmax = unravel_argmax(argmax, argmax_shape)

    t1 = tf.to_int64(tf.range(channels))
    t1 = tf.tile(t1, [batch_size*(width//2)*(height//2)])
    t1 = tf.reshape(t1, [-1, channels])
    t1 = tf.transpose(t1, perm=[1, 0])
    t1 = tf.reshape(t1, [channels, batch_size, height//2, width//2, 1])
    t1 = tf.transpose(t1, perm=[1, 0, 2, 3, 4])

    t2 = tf.to_int64(tf.range(batch_size))
    t2 = tf.tile(t2, [channels*(width//2)*(height//2)])
    t2 = tf.reshape(t2, [-1, batch_size])
    t2 = tf.transpose(t2, perm=[1, 0])
    t2 = tf.reshape(t2, [batch_size, channels, height//2, width//2, 1])

    t3 = tf.transpose(argmax, perm=[1, 4, 2, 3, 0])

    t = tf.concat(4, [t2, t3, t1])
    indices = tf.reshape(t, [(height//2)*(width//2)*channels*batch_size, 4])

    x1 = tf.transpose(bottom, perm=[0, 3, 1, 2])
    values = tf.reshape(x1, [-1])

    delta = tf.SparseTensor(indices, values, tf.to_int64(top_shape))
    return tf.sparse_tensor_to_dense(tf.sparse_reorder(delta))
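
A possible usage sketch for the helper above (same TF ≤ 1.0-era API as the snippet; `features` is an assumed [batch, h, w, c] tensor with even spatial dimensions): pool with tf.nn.max_pool_with_argmax and feed both outputs to the unpool layer.

pooled, argmax = tf.nn.max_pool_with_argmax(features, ksize=[1, 2, 2, 1],
                                            strides=[1, 2, 2, 1], padding='SAME')
unpooled = unpool_layer2x2_batch(pooled, argmax)  # back to the shape of `features`
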
hermitman commented 7 years ago

@EmmaBYPeng Hi, thanks for sharing the awesome solution. Did you try to backprop through this layer, e.g. train a deconvolution net with this unpooling layer? It seems that this implementation still cannot receive gradients during training.

The forward pass works fine, in the sense that given the downsampled image and argmax, it generates the upsampled image with black pixels as filler. However, during training, if we place the unpooling layer between conv2d_transpose ops, I find that the gradients cannot propagate through the unpooling layer.

EmmaBYPeng commented 7 years ago

@hermitman Hi, I trained my deconv net with this layer and I believe the gradients are back propagated correctly (@fabianbormann 's original version also worked for me).

hermitman commented 7 years ago

@EmmaBYPeng That is awesome! Could you share your deconv network structure? I have been stuck on this problem for a while. If it works for you, then the problem should be somewhere in my deconv network structure. I basically have something like this:

conv2d_transpose(...)
unpool2x2
conv2d_transpose(...)
unpool2x2
...

If I call opt.compute_gradients(), the gradients after the first unpooling layer are all None. If the unpooling is fine, I wonder what caused my problem. = =!

May I take a look at your deconv structure, so I can figure out the difference between the implementations?

hazirbas commented 7 years ago

@EmmaBYPeng It is really great that you shared the code with us. I just realized that its runtime performance is slow. Have you made any performance improvements yet? Thanks.

djl11 commented 7 years ago

With regard to performance, as far as I can tell, the main bottleneck is the sparse tensor reorder and sparse-to-dense operations, both of which appear to be performed on the CPU. Here is the resulting CUPTI GPU trace for a single training step using the max_unpool method; the gray and purple are the reorder and sparse-to-dense operations, respectively. chrome___tracing.pdf
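
For what it's worth, one way to avoid the SparseTensor reorder / sparse-to-dense round trip entirely is to scatter the pooled values into a dense tensor with tf.scatter_nd, which also has a registered gradient. This is only a sketch, assuming a recent TF where tf.nn.max_pool_with_argmax accepts include_batch_in_index=True (so argmax addresses the flattened [batch, height, width, channels] tensor directly); the helper name and wiring are hypothetical:

import tensorflow as tf

def max_unpool(pooled, argmax, output_shape):
    """Scatter each pooled value back to the position its maximum came from.

    Assumes `argmax` came from tf.nn.max_pool_with_argmax(...,
    include_batch_in_index=True). `output_shape` is the shape of the tensor
    that was originally pooled; all non-maximum positions stay zero.
    """
    output_shape = tf.cast(output_shape, tf.int64)
    flat_size = tf.reduce_prod(output_shape)
    flat = tf.scatter_nd(indices=tf.reshape(argmax, [-1, 1]),
                         updates=tf.reshape(pooled, [-1]),
                         shape=tf.reshape(flat_size, [1]))
    return tf.reshape(flat, output_shape)

# Example wiring in an encoder/decoder (shapes illustrative):
# pooled, argmax = tf.nn.max_pool_with_argmax(
#     features, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME',
#     include_batch_in_index=True)
# ...
# unpooled = max_unpool(decoder_features, argmax, tf.shape(features))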