Closed: Mathilda88 closed this issue 3 years ago.
Thanks for your attention. For Q1: we actually do convolve, just step by step: unfold --> element-wise multiplication --> sum. For Q2: I think the main problem is answered by A1. The "9 additional channels" come from the kernel size squared (3^2 = 9) and are not related to spatial content.
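To make the "unfold --> element-wise multiplication --> sum" pipeline concrete, here is a minimal sketch (the shapes and tensors are made up for illustration, not taken from the repo) showing that those three steps reproduce `F.conv2d`:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 8, 8)   # (B, C, H, W) -- illustrative input
w = torch.randn(6, 4, 3, 3)   # (out_channels, C, kH, kW) -- illustrative kernel

# Step 1: unfold extracts every 3x3 patch -> (B, C*kH*kW, L) with L = H*W when padding=1
patches = F.unfold(x, kernel_size=3, padding=1)            # (1, 36, 64)

# Step 2: element-wise multiplication with the flattened kernel,
# Step 3: sum over the C*kH*kW axis
out = (w.view(1, 6, 36, 1) * patches.unsqueeze(1)).sum(dim=2)
out = out.view(1, 6, 8, 8)

# Matches a direct convolution
ref = F.conv2d(x, w, padding=1)
print(torch.allclose(out, ref, atol=1e-5))                 # True
```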
Thank you so much for your response. Actually, I don't understand what the unfolding does for us here. Do you mean it is the same as copying a feature map 9 times and then storing the copies along a new dimension?
I know that it extracts rolling blocks from the spatial dimensions, but I can't picture what that looks like in practice. Could you please elaborate on it a bit?
def unfold(input, kernel_size, dilation=1, padding=0, stride=1):
    r"""Extracts sliding local blocks from a batched input tensor.

    .. warning::
        Currently, only 4-D input tensors (batched image-like tensors) are
        supported.

    .. warning::
        More than one element of the unfolded tensor may refer to a single
        memory location. As a result, in-place operations (especially ones that
        are vectorized) may result in incorrect behavior. If you need to write
        to the tensor, please clone it first.

    See :class:`torch.nn.Unfold` for details.
    """
Hope this is useful for you.
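The docstring above can be made concrete with a tiny example (the tensor is made up for illustration, not from the repo): with `kernel_size=3` and `padding=1` the spatial size is preserved, and each channel is replicated `kernel_size**2 = 9` times, one copy per offset inside the 3x3 window.

```python
import torch
import torch.nn.functional as F

x = torch.arange(16.0).view(1, 1, 4, 4)    # (B=1, C=1, H=4, W=4)
u = F.unfold(x, kernel_size=3, padding=1)  # (1, 1*3*3, 4*4) = (1, 9, 16)
u = u.view(1, 1, 9, 4, 4)                  # (B, C, k*k, H, W)

# The 9 values at output location (i, j) are the 3x3 neighborhood of
# x[..., i, j]; the center of the window is the 5th of the 9 copies
# (offset (1, 1)), so that slice reproduces the original map exactly.
print(u[0, 0, 4])                          # equals x[0, 0]
```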
Thanks Stanly, but I was asking what you use this function for in your code!
In other words, is this unfolding needed whenever we want to implement a convolution between two feature maps?
It is just for the AGD module.
Hi,
Thanks for the great work. Actually, in Fig. 2 of the paper it is written that "" stands for convolution. For example, I_{r-->r}^{i} f_{r} in Eq. (8) means these two maps get convolved together. However, in the code you just use an element-wise multiplication between these two feature maps.
My second question is about unfolding. It seems that after unfolding the input variable (https://github.com/ygjwd12345/TransDepth/blob/0a7422c6d816429b9f3fc4cca19d93de8cd1ab8a/pytorch/AttentionGraphCondKernel.py#L101), we get an output with the same spatial size but with 9 copies of each of the channels we already had. I was just wondering whether the spatial content is preserved by this kind of unfolding: if we sample the top-right corner of the spatial maps, are all the channels from the same spatial location in the original map?
Thanks,
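The second question can be checked directly with a small sketch (the tensor below is illustrative, not from the repo): after unfolding with `kernel_size=3, padding=1, stride=1`, sample one spatial location and verify that its 9 values are exactly the 3x3 window around that same location in the (zero-padded) original map.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 5, 5)
u = F.unfold(x, kernel_size=3, padding=1).view(1, 9, 5, 5)

top_right = u[0, :, 0, 4]               # the 9 values at location (0, 4)
xp = F.pad(x, (1, 1, 1, 1))             # the same zero padding unfold applied
window = xp[0, 0, 0:3, 4:7].reshape(9)  # 3x3 window centered at (0, 4)
print(torch.equal(top_right, window))   # True: each location's 9 values come
                                        # from its own neighborhood
```

So the channels are not all from one pixel; they cover that pixel's local window, with the window center at index 4 of the 9.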