zy-xc opened this issue 3 years ago (status: Open)
Hi @zy-xc
Thank you for your interest in our work. Here is the link for the corresponding supplement: https://drive.google.com/file/d/1sBFXqWaWOeMuaaVHMM-ddBssKr3OmutW/view?usp=sharing
Please feel free to contact me if there is any other question. Thank you!
Best, Yongcheng
Thank you for your reply!
I am a bit confused about the size of the weight generated by the Weight/Bias network. Is the dynamic convolution layer set to groups = 64 (the number of channels of the content feature)?
It seems that the style image would have to be large if we set groups = 1. For example, consider standard DIN with kernel_size = 1: the weight generated by the Weight Net has size 64 × 64 × 1 × 1, so the VGG feature of the style image must be at least 64 × 64 × 64 (C × H × W), and the style image itself must be at least 512 × 512. If we instead want to train standard DIN with kernel_size = 3, the style image would have to be at least 1536 × 1536.
Or does standard DIN set groups = 64, so that the generated weight has size 64 × kernel_size × kernel_size?
Thank you!
Hi @zy-xc
Thank you for your interest in our work! Regarding your question: yes, we indeed set the group number equal to the number of feature channels, as indicated in "Architecture Details" in the supplement. Also, please note that the size of the generated weight and bias is not correlated with the input size, since we use an adaptive pooling layer in the corresponding weight and bias networks. You can set the desired size of the weight and bias by configuring the adaptive pooling layer.
Please let me know if there is any other question. Thank you.
Best, Yongcheng
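To illustrate the two points in the reply above, here is a minimal stand-alone sketch (not the authors' code; the channel count C, kernel size k, and feature sizes are my assumptions). Adaptive pooling makes the predicted weight's size independent of the style image resolution, and the dynamic conv uses groups equal to the channel count:

```python
import torch
import torch.nn.functional as F

C, k = 64, 3  # content-feature channels and dynamic kernel size (assumed)

content = torch.rand(1, C, 32, 32)     # content feature map
style_feat = torch.rand(1, C, 17, 23)  # style feature, arbitrary spatial size

# Stand-in for the weight/bias nets: adaptive pooling fixes the output
# size at (C, k, k) regardless of the style feature's spatial size.
pooled = F.adaptive_avg_pool2d(style_feat, (k, k))  # -> (1, C, k, k)
weight = pooled.view(C, 1, k, k)                    # one k x k filter per channel
bias = F.adaptive_avg_pool2d(style_feat, (1, 1)).view(C)

# groups = C, i.e. one predicted filter per input channel
out = F.conv2d(content, weight, bias, padding=k // 2, groups=C)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

With groups = C the weight has only C × 1 × k × k entries, which is why the style image does not need to be 512 × 512 or larger.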
I find the supplementary details confusing to implement from. Has anyone implemented this in PyTorch yet? Can you help me? Thank you so much.
Hi @sonnguyen129
Thank you for your interest in our work! Could you please elaborate on which part exactly is confusing? I am more than happy to clarify it. Also, if you would like our source code, please drop me an email to apply for the necessary permission that is required by the company. Thanks!
Best, Yongcheng
Hi @ycjing I sent you an email. I hope to hear from you as soon as possible. Thank you.
Hi @ycjing I have a few questions as follows:
2. The Res layer and upsampling layer are quite lacking in information, and I don't know where they are in the illustration.
Hi @sonnguyen129
Thanks for your interest again! Please feel free to reach out if anything else is unclear.
Cheers, Yongcheng
Hi @sonnguyen129
Could you please provide the detailed log information? Thanks!
Best,
Here is my test case:
c = torch.rand(8,64,224,224)
s = torch.rand(8,64,224,224)
out = DIN(3)(c, s)
print(out)
Logs:
Traceback (most recent call last):
File "model.py", line 136, in <module>
out = DIN(3)(c, s)
File "model.py", line 70, in __init__
self.weight_bias = WeightAndBias(inp = inp)
File "model.py", line 49, in __init__
self.dwconv1 = DepthWiseConv2d(inp, 128, 3, 128, 2)
File "model.py", line 10, in __init__
groups = groups, stride = stride, padding = 1)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 432, in __init__
False, _pair(0), groups, bias, padding_mode, **factory_kwargs)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 84, in __init__
raise ValueError('in_channels must be divisible by groups')
ValueError: in_channels must be divisible by groups
Hi @sonnguyen129
As shown in the log, the group number is wrong; it should be equal to in_channels.
Best, Yongcheng
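For context, the constraint can be reproduced in isolation (a stand-alone snippet, not from the repository; the sizes match the test case above):

```python
import torch
import torch.nn as nn

x = torch.rand(8, 64, 224, 224)

# groups must divide in_channels; groups == in_channels gives a depthwise conv.
dw_ok = nn.Conv2d(64, 64, kernel_size=3, groups=64, stride=2, padding=1)
out = dw_ok(x)
print(out.shape)  # torch.Size([8, 64, 112, 112])

# groups = 128 with 64 input channels reproduces the error from the log:
try:
    nn.Conv2d(64, 64, kernel_size=3, groups=128)
except ValueError as err:
    print(err)  # in_channels must be divisible by groups
```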
Hi @ycjing I have 2 questions:
- Can you provide information about the AdaptivePooling layer, specifically the target size?
- Is the 'add' method in Fig. 4 a channel concatenation or just like a basic residual block?
Thank you so much.
Hi @ycjing I got an error:
Traceback (most recent call last):
File "model.py", line 197, in <module>
out = WeightAndBias(512)(out)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "model.py", line 79, in forward
out = self.dwconv2(out)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "model.py", line 25, in forward
out = self.pointwise(out)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/instancenorm.py", line 59, in forward
self.training or not self.track_running_stats, self.momentum, self.eps)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/functional.py", line 2325, in instance_norm
_verify_spatial_size(input.size())
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/functional.py", line 2292, in _verify_spatial_size
raise ValueError("Expected more than 1 spatial element when training, got input size {}".format(size))
ValueError: Expected more than 1 spatial element when training, got input size torch.Size([8, 64, 1, 1])
Here is my code:
class DepthWiseConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, groups, stride):
        super(DepthWiseConv2d, self).__init__()
        self.depthwise = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=kernel_size,
                      groups=groups, stride=stride, padding=1),
            nn.InstanceNorm2d(in_channels),
            nn.ReLU(True)
        )
        self.pointwise = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size,
                      stride=stride),
            nn.InstanceNorm2d(out_channels),
            nn.ReLU(True)
        )

    def forward(self, x):
        out = self.depthwise(x)
        out = self.pointwise(out)
        return out


class VGGEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = vgg19(pretrained=True).features
        self.slice1 = vgg[:2]
        self.slice2 = vgg[2:7]
        self.slice3 = vgg[7:12]
        self.slice4 = vgg[12:21]
        for p in self.parameters():
            p.requires_grad = False

    def forward(self, images, output_last_feature=False):
        h1 = self.slice1(images)
        h2 = self.slice2(h1)
        h3 = self.slice3(h2)
        h4 = self.slice4(h3)
        if output_last_feature:
            return h4
        else:
            return h1, h2, h3, h4


class WeightAndBias(nn.Module):
    """Weight/Bias Network"""
    def __init__(self, in_channels=512):
        super(WeightAndBias, self).__init__()
        self.dwconv1 = DepthWiseConv2d(in_channels, 128, 3, 128, 2)
        self.dwconv2 = DepthWiseConv2d(128, 64, 3, 64, 2)
        # self.adapool1 = nn.AdaptiveMaxPool2d()
        self.dwconv3 = DepthWiseConv2d(64, 64, 3, 64, 2)
        # self.adapool2 = nn.AdaptiveMaxPool2d()

    def forward(self, x):
        out = self.dwconv1(x)
        out = self.dwconv2(out)
        print(out.shape)
        # out = self.adapool1(out)
        out = self.dwconv3(out)
        # out = self.adapool2(out)
        return out


# test case
s = torch.rand(8, 3, 256, 256)
out = VGGEncoder()(s, True)
out = WeightAndBias(512)(out)
print(out.shape)
Hope you can help me. Thank you so much.
Hi @ycjing I have 2 questions:
- Can you provide information about the AdaptivePooling layer, specifically the target size?
- Is the 'add' method in Fig. 4 a channel concatenation or just like a basic residual block?
Thank you so much.
Our adaptive pooling layer is defined as follows:
nn.AdaptiveAvgPool2d((1,1))
Please note that the 'add' operation is not part of the residual blocks. It simply adds the output feature maps of the first few layers to those of the last few layers.
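The two answers above can be checked with a tiny snippet (shapes are my assumptions for illustration):

```python
import torch
import torch.nn as nn

# The adaptive pooling layer quoted above: output size fixed at 1 x 1.
pool = nn.AdaptiveAvgPool2d((1, 1))
x = torch.rand(8, 64, 7, 7)
pooled = pool(x)
print(pooled.shape)  # torch.Size([8, 64, 1, 1])

# The 'add' in Fig. 4 is a plain element-wise sum of two feature maps
# (early-layer output + late-layer output), not channel concatenation:
early = torch.rand(8, 64, 32, 32)
late = torch.rand(8, 64, 32, 32)
fused = early + late  # channel count stays 64; shapes must match
print(fused.shape)    # torch.Size([8, 64, 32, 32])
```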
Hi @ycjing I got the error above (the same traceback and code as in my earlier comment). Hope you can help me. Thank you so much.
Hi @sonnguyen129
Please refer to my previous reply and be careful about the output dimensions.
Best,
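For reference, the 1 × 1 collapse in the log above comes from the pointwise branch reusing kernel_size and stride, which shrinks the feature map too fast for InstanceNorm2d. In a standard depthwise-separable block the pointwise conv is 1 × 1 with stride 1. A possible correction (my own sketch, not the authors' code):

```python
import torch
import torch.nn as nn

class DepthWiseConv2d(nn.Module):
    """Depthwise-separable conv: grouped spatial conv + 1x1 pointwise conv."""
    def __init__(self, in_channels, out_channels, kernel_size, stride):
        super().__init__()
        self.depthwise = nn.Sequential(
            # groups == in_channels: one filter per input channel
            nn.Conv2d(in_channels, in_channels, kernel_size=kernel_size,
                      groups=in_channels, stride=stride,
                      padding=kernel_size // 2),
            nn.InstanceNorm2d(in_channels),
            nn.ReLU(True),
        )
        self.pointwise = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1),  # 1x1, stride 1
            nn.InstanceNorm2d(out_channels),
            nn.ReLU(True),
        )

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.rand(8, 512, 32, 32)
out = DepthWiseConv2d(512, 128, 3, 2)(x)
print(out.shape)  # torch.Size([8, 128, 16, 16])
```

Here only the depthwise conv strides, so the spatial size halves once per block instead of twice.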
Hi @ycjing Thanks for your reply; with it I fixed the error. Although I have read the paper quite carefully, I still don't understand how the Weight/Bias network generates the weight and bias. How do I get that weight and bias in PyTorch? Thank you so much.
Hi @sonnguyen129
Thank you for your interest. From your code, I think you have already got the point, i.e., dynamically predicting the weight and bias via the weight and bias networks. Could you please elaborate further on your question? Thanks!
Best,
Hi @ycjing Sorry for my unclear question. As I understand it, the style image, after being encoded by VGG, goes through the weight and bias networks. Are the generated weight and bias the weights and biases of the last conv layer of the weight/bias network (dwconv3 in my code)? Thank you so much.
Hi @sonnguyen129
No problem. The weight and bias are, actually, the output of the corresponding weight/bias networks, which is somewhat similar to the dynamic filter network (https://arxiv.org/abs/1605.09673).
Cheers, Yongcheng
Hi @ycjing I have already read the dynamic filter network paper. However, if the weight and bias are both outputs of the network, their values would be the same, right? But from what I have read about dynamic convolution in PyTorch, the weight and bias should be different. I hope you can answer. Thank you so much.
Hi @sonnguyen129
Thank you for your interest. The values are, in fact, not the same. As demonstrated in the figure and explained in the paper, we use a separate weight net and bias net to produce the corresponding weight and bias.
Best, Yongcheng
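Putting the thread's answers together, a minimal DIN-style forward pass might look like this. This is a sketch under my own assumptions about layer sizes (the internal conv layers of the two nets are placeholders), not the official implementation; the key points from the replies above are the two separate nets, the adaptive pooling, and groups equal to the channel count:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DIN(nn.Module):
    """Sketch: instance-normalize the content feature, then apply a conv whose
    weight and bias are predicted from the style feature by two separate nets."""
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        self.k = kernel_size
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # Separate weight net and bias net; adaptive pooling fixes output size.
        self.weight_net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.AdaptiveAvgPool2d((kernel_size, kernel_size)),
        )
        self.bias_net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.AdaptiveAvgPool2d((1, 1)),
        )

    def forward(self, content, style):
        b, c, _, _ = content.shape
        weight = self.weight_net(style)  # (b, c, k, k)
        bias = self.bias_net(style)      # (b, c, 1, 1)
        normalized = self.norm(content)
        outs = []
        for i in range(b):  # per-sample dynamic conv with groups = c
            outs.append(F.conv2d(normalized[i:i + 1],
                                 weight[i].view(c, 1, self.k, self.k),
                                 bias[i].view(c),
                                 padding=self.k // 2, groups=c))
        return torch.cat(outs, dim=0)

c_feat = torch.rand(2, 64, 32, 32)  # content feature
s_feat = torch.rand(2, 64, 24, 24)  # style feature, different spatial size
out = DIN()(c_feat, s_feat)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Because each sample in the batch gets its own predicted filters, the dynamic conv is applied per sample; the loop could also be replaced by a single grouped conv over the reshaped batch.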
Hello @ycjing Thanks for your brilliant work! I am interested in the paper "Dynamic Instance Normalization for Arbitrary Style Transfer", but I don't know the detailed architecture of DIN and can't find the supplementary material. Could you please provide the detailed network architecture of this paper? Thank you!