zy-xc opened this issue 3 years ago (status: Open)
Hi @zy-xc
Thank you for your interest in our work. Here is the link for the corresponding supplement: https://drive.google.com/file/d/1sBFXqWaWOeMuaaVHMM-ddBssKr3OmutW/view?usp=sharing
Please feel free to contact me if there is any other question. Thank you!
Best, Yongcheng
Thank you for your reply!
I am a bit confused about the size of the weight generated by the Weight/Bias network. Is the dynamic convolution layer set to groups = 64 (the number of channels of the content feature)?
It seems that the style image would have to be large if we set groups = 1. For example, consider standard DIN with kernel_size = 1: the weight generated by the Weight Net has size 64 × 64 × 1 × 1, so the VGG feature of the style image must be at least 64 × 64 × 64 (C × H × W), and the style image itself must be at least 512 × 512. If we instead want to train standard DIN with kernel_size = 3, the style image would have to be at least 1536 × 1536.
Or does standard DIN set groups = 64, so that the generated weight has size 64 × kernel_size × kernel_size?
Thank you!
Hi @zy-xc
Thank you for your interest in our work! Regarding your question: yes, we indeed set the group number equal to the number of feature channels, as indicated in "Architecture Details" in the supplement. Also, please note that the size of the generated weight and bias is not correlated with the input size, since we use an adaptive pooling layer in the corresponding weight and bias networks. You can set the desired size of the weight and bias by configuring the adaptive pooling layer.
Please let me know if there is any other question. Thank you.
Best, Yongcheng
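To illustrate the two points in the reply above, here is a minimal stand-alone sketch (not the authors' code; the channel count C, kernel size k, and feature sizes are my assumptions). Adaptive pooling makes the predicted weight's size independent of the style image resolution, and the dynamic conv uses groups equal to the channel count:

```python
import torch
import torch.nn.functional as F

C, k = 64, 3  # content-feature channels and dynamic kernel size (assumed)

content = torch.rand(1, C, 32, 32)     # content feature map
style_feat = torch.rand(1, C, 17, 23)  # style feature, arbitrary spatial size

# Stand-in for the weight/bias nets: adaptive pooling fixes the output
# size at (C, k, k) regardless of the style feature's spatial size.
pooled = F.adaptive_avg_pool2d(style_feat, (k, k))  # -> (1, C, k, k)
weight = pooled.view(C, 1, k, k)                    # one k x k filter per channel
bias = F.adaptive_avg_pool2d(style_feat, (1, 1)).view(C)

# groups = C, i.e. one predicted filter per input channel
out = F.conv2d(content, weight, bias, padding=k // 2, groups=C)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

With groups = C the weight has only C × 1 × k × k entries, which is why the style image does not need to be 512 × 512 or larger.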
I find the supplementary details confusing to implement from. Has anyone implemented this in PyTorch yet? Can you help me? Thank you so much.
Hi @sonnguyen129
Thank you for your interest in our work! Could you please elaborate on which part exactly is confusing? I am more than happy to clarify it. Also, if you would like our source code, please drop me an email to apply for the necessary permission that is required by the company. Thanks!
Best, Yongcheng
Hi @ycjing I sent you an email. I hope to hear from you as soon as possible. Thank you.
Hi @ycjing I have a few questions as follows:
2. The Res layer and upsampling layer are quite lacking in information, and I don't know where they are in the illustration.
Hi @sonnguyen129
Thanks for your interest again! Please feel free to reach out if anything else is unclear.
Cheers, Yongcheng
Hi @sonnguyen129
Could you please provide the detailed log information? Thanks!
Best,
Here is my test case:
c = torch.rand(8,64,224,224)
s = torch.rand(8,64,224,224)
out = DIN(3)(c, s)
print(out)
Logs:
Traceback (most recent call last):
File "model.py", line 136, in <module>
out = DIN(3)(c, s)
File "model.py", line 70, in __init__
self.weight_bias = WeightAndBias(inp = inp)
File "model.py", line 49, in __init__
self.dwconv1 = DepthWiseConv2d(inp, 128, 3, 128, 2)
File "model.py", line 10, in __init__
groups = groups, stride = stride, padding = 1)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 432, in __init__
False, _pair(0), groups, bias, padding_mode, **factory_kwargs)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 84, in __init__
raise ValueError('in_channels must be divisible by groups')
ValueError: in_channels must be divisible by groups
Hi @sonnguyen129
As shown in the log, the group number is wrong; it should be equal to in_channels.
Best, Yongcheng
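For context, the constraint can be reproduced in isolation (a stand-alone snippet, not from the repository; the sizes match the test case above):

```python
import torch
import torch.nn as nn

x = torch.rand(8, 64, 224, 224)

# groups must divide in_channels; groups == in_channels gives a depthwise conv.
dw_ok = nn.Conv2d(64, 64, kernel_size=3, groups=64, stride=2, padding=1)
out = dw_ok(x)
print(out.shape)  # torch.Size([8, 64, 112, 112])

# groups = 128 with 64 input channels reproduces the error from the log:
try:
    nn.Conv2d(64, 64, kernel_size=3, groups=128)
except ValueError as err:
    print(err)  # in_channels must be divisible by groups
```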
Hi @ycjing I have 2 questions:
- Can you provide information about the AdaptivePooling layer, specifically the target size?
- Is the 'add' method in Fig. 4 a channel concatenation or just like a basic residual block?
Thank you so much.
Hi @ycjing I got an error:
Traceback (most recent call last):
File "model.py", line 197, in <module>
out = WeightAndBias(512)(out)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "model.py", line 79, in forward
out = self.dwconv2(out)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "model.py", line 25, in forward
out = self.pointwise(out)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/instancenorm.py", line 59, in forward
self.training or not self.track_running_stats, self.momentum, self.eps)
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/functional.py", line 2325, in instance_norm
_verify_spatial_size(input.size())
File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/functional.py", line 2292, in _verify_spatial_size
raise ValueError("Expected more than 1 spatial element when training, got input size {}".format(size))
ValueError: Expected more than 1 spatial element when training, got input size torch.Size([8, 64, 1, 1])
Here is my code:
class DepthWiseConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, groups, stride):
        super(DepthWiseConv2d, self).__init__()
        self.depthwise = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=kernel_size,
                      groups=groups, stride=stride, padding=1),
            nn.InstanceNorm2d(in_channels),
            nn.ReLU(True)
        )
        self.pointwise = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size,
                      stride=stride),
            nn.InstanceNorm2d(out_channels),
            nn.ReLU(True)
        )

    def forward(self, x):
        out = self.depthwise(x)
        out = self.pointwise(out)
        return out


class VGGEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = vgg19(pretrained=True).features
        self.slice1 = vgg[:2]
        self.slice2 = vgg[2:7]
        self.slice3 = vgg[7:12]
        self.slice4 = vgg[12:21]
        for p in self.parameters():
            p.requires_grad = False

    def forward(self, images, output_last_feature=False):
        h1 = self.slice1(images)
        h2 = self.slice2(h1)
        h3 = self.slice3(h2)
        h4 = self.slice4(h3)
        if output_last_feature:
            return h4
        else:
            return h1, h2, h3, h4


class WeightAndBias(nn.Module):
    """Weight/Bias Network"""
    def __init__(self, in_channels=512):
        super(WeightAndBias, self).__init__()
        self.dwconv1 = DepthWiseConv2d(in_channels, 128, 3, 128, 2)
        self.dwconv2 = DepthWiseConv2d(128, 64, 3, 64, 2)
        # self.adapool1 = nn.AdaptiveMaxPool2d()
        self.dwconv3 = DepthWiseConv2d(64, 64, 3, 64, 2)
        # self.adapool2 = nn.AdaptiveMaxPool2d()

    def forward(self, x):
        out = self.dwconv1(x)
        out = self.dwconv2(out)
        print(out.shape)
        # out = self.adapool1(out)
        out = self.dwconv3(out)
        # out = self.adapool2(out)
        return out


# test case
s = torch.rand(8, 3, 256, 256)
out = VGGEncoder()(s, True)
out = WeightAndBias(512)(out)
print(out.shape)
Hope you can help me. Thank you so much.
Hi @ycjing I have 2 questions:
- Can you provide information about the AdaptivePooling layer, specifically the target size?
- Is the 'add' method in Fig. 4 a channel concatenation or just like a basic residual block?
Thank you so much.
Our adaptive pooling layer is defined as follows:
nn.AdaptiveAvgPool2d((1,1))
Please note that the 'add' operation is not part of the residual blocks. It simply adds the output feature maps of the first few layers to those of the last few layers.
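The two answers above can be checked with a tiny snippet (shapes are my assumptions for illustration):

```python
import torch
import torch.nn as nn

# The adaptive pooling layer quoted above: output size fixed at 1 x 1.
pool = nn.AdaptiveAvgPool2d((1, 1))
x = torch.rand(8, 64, 7, 7)
pooled = pool(x)
print(pooled.shape)  # torch.Size([8, 64, 1, 1])

# The 'add' in Fig. 4 is a plain element-wise sum of two feature maps
# (early-layer output + late-layer output), not channel concatenation:
early = torch.rand(8, 64, 32, 32)
late = torch.rand(8, 64, 32, 32)
fused = early + late  # channel count stays 64; shapes must match
print(fused.shape)    # torch.Size([8, 64, 32, 32])
```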
Hi @ycjing I got the error above (the same traceback and code as in my earlier comment). Hope you can help me. Thank you so much.
Hi @sonnguyen129
Please refer to my previous reply and be careful about the output dimensions.
Best,
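For reference, the 1 × 1 collapse in the log above comes from the pointwise branch reusing kernel_size and stride, which shrinks the feature map too fast for InstanceNorm2d. In a standard depthwise-separable block the pointwise conv is 1 × 1 with stride 1. A possible correction (my own sketch, not the authors' code):

```python
import torch
import torch.nn as nn

class DepthWiseConv2d(nn.Module):
    """Depthwise-separable conv: grouped spatial conv + 1x1 pointwise conv."""
    def __init__(self, in_channels, out_channels, kernel_size, stride):
        super().__init__()
        self.depthwise = nn.Sequential(
            # groups == in_channels: one filter per input channel
            nn.Conv2d(in_channels, in_channels, kernel_size=kernel_size,
                      groups=in_channels, stride=stride,
                      padding=kernel_size // 2),
            nn.InstanceNorm2d(in_channels),
            nn.ReLU(True),
        )
        self.pointwise = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1),  # 1x1, stride 1
            nn.InstanceNorm2d(out_channels),
            nn.ReLU(True),
        )

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.rand(8, 512, 32, 32)
out = DepthWiseConv2d(512, 128, 3, 2)(x)
print(out.shape)  # torch.Size([8, 128, 16, 16])
```

Here only the depthwise conv strides, so the spatial size halves once per block instead of twice.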
Hi @ycjing Thanks for your reply; with it I fixed the error. Although I have read the paper quite carefully, I still don't understand how the Weight/Bias network generates the weight and bias. How do I get that weight and bias in PyTorch? Thank you so much.
Hi @sonnguyen129
Thank you for your interest. From your code, I think you have already got the point, i.e., dynamically predicting the weight and bias via the weight and bias networks. Could you please elaborate further on your question? Thanks!
Best,
Hi @ycjing Sorry for my unclear question. As I understand it, the style image, after being encoded by VGG, goes through the weight and bias networks. Are the generated weight and bias the weights and biases of the last conv layer of the weight/bias network (dwconv3 in my code)? Thank you so much.
Hi @sonnguyen129
No problem. The weight and bias are, actually, the output of the corresponding weight/bias networks, which is somewhat similar to the dynamic filter network (https://arxiv.org/abs/1605.09673).
Cheers, Yongcheng
Hi @ycjing I have already read the dynamic filter network paper. However, if the weight and bias are both outputs of the network, their values would be the same, right? But from what I have read about dynamic convolution in PyTorch, the weight and bias should be different. I hope you can answer. Thank you so much.
Hi @sonnguyen129
Thank you for your interest. The values are, in fact, not the same. As demonstrated in the figure and explained in the paper, we use a separate weight net and bias net to produce the corresponding weight and bias.
Best, Yongcheng
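Putting the thread's answers together, a minimal DIN-style forward pass might look like this. This is a sketch under my own assumptions about layer sizes (the internal conv layers of the two nets are placeholders), not the official implementation; the key points from the replies above are the two separate nets, the adaptive pooling, and groups equal to the channel count:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DIN(nn.Module):
    """Sketch: instance-normalize the content feature, then apply a conv whose
    weight and bias are predicted from the style feature by two separate nets."""
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        self.k = kernel_size
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # Separate weight net and bias net; adaptive pooling fixes output size.
        self.weight_net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.AdaptiveAvgPool2d((kernel_size, kernel_size)),
        )
        self.bias_net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.AdaptiveAvgPool2d((1, 1)),
        )

    def forward(self, content, style):
        b, c, _, _ = content.shape
        weight = self.weight_net(style)  # (b, c, k, k)
        bias = self.bias_net(style)      # (b, c, 1, 1)
        normalized = self.norm(content)
        outs = []
        for i in range(b):  # per-sample dynamic conv with groups = c
            outs.append(F.conv2d(normalized[i:i + 1],
                                 weight[i].view(c, 1, self.k, self.k),
                                 bias[i].view(c),
                                 padding=self.k // 2, groups=c))
        return torch.cat(outs, dim=0)

c_feat = torch.rand(2, 64, 32, 32)  # content feature
s_feat = torch.rand(2, 64, 24, 24)  # style feature, different spatial size
out = DIN()(c_feat, s_feat)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Because each sample in the batch gets its own predicted filters, the dynamic conv is applied per sample; the loop could also be replaced by a single grouped conv over the reshaped batch.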
Hello @ycjing Thanks for your brilliant work! I am interested in the paper "Dynamic Instance Normalization for Arbitrary Style Transfer", but I don't know the detailed architecture of DIN and can't find the supplementary material. Could you please provide the detailed network architecture of this paper? Thank you!