Open michaelshiyu opened 4 years ago
Yes, G is always exactly equivalent to a convolution with a single kernel. Since there are no non-linear activations, the sequence of convolutions (which are LINEAR!!!) can be replaced with a single kernel, similar to x * 2 * 3 * 6 = x * 36. To verify, simply initialize G with random weights, pass an image through the network, and compare the result to OUR resize function (or Matlab's) with the computed kernel. You should get the exact same result.
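The "x * 2 * 3 * 6 = x * 36" analogy can be checked numerically. The following is a minimal 1-D NumPy sketch, not code from the KernelGAN repository: the filter lengths 7, 5 and 3 are hypothetical stand-ins for a stack of bias-free layers, and plain convolution is used in place of a trained generator.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)   # 1-D stand-in for an image
k1 = rng.standard_normal(7)   # first linear filter (hypothetical size)
k2 = rng.standard_normal(5)   # second linear filter (hypothetical size)
k3 = rng.standard_normal(3)   # third linear filter (hypothetical size)

# Apply the three filters one after another, with no activation in between.
sequential = np.convolve(np.convolve(np.convolve(x, k1, "valid"), k2, "valid"), k3, "valid")

# Collapse the three filters into one kernel first, then filter only once.
k_single = np.convolve(np.convolve(k1, k2), k3)   # length 7 + 5 + 3 - 2 = 13
collapsed = np.convolve(x, k_single, "valid")

max_abs_diff = np.max(np.abs(sequential - collapsed))
print(k_single.shape, sequential.shape, max_abs_diff)
```

Both paths produce the same number of samples, and the difference between them should be at floating-point rounding level. The same associativity argument carries over to 2-D, multi-channel layers as long as nothing non-linear sits between them.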
Thanks a lot for answering!
I agree that a sequence of linear convolution layers can be collapsed into a single one, but is there a simple way to see that this filter is exactly the one we obtain by running calc_curr_k?
Also, I tried testing this numerically. Here's a minimal script replicating what I did:
import os
import numpy as np
from easydict import EasyDict as edict
import torch
os.chdir('/path/to/KernelGAN/')
import loss
import networks
import torch.nn.functional as F
from util import save_final_kernel, run_zssr, post_process_k, kernel_shift
from imresize import imresize
# set config
d = edict()
d.input_crop_size = 64
d.scale_factor = .5
d.G_chan = 64
d.G_kernel_size = 13
d.D_chan = 64
d.D_n_layers = 7
d.D_kernel_size = 7
d.G_structure = [7, 5, 3, 1, 1, 1]
# re-define a simpler version of KernelGAN class
class KernelGAN:
    def __init__(self, conf):
        # Acquire configuration
        self.conf = conf
        # Define the GAN
        self.G = networks.Generator(conf)
        # self.D = networks.Discriminator(conf)
        # The kernel G is imitating
        self.curr_k = torch.FloatTensor(conf.G_kernel_size, conf.G_kernel_size)

    def calc_curr_k(self):
        """given a generator network, the function calculates the kernel it is imitating"""
        delta = torch.Tensor([1.]).unsqueeze(0).unsqueeze(-1).unsqueeze(-1)
        for ind, w in enumerate(self.G.parameters()):
            curr_k = F.conv2d(delta, w, padding=self.conf.G_kernel_size - 1) if ind == 0 else F.conv2d(curr_k, w)
        self.curr_k = curr_k.squeeze().flip([0, 1])
# init network, get kernel
net = KernelGAN(d)
net.calc_curr_k()
# get a random image patch
img = np.random.rand(64, 64, 3).astype(np.float32)
# get rescaled images w/ downscaling factor 2
# k = kernel_shift(net.curr_k.detach().float().numpy(), 2)
k = net.curr_k.detach().float().numpy()
net_out = net.G(torch.from_numpy(img).view(1, 3, 64, 64)).detach()
k_out = imresize(img, .5, kernel=k).transpose((2, 0, 1))
# compare shapes
print(net_out.size()) # torch.Size([1, 3, 26, 26])
print(k_out.shape) # (3, 32, 32)
# compare the middle two rows of the output images
print(net_out[0][0][12:14])
print(k_out[0][15:17])
And the outputs are
tensor([[-0.0034, -0.0152, -0.0061, -0.0219, -0.0208, -0.0114, -0.0156, -0.0264,
-0.0260, -0.0195, -0.0092, -0.0163, -0.0102, -0.0141, -0.0291, -0.0120,
-0.0058, -0.0234, -0.0333, -0.0195, -0.0151, -0.0302, -0.0316, -0.0204,
-0.0284, 0.0142],
[-0.0317, -0.0024, -0.0353, -0.0156, -0.0192, -0.0156, -0.0207, -0.0327,
-0.0191, -0.0184, -0.0149, -0.0182, -0.0242, -0.0179, -0.0219, -0.0193,
-0.0055, -0.0402, -0.0166, -0.0137, -0.0180, -0.0203, 0.0003, 0.0034,
-0.0219, -0.0291]])
[[-0.01040298 -0.02667889 -0.01370732 -0.0288636 -0.02506967 -0.01364688
-0.00924972 -0.00425742 -0.01827828 -0.00914193 -0.02248633 -0.00315671
-0.00693536 -0.01682552 -0.02707781 -0.02352768 -0.0201562 -0.01160016
-0.01983571 -0.01985664 -0.022903 -0.03139445 -0.0211829 -0.00792672
-0.00433569 -0.00781635 -0.01967012 -0.01601905 -0.01832438 -0.02570574
-0.02028925 -0.0125796 ]
[-0.01464794 -0.01586494 -0.02722405 -0.01068424 -0.02654948 -0.01590517
-0.04187544 0.00326075 -0.00151808 -0.02657411 -0.02184494 -0.01424225
-0.00618783 -0.02599089 -0.02297904 -0.01945422 -0.03042439 -0.01876058
-0.02300953 -0.01963425 -0.00840907 -0.00931074 -0.01729889 -0.00336883
-0.01440977 -0.02402707 -0.00701743 0.00232169 -0.01451722 -0.02746969
-0.01549856 -0.02331167]]
As you can see, they don't seem to agree. They also do not agree if I use the shifted kernel obtained via kernel_shift. Surely I've made a mistake somewhere in this script? Could you kindly point it out, please?
Thanks again and I appreciate your help!
post_process_k function
@jnoylinc post_process_k does two things. First, it zeros out negligible values in the extracted kernel, which, if anything, only makes the kernel deviate further from the generator. Second, it calls the kernel_shift function on the extracted kernel, which I tried (commented out in the above snippet), but the kernel still did not produce the same output as the network.
Yes, I also ran into this question. Have you solved it?
No, it is not important for me.
Hi @1214635079, I haven't. I might be wrong here but I do not think the extracted kernel is the same as the learned generator.
Hi @sefibk, have you got a chance to look into this? Thanks!
I am sorry but I don't have the time to thoroughly check it. Your script seems right. I verified correctness when the model was developed, but since it was a while ago, I don't recall if there was something special about it.
I have checked the code; the author is right.
Great news @jnoylinc. thanks!
Nice! I believe G is always equivalent to a convolution with a single kernel, since all of its layers are linear. But can you tell me what the problem is in @michaelshiyu's code? Or can you attach your code here? Thanks!
@1214635079 I agree that a sequence of conv layers can be collapsed into a single conv layer if there are no nonlinearities involved. My concern is that this single conv kernel is not the same as the one computed using the given code, as demonstrated in my script above.
@jnoylinc Please post a minimal executable script so that we can reproduce your findings. Thanks!
I also can't understand the function calc_curr_k. Have you solved it?
@liuweiyy nope. Still waiting for the author to address this issue.
Same question here. I am looking for a mathematical proof or some papers about this.
I didn't understand you are waiting on me... Regarding the idea: mathematically, if you pass a delta kernel (1 in the center and zero otherwise) through a sequence of kernels (and flip the result) you get a single kernel equivalent to the sequence. That is the motivation for the function. If that is what you don't understand - point it out and I can elaborate. If you think I have a problem in the implementation - point it out and I can try to re-check. However it is very straight forward and I checked it many times before publication. In addition @jnoylinc points out he verified it is correct.
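The delta-and-flip idea described above can be illustrated in one dimension. This is a hedged sketch and not the repository's calc_curr_k: it assumes each layer is a bias-free, single-channel "valid" cross-correlation (which is what deep-learning conv layers compute), and the helper `layer` and the layer sizes 7, 5 and 3 are hypothetical.

```python
import numpy as np

def layer(signal, weight):
    """One bias-free 'conv layer': a valid cross-correlation of signal with weight."""
    return np.correlate(signal, weight, mode="valid")

rng = np.random.default_rng(0)
weights = [rng.standard_normal(n) for n in (7, 5, 3)]   # hypothetical layer sizes
k_size = sum(len(w) - 1 for w in weights) + 1            # 13 taps in the collapsed kernel

# A unit impulse padded with k_size - 1 zeros on each side.
delta = np.zeros(2 * (k_size - 1) + 1)
delta[k_size - 1] = 1.0

# Push the impulse through every layer, then flip the response.
curr_k = delta
for w in weights:
    curr_k = layer(curr_k, w)
kernel = curr_k[::-1]

# Compare the layer stack against a single layer that uses the extracted kernel.
x = rng.standard_normal(64)
net_out = x
for w in weights:
    net_out = layer(net_out, w)
single_out = layer(x, kernel)
unflipped_out = layer(x, curr_k)

print(kernel.shape, np.allclose(net_out, single_out), np.allclose(net_out, unflipped_out))
```

The flip is needed because a cross-correlation layer responds to an impulse with its weights in reversed order. In this sketch the flipped response reproduces the stack, while the unflipped one does not.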
thank you,
@sefibk Thanks for the reply.
It'd be great if you could elaborate on the motivation part. Pointing us to a full proof somewhere on this would also be much appreciated. Thanks in advance.
Regarding the implementation, I demonstrated with the script I posted above that the kernel computed with calc_curr_k does not agree with the actual kernel. And I think that serves as a disproof on the correctness of the implementation unless someone points out where I was wrong in that script.
@jnoylinc said he/she verified, and I asked above for a reproducible implementation, but he/she did not respond.
There are some bugs in your code, so be careful. My code has been deleted, I am sorry.
Regarding "theory": it is nothing complicated. We are composing linear operations, so we can definitely represent a sequence of convolutions as a single one. I don't have a proof for that, since it is trivial. The 'delta' was used only to keep the implementation simple and to have the kernel in our hands.
If I am not mistaken your code tests correctness on images organized as channels last. Try testing it with channels first (e.g. shape of (1, 3, 224, 224))
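To illustrate the channel-order point: the script above applies `.view(1, 3, 64, 64)` to an array of shape (64, 64, 3). A view or reshape only reinterprets the existing memory order and does not move the channel axis. The sketch below uses NumPy's `reshape`, which behaves the same way on a contiguous array, and a tiny hypothetical 4x4 image. It shows only this one effect and is not a full diagnosis of the script.

```python
import numpy as np

# Tiny channels-last image (H, W, C); each channel holds one constant value.
img = np.zeros((4, 4, 3), dtype=np.float32)
img[..., 1] = 1.0
img[..., 2] = 2.0

# Reinterpreting the buffer keeps the memory order, so the values of all three
# channels end up interleaved inside each "channel" of the result.
reshaped = img.reshape(1, 3, 4, 4)

# Moving the channel axis to the front gives a true channels-first layout.
channels_first = img.transpose(2, 0, 1)[None]

values_after_reshape = np.unique(reshaped[0, 0])
values_after_transpose = np.unique(channels_first[0, 0])
print(values_after_reshape, values_after_transpose)
```

In PyTorch, the corresponding change to the script would be `torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)` instead of `.view(1, 3, 64, 64)`. Note that the two outputs in the script also differ in size (26x26 versus 32x32), so border handling would still need to be aligned before comparing values.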
We can view conv op as matrix multiplication, then everything will be easy to understand.
@ZhuoranLyu - Thank you for the assistance - that is definitely true
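The matrix-multiplication view can be sketched as follows. Each bias-free layer becomes a banded matrix, the stack becomes a matrix product, and a stride of 2 keeps every second row. This is a 1-D illustration under assumed sizes (kernels of length 5 and 3, input of length 32); the helper `corr_matrix` is hypothetical and not part of KernelGAN.

```python
import numpy as np

def corr_matrix(kernel, n_in):
    """Matrix T such that T @ x is the valid cross-correlation of x with kernel."""
    n_out = n_in - len(kernel) + 1
    T = np.zeros((n_out, n_in))
    for i in range(n_out):
        T[i, i:i + len(kernel)] = kernel
    return T

rng = np.random.default_rng(0)
n = 32
k1, k2 = rng.standard_normal(5), rng.standard_normal(3)

T1 = corr_matrix(k1, n)             # first layer, 28 x 32
T2 = corr_matrix(k2, T1.shape[0])   # second layer, 26 x 28
T = T2 @ T1                         # the whole stack as one 26 x 32 matrix

# Every row of T holds the same coefficients shifted by one column,
# so T is itself the matrix of a single kernel.
k_single = T[0, :len(k1) + len(k2) - 1]
is_single_kernel = np.allclose(T, corr_matrix(k_single, n))

# A stride of 2 on the last layer just keeps every second output sample.
x = rng.standard_normal(n)
strided_stack = (T2 @ (T1 @ x))[::2]
strided_single = (corr_matrix(k_single, n) @ x)[::2]

print(is_single_kernel, np.allclose(strided_stack, strided_single))
```

Because matrix multiplication is associative, applying the layers one by one and applying the product matrix once give the same output, and the rows of the product matrix are the collapsed kernel.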
Hi,
Thanks a lot for this great work! I have a quick question regarding the paper.
If I'm understanding it correctly, the idea is that the generator of KernelGAN can be always equated to a single kernel, which can be obtained via, e.g., KernelGAN.calc_curr_k. But do you mean that this equivalence is exact? In other words, the output of the generator is always exactly equal to convolving with this single kernel?
I tried to test this but from what I saw they do not seem to be the same. Can you please enlighten me on this? Many thanks in advance.