sefibk / KernelGAN


Equivalence between generator and extracted kernel #31

Open michaelshiyu opened 4 years ago

michaelshiyu commented 4 years ago

Hi,

Thanks a lot for this great work! I have a quick question regarding the paper.

If I'm understanding it correctly, the idea is that the generator of KernelGAN can always be equated with a single kernel, which can be obtained via, e.g., KernelGAN.calc_curr_k. But do you mean that this equivalence is exact? In other words, is the output of the generator always exactly equal to convolving the input with this single kernel?

I tried to test this, but from what I saw the two do not seem to be the same. Can you please enlighten me on this? Many thanks in advance.

sefibk commented 4 years ago

Yes, G is always exactly equivalent to a convolution with a single kernel. Since there are no non-linear activations, the sequence of convolutions (which are LINEAR!) can be replaced with a single kernel - similar to x * 2 * 3 * 6 = x * 36. To verify, simply initialize G with random weights, pass an image through the network, and compare it to OUR resize function (or Matlab's) with the computed kernel. You should get the exact same result.
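For concreteness, here is a minimal sketch of that claim (toy single-channel input and 5x5/3x3 kernels, not the repo's Generator): two stacked bias-free conv layers give exactly the same output as one convolution with the kernel obtained by convolving the two kernels together.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 32, 32)   # toy single-channel image
k1 = torch.randn(1, 1, 5, 5)    # first layer's kernel
k2 = torch.randn(1, 1, 3, 3)    # second layer's kernel

# Output of the two stacked conv layers (no bias, no activation).
y_stacked = F.conv2d(F.conv2d(x, k1), k2)

# Collapse the two kernels into a single 7x7 kernel: the full convolution
# of k1 with k2 (F.conv2d correlates, hence the flip and the full padding).
k_single = F.conv2d(k1, k2.flip([2, 3]), padding=2)

# A single convolution with the collapsed kernel reproduces the output exactly.
y_single = F.conv2d(x, k_single)
print(torch.allclose(y_stacked, y_single, atol=1e-4))  # True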

michaelshiyu commented 4 years ago

Thanks a lot for answering!

I agree that a sequence of linear convolution layers can be collapsed into a single one, but is there a simple way to see that this filter is exactly the one we obtain by running calc_curr_k?

Also, I tried testing this numerically. Here's a minimal script replicating what I did:

import os
import numpy as np
from easydict import EasyDict as edict
import torch

os.chdir('/path/to/KernelGAN/')
import loss
import networks
import torch.nn.functional as F
from util import save_final_kernel, run_zssr, post_process_k, kernel_shift
from imresize import imresize

# set config
d = edict()
d.input_crop_size = 64
d.scale_factor = .5
d.G_chan = 64
d.G_kernel_size = 13
d.D_chan = 64
d.D_n_layers = 7
d.D_kernel_size = 7
d.G_structure = [7, 5, 3, 1, 1, 1]

# re-define a simpler version of KernelGAN class
class KernelGAN:
    def __init__(self, conf):
        # Acquire configuration
        self.conf = conf

        # Define the GAN
        self.G = networks.Generator(conf)
        # self.D = networks.Discriminator(conf)

        # The kernel G is imitating
        self.curr_k = torch.FloatTensor(conf.G_kernel_size, conf.G_kernel_size)

    def calc_curr_k(self):
        """given a generator network, the function calculates the kernel it is imitating"""
        delta = torch.Tensor([1.]).unsqueeze(0).unsqueeze(-1).unsqueeze(-1)
        for ind, w in enumerate(self.G.parameters()):
            curr_k = F.conv2d(delta, w, padding=self.conf.G_kernel_size - 1) if ind == 0 else F.conv2d(curr_k, w)
        self.curr_k = curr_k.squeeze().flip([0, 1])

# init network, get kernel
net = KernelGAN(d)
net.calc_curr_k()

# get a random image patch
img = np.random.rand(64, 64, 3).astype(np.float32)

# get rescaled images w/ downscaling factor 2
# k = kernel_shift(net.curr_k.detach().float().numpy(), 2)
k = net.curr_k.detach().float().numpy()
net_out = net.G(torch.from_numpy(img).view(1, 3, 64, 64)).detach()
k_out = imresize(img, .5, kernel=k).transpose((2, 0, 1))

# compare shapes
print(net_out.size())  # torch.Size([1, 3, 26, 26])
print(k_out.shape)  # (3, 32, 32)

# compare the middle two rows of the output images
print(net_out[0][0][12:14])
print(k_out[0][15:17])

And the outputs are

tensor([[-0.0034, -0.0152, -0.0061, -0.0219, -0.0208, -0.0114, -0.0156, -0.0264,
         -0.0260, -0.0195, -0.0092, -0.0163, -0.0102, -0.0141, -0.0291, -0.0120,
         -0.0058, -0.0234, -0.0333, -0.0195, -0.0151, -0.0302, -0.0316, -0.0204,
         -0.0284,  0.0142],
        [-0.0317, -0.0024, -0.0353, -0.0156, -0.0192, -0.0156, -0.0207, -0.0327,
         -0.0191, -0.0184, -0.0149, -0.0182, -0.0242, -0.0179, -0.0219, -0.0193,
         -0.0055, -0.0402, -0.0166, -0.0137, -0.0180, -0.0203,  0.0003,  0.0034,
         -0.0219, -0.0291]])

[[-0.01040298 -0.02667889 -0.01370732 -0.0288636  -0.02506967 -0.01364688
  -0.00924972 -0.00425742 -0.01827828 -0.00914193 -0.02248633 -0.00315671
  -0.00693536 -0.01682552 -0.02707781 -0.02352768 -0.0201562  -0.01160016
  -0.01983571 -0.01985664 -0.022903   -0.03139445 -0.0211829  -0.00792672
  -0.00433569 -0.00781635 -0.01967012 -0.01601905 -0.01832438 -0.02570574
  -0.02028925 -0.0125796 ]
 [-0.01464794 -0.01586494 -0.02722405 -0.01068424 -0.02654948 -0.01590517
  -0.04187544  0.00326075 -0.00151808 -0.02657411 -0.02184494 -0.01424225
  -0.00618783 -0.02599089 -0.02297904 -0.01945422 -0.03042439 -0.01876058
  -0.02300953 -0.01963425 -0.00840907 -0.00931074 -0.01729889 -0.00336883
  -0.01440977 -0.02402707 -0.00701743  0.00232169 -0.01451722 -0.02746969
  -0.01549856 -0.02331167]]

As you can see, the two outputs don't seem to agree, and they still don't agree if I use the shifted kernel obtained via kernel_shift (commented out above) instead. Surely I've made a mistake somewhere in this script? Could you kindly point it out, please?

Thanks again and I appreciate your help!

jnoylinc commented 4 years ago

(quoting @michaelshiyu's comment above in full)

Have you looked at the post_process_k function?

michaelshiyu commented 4 years ago

@jnoylinc post_process_k does two things. It first zeros out negligible values in the extracted kernel, which, if anything, only results in the kernel further deviating from the generator. Second, it calls the kernel_shift function on the extracted kernel, which I tried (commented out in the above snippet) but still did not make the kernel produce identical output as the network.

1214635079 commented 3 years ago

(quoting @michaelshiyu's reply about post_process_k above)

Yes, I ran into the same issue. Have you solved it?

jnoylinc commented 3 years ago

(quoting the exchange above)

No, it is not important for me.

michaelshiyu commented 3 years ago

Hi @1214635079, I haven't. I might be wrong here but I do not think the extracted kernel is the same as the learned generator.

Hi @sefibk, have you got a chance to look into this? Thanks!

sefibk commented 3 years ago

I am sorry but I don't have the time to thoroughly check it. Your script seems right. I verified correctness when the model was developed, but since it was a while ago, I don't recall if there was something special about it.

jnoylinc commented 3 years ago

(quoting @michaelshiyu's comment above)

I have checked the code; the author is right.

sefibk commented 3 years ago

Great news @jnoylinc, thanks!

1214635079 commented 3 years ago

(quoting @michaelshiyu's question and @jnoylinc's reply above)

Nice! I believe G is always equivalent to a convolution with a single kernel, since all the layers are linear. But can you tell me what the problem is in @michaelshiyu's code? Or can you attach your code here? Thanks!

michaelshiyu commented 3 years ago

@1214635079 I agree that a sequence of conv layers can be collapsed into a single conv layer if there are no nonlinearities involved. My concern is that this single conv kernel is not the same as the one computed using the given code, as demonstrated in my script above.

@jnoylinc Please post a minimal executable script so that we can reproduce your findings. Thanks!

liuweiyy commented 3 years ago

I also can't understand the calc_curr_k function. Have you solved it?

michaelshiyu commented 3 years ago

@liuweiyy nope. Still waiting for the author to address this issue.

ZhuoranLyu commented 3 years ago

Same question. I am looking for a mathematical proof or a paper about this.

sefibk commented 3 years ago

I didn't realize you were waiting on me... Regarding the idea: mathematically, if you pass a delta kernel (1 in the center and zero otherwise) through a sequence of kernels (and flip the result), you get a single kernel equivalent to the sequence. That is the motivation for the function. If that is what you don't understand, point it out and I can elaborate. If you think there is a problem in the implementation, point it out and I can try to re-check. However, it is very straightforward and I checked it many times before publication. In addition, @jnoylinc points out that he verified it is correct.
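For what it's worth, here is a minimal sketch of that delta trick (toy 5x5/3x3 kernels rather than the repo's Generator, mirroring the padding and flip logic of calc_curr_k): pushing a delta image through the stack of weights recovers a single kernel, and applying that kernel once with conv2d reproduces the stacked output exactly.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 32, 32)
k1 = torch.randn(1, 1, 5, 5)
k2 = torch.randn(1, 1, 3, 3)
total_size = 5 + 3 - 1                        # support of the combined kernel (7)

# Delta trick, mimicking calc_curr_k: start from a 1x1 "delta" image, run it
# through the stack of weights (padding the first layer so nothing is cropped),
# then flip the result.
delta = torch.ones(1, 1, 1, 1)
step = F.conv2d(delta, k1, padding=total_size - 1)
step = F.conv2d(step, k2)
curr_k = step.squeeze().flip([0, 1])          # the recovered 7x7 kernel

# Applying the recovered kernel once (with conv2d, i.e. correlation)
# reproduces the output of the stacked layers.
y_stacked = F.conv2d(F.conv2d(x, k1), k2)
y_kernel = F.conv2d(x, curr_k.view(1, 1, total_size, total_size))
print(torch.allclose(y_stacked, y_kernel, atol=1e-4))  # True

# Caveat: F.conv2d correlates rather than convolves, so whether the final flip
# is wanted depends on the convention of whatever routine later applies the
# kernel (a filter that truly convolves would need the un-flipped result).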

liuweiyy commented 3 years ago

(quoting @sefibk's reply above)

Thank you.

michaelshiyu commented 3 years ago

@sefibk Thanks for the reply.

It'd be great if you could elaborate on the motivation part. Pointing us to a full proof somewhere on this would also be much appreciated. Thanks in advance.

Regarding the implementation, I demonstrated with the script I posted above that the kernel computed with calc_curr_k does not agree with the actual kernel. I think that serves as a disproof of the correctness of the implementation, unless someone points out where I went wrong in that script.

@jnoylinc said he/she verified it, and I asked above for a reproducible implementation, but he/she did not respond.

jnoylinc commented 3 years ago

There are some bugs in your code, be careful. My code has been deleted, I am sorry.

sefibk commented 3 years ago

Regarding "theory" - it is nothing complicated. We are suppressing linear operations - we can definitely represent a sequence of convolutions as one. I don't have a proof for that obviously since it is very trivial. The use of a 'delta' was done just for simple implementation of the idea and having the kernel in our hands.

If I am not mistaken, your code tests correctness on images organized as channels-last. Try testing it with channels-first (e.g. a shape of (1, 3, 224, 224)).
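To make the pitfall concrete, here is a small sketch (hypothetical shapes, not the original script): on a channels-last numpy image, .view(1, 3, H, W) merely reinterprets memory and scrambles pixels and channels, whereas .permute actually moves the channel axis.

import numpy as np
import torch

img = np.random.rand(64, 64, 3).astype(np.float32)   # HWC, channels last

# .view() only reinterprets the underlying memory, so this does NOT produce a
# valid NCHW tensor: pixel and channel values end up interleaved incorrectly.
scrambled = torch.from_numpy(img).view(1, 3, 64, 64)

# The correct conversion moves the channel axis to the front.
proper = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).contiguous()

print(torch.equal(scrambled, proper))   # False: different pixel layouts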

ZhuoranLyu commented 3 years ago

We can view the conv op as a matrix multiplication; then everything becomes easy to understand.
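As a minimal sketch of that viewpoint (single channel, 'valid' padding, toy sizes): with im2col (F.unfold) a convolution is literally a matrix multiplication, so stacking convolutions amounts to multiplying matrices, and the product is again a convolution matrix.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 8, 8)
k = torch.randn(1, 1, 3, 3)

y_conv = F.conv2d(x, k)                          # shape (1, 1, 6, 6)

# im2col: each column holds one flattened 3x3 patch of x.
cols = F.unfold(x, kernel_size=3)                # shape (1, 9, 36)

# The convolution is a single matrix multiplication of the flattened kernel
# with the patch matrix.
y_mm = (k.view(1, -1) @ cols).view(1, 1, 6, 6)

print(torch.allclose(y_conv, y_mm, atol=1e-5))   # True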

sefibk commented 3 years ago

@ZhuoranLyu - Thank you for the assistance - that is definitely true.