sefibk / KernelGAN


The in_channels=1 for the G? #57

Open daifeng2016 opened 3 years ago

daifeng2016 commented 3 years ago

Hi, thanks for sharing the code, which is very clear. However, I am confused about the in_channels of the Generator in "network.py". If the input is an RGB image, shouldn't in_channels be 3?

```python
class Generator(nn.Module):
    def __init__(self, conf):
        super(Generator, self).__init__()
        struct = conf.G_structure
        # First layer - Converting RGB image to latent space
        self.first_layer = nn.Conv2d(in_channels=1, out_channels=conf.G_chan, kernel_size=struct[0], bias=False)
```

sefibk commented 3 years ago

No, the input channel should always be 1. When using RGB, I transpose the dimensions so the image becomes a batch of 3 single-channel images. The reason is that I am looking for a kernel that will be applied to all channels in an identical manner. If you choose in_channels=3, the generator will learn a different kernel for each channel; by stacking the channels as a batch, the same kernel is applied to each channel of the image. If this explanation is not clear, let me know and I will try to rephrase.
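A minimal sketch of that channel-to-batch trick (the tensor shapes and the transpose call here are illustrative assumptions, not code from the repo):

```python
import torch
import torch.nn as nn

# One RGB image as a (batch, channels, H, W) tensor.
rgb = torch.randn(1, 3, 64, 64)

# Move the channel axis to the batch axis: (1, 3, H, W) -> (3, 1, H, W),
# i.e. a batch of 3 single-channel images.
as_batch = rgb.transpose(0, 1)

# A conv layer with in_channels=1 now applies the *same* learned kernel
# to each of the three channels independently.
conv = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=7, bias=False)
out = conv(as_batch)  # shape: (3, 64, 58, 58)
```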

daifeng2016 commented 3 years ago

Got it, thank you.

fourPieces0927 commented 3 years ago

> No, the input channel should always be 1. When using RGB, I transpose the dimensions so the image becomes a batch of 3 single-channel images. The reason is that I am looking for a kernel that will be applied to all channels in an identical manner. If you choose in_channels=3, the generator will learn a different kernel for each channel; by stacking the channels as a batch, the same kernel is applied to each channel of the image. If this explanation is not clear, let me know and I will try to rephrase.

Hi, do I need to modify the model if I use grayscale images for training (input images with 1 channel), or can I just run train.py directly?

sefibk commented 3 years ago

It has been a while since I ran this code. As I recall, you don't need to modify the Generator. You might need to modify the Discriminator, but try it and see if it fails.

fourPieces0927 commented 3 years ago

> It has been a while since I ran this code. As I recall, you don't need to modify the Generator. You might need to modify the Discriminator, but try it and see if it fails.

Thanks for the reply. I think that when the image is grayscale, I can train the model directly without modification, because the image is converted to RGB by the read_image() function in "util.py". Is that right?

sefibk commented 3 years ago

You are probably right. The "quick & dirty" way that will definitely work is to duplicate your single channel into 3 identical channels, and then everything should work. I think it should also work with a single-channel image, but some utility functions will probably fail because I didn't build them generically enough.
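A minimal sketch of that workaround (np.repeat and the shapes are assumptions for illustration; the repo's read_image() may already perform an equivalent conversion):

```python
import numpy as np

# A grayscale image of shape (H, W).
gray = np.random.rand(256, 256).astype(np.float32)

# Duplicate the single channel into 3 identical channels, giving an
# RGB-shaped array (H, W, 3) that the rest of the pipeline can consume.
rgb_like = np.repeat(gray[:, :, None], 3, axis=2)
print(rgb_like.shape)  # (256, 256, 3)
```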

fourPieces0927 commented 3 years ago

> You are probably right. The "quick & dirty" way that will definitely work is to duplicate your single channel into 3 identical channels, and then everything should work.

```python
def get_top_left(self, size, for_g, idx):
    """Translate the center of the index of the crop to its corresponding top-left"""
    center = self.crop_indices_for_g[idx] if for_g else self.crop_indices_for_d[idx]
    row, col = int(center / self.in_cols), center % self.in_cols
    top, left = min(max(0, row - size // 2), self.in_rows - size), min(max(0, col - size // 2), self.in_cols - size)
    # Choose even indices (to avoid misalignment with the loss map for_g)
    return top - top % 2, left - left % 2
```

Hi, I see that "center" represents the chosen pixel's index in the flattened vector, and that (row, col) is the corresponding coordinate in the input image. But I am confused about the meaning of "top" and "left". Can you explain what they mean, or how you get the 64×64 crop from (row, col)?

sefibk commented 3 years ago

This is a sketchy and not-so-elegant implementation, I agree! As far as I remember, self.crop_indices_for_g is a list of indices that determines the training crops. This function converts each index to (top, left), i.e. the row and column where the crop should start. Once the math is correct, you can take a crop of an image with img[top:top+64, left:left+64]. If you don't understand the min/max, let me know; it is pretty simple.
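A small worked example of that min/max clamping (the 100×100 image size is an assumption made for illustration; the crop size of 64 matches the discussion above):

```python
# The clamping keeps a size x size crop fully inside the image:
# max(0, ...) stops the crop from starting above/left of the image;
# min(..., dim - size) stops it from running past the bottom/right edge.
in_rows, in_cols, size = 100, 100, 64

def clamp_top_left(row, col):
    top = min(max(0, row - size // 2), in_rows - size)
    left = min(max(0, col - size // 2), in_cols - size)
    # Snap to even indices, as in get_top_left
    return top - top % 2, left - left % 2

print(clamp_top_left(5, 5))    # (0, 0)   - clamped to the top-left corner
print(clamp_top_left(50, 50))  # (18, 18) - the centered crop fits as-is
print(clamp_top_left(99, 99))  # (36, 36) - clamped to the bottom-right corner
```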

fourPieces0927 commented 3 years ago

> This is a sketchy and not-so-elegant implementation, I agree! As far as I remember, self.crop_indices_for_g is a list of indices that determines the training crops. This function converts each index to (top, left), i.e. the row and column where the crop should start. Once the math is correct, you can take a crop of an image with img[top:top+64, left:left+64]. If you don't understand the min/max, let me know; it is pretty simple.

I got it, thank you very much!