daifeng2016 opened this issue 3 years ago
No. The input channel should always be 1, and when using RGB I transpose the dimensions to get a batch of 3 images with 1 channel each. The reason is that I am looking for a kernel that will be applied to all the channels in an identical manner. If you choose in_channels=3, the generator will learn a different kernel for each channel, but when stacking as a batch, the same kernel is applied to each channel of the image. If this explanation is not clear - let me know and I will try to rephrase.
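Concretely, that transpose can be sketched in numpy (a minimal sketch with a hypothetical image shape; the repo itself works with torch tensors):

```python
import numpy as np

# Hypothetical RGB image in (channels, height, width) layout
rgb = np.arange(3 * 4 * 4, dtype=float).reshape(3, 4, 4)

# Add a batch axis, then swap batch and channel axes:
# (1, 3, H, W) -> (3, 1, H, W), i.e. a batch of 3 single-channel images,
# so the same learned kernel is applied to each channel.
batch = rgb[np.newaxis, ...].swapaxes(0, 1)
print(batch.shape)  # (3, 1, 4, 4)
```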
Got it, thank you.
Hi, if I use a grayscale image to train the kernel,
do I need to modify the model for grayscale training (input images with 1 channel)? Or can I just run train.py directly?
It has been a while since I ran this code. As I recall, you don't need to modify the Generator. Maybe the Discriminator - but try and see if it fails.
Thanks for the reply. I think that when the image is grayscale I can use the model directly without modification, because the image is converted to RGB by the read_image() function in util.py. Is that right?
You are probably right. The "quick & dirty" way that will definitely work is to duplicate the single channel into 3 identical channels, and then everything should work. It might also work with a single-channel image directly, but some utility functions will probably fail because I didn't build them generically enough.
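That duplication can be sketched in numpy like this (variable names and shapes are hypothetical):

```python
import numpy as np

gray = np.random.rand(64, 64)  # single-channel grayscale image, (H, W)

# Stack the same channel 3 times along a new last axis -> (H, W, 3),
# producing an RGB-shaped image with identical channels.
rgb_like = np.repeat(gray[:, :, np.newaxis], 3, axis=2)
print(rgb_like.shape)  # (64, 64, 3)
```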
```python
def get_top_left(self, size, for_g, idx):
    """Translate the center index of the crop to its corresponding top-left"""
    center = self.crop_indices_for_g[idx] if for_g else self.crop_indices_for_d[idx]
    row, col = center // self.in_cols, center % self.in_cols
    top = min(max(0, row - size // 2), self.in_rows - size)
    left = min(max(0, col - size // 2), self.in_cols - size)
    return top - top % 2, left - left % 2
```
Hi, I see that "center" represents the chosen pixel's index in the flattened vector, and (row, col) is the coordinate in the input image corresponding to that "center". But I am confused about the meaning of "top" and "left" - can you explain them, or how you get the 64x64 crop from (row, col)?
This is a sketchy and not so elegant implementation, I agree! As far as I remember, `self.crop_indices_for_g` is a list of indices that determine the training crops. This function converts each index to (top, left), i.e. the row and column from which the crop should start. Once the math is correct, you can take a crop of an image with `img[top:top+64, left:left+64]`. If you don't understand the min/max - let me know, it is pretty simple.
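As a self-contained sketch of that math (a standalone version of the function, with hypothetical image dimensions): the flat index is split into (row, col), then backed off by half the crop size, with min/max clamping the result so the full crop stays inside the image.

```python
def get_top_left(center, size, in_rows, in_cols):
    # Flat index -> (row, col) of the crop center
    row, col = center // in_cols, center % in_cols
    # Back off by half the crop size; clamp so [top, top+size) and
    # [left, left+size) stay inside the image bounds.
    top = min(max(0, row - size // 2), in_rows - size)
    left = min(max(0, col - size // 2), in_cols - size)
    # Round down to even coordinates, as in the original
    return top - top % 2, left - left % 2

# Center index 5050 in a 100x100 image is (row=50, col=50);
# a 64x64 crop centered there starts at (18, 18).
print(get_top_left(5050, 64, 100, 100))  # (18, 18)
# A center near the bottom-right corner gets clamped:
print(get_top_left(9999, 64, 100, 100))  # (36, 36)
```

The crop itself is then `img[top:top+64, left:left+64]`.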
I got it, thank you very much!
Hi, thanks for sharing the code, which is very clear. However, I am confused about the in_channels of the Generator in network.py. If the input is RGB images, should in_channels=3?

```python
class Generator(nn.Module):
    def __init__(self, conf):
        super(Generator, self).__init__()
        struct = conf.G_structure
        # First layer - Converting RGB image to latent space
```