mingyuliutw / UNIT

Unsupervised Image-to-Image Translation

Hyperparameter information #51

Closed taki0112 closed 6 years ago

taki0112 commented 6 years ago

Hi! I am reproducing your code in TensorFlow, but I do not know the current hyperparameter settings (batch size, input size, dropout rate, etc.). Could you tell me where in the code I can check them?

taki0112 commented 6 years ago

Oh, I found it! (exps/*.yaml) I have some questions:

  1. Does scale mean data augmentation?
  2. I think GaussianVAE2D is not used in the image translation networks (Appendix A)... where did you use it?

mingyuliutw commented 6 years ago

@taki0112

  1. Does scale mean data augmentation? No. It is the scale factor that we apply to all the input images, and it is kept fixed throughout training. For example, if your images are all of size 1024x1024 and you set the scale to 0.5, then all the images will be resized to 512x512 before being fed to the networks (see the sketch after this list).

  2. I think GaussianVAE2D is not used in the image translation networks (Appendix A)... where did you use it? You can use that layer; it will render similar results. To save some memory, I used a reduced implementation called GaussianLayer. It is like fixing the learnable variance parameters in GaussianVAE2D to 1.
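
For illustration, here is a minimal sketch of what the scale option amounts to, assuming PIL-style bilinear resizing (the repository's actual resize call may differ):

    from PIL import Image

    # Hypothetical preprocessing sketch: apply a fixed scale factor to every
    # input image before feeding it to the networks.
    scale = 0.5
    img = Image.open('input.jpg')  # e.g. 1024x1024
    new_size = (int(img.width * scale), int(img.height * scale))
    img = img.resize(new_size, Image.BILINEAR)  # now 512x512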

taki0112 commented 6 years ago

I can't find GaussianLayer... do you mean GaussianNoiseLayer?

See this code: there is no GaussianLayer. If you mean GaussianLayer == GaussianNoiseLayer, then it just adds noise... right?

mingyuliutw commented 6 years ago

That’s correct.
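
In other words, a minimal sketch of the layer being discussed (not the repository's exact code): during training it adds zero-mean, unit-variance Gaussian noise to its input, which matches GaussianVAE2D with the variance parameters fixed to 1.

    import torch
    import torch.nn as nn

    class GaussianNoiseLayer(nn.Module):
        def forward(self, x):
            # Add unit-variance noise at training time; identity at test time.
            if not self.training:
                return x
            return x + torch.randn_like(x)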

taki0112 commented 6 years ago

Um, I have one more question, sorry!

In your paper, Appendix A, you use a 1x1 conv in the Resblock (N512, K1, S1), but in your code you do not use a 1x1 conv; you use a 3x3 conv (link).

Which one should I use to reproduce the results in your paper: 1x1 or 3x3?

mingyuliutw commented 6 years ago

Oops, this is a typo in the paper; I will fix it. The code is correct: it should be a 3x3 conv, i.e., K3, S1.
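
For reference, here is a sketch of a residual block consistent with the corrected spec (two 3x3, stride-1 convolutions with a skip connection); the instance normalization and class name are assumptions for illustration, not copied from the repository.

    import torch.nn as nn

    class ResBlock(nn.Module):
        def __init__(self, ch=512):
            super().__init__()
            # Conv3x3 -> InstanceNorm -> ReLU -> Conv3x3 -> InstanceNorm
            self.model = nn.Sequential(
                nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=1),
                nn.InstanceNorm2d(ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=1),
                nn.InstanceNorm2d(ch),
            )

        def forward(self, x):
            # Residual connection: add the block's output back onto its input.
            return x + self.model(x)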

taki0112 commented 6 years ago

Great!

Thank you

taki0112 commented 6 years ago

Hi,

Looking at this code, there are two ResBlock-based generators:

  1. COCOResGen (ReLU-based)
  2. COCOResGen2 (LeakyReLU-based)

Which one did you use?

mingyuliutw commented 6 years ago

I use COCOResGen2, but there is not much difference in the results.

taki0112 commented 6 years ago

In COCOResGen, the loop runs n_gen_front_blk-1 times:

    for i in range(0, n_gen_front_blk-1):
      decA += [ReLUINSConvTranspose2d(tch, tch//2, kernel_size=3, stride=2, padding=1, output_padding=1)]
      decB += [ReLUINSConvTranspose2d(tch, tch//2, kernel_size=3, stride=2, padding=1, output_padding=1)]
      tch = tch//2

so it creates 2 transposed convolutions (plus one final transposed conv followed by Tanh).

But all of your *.yaml files set n_gen_front_blk=3, while in your paper there are 3 transposed convs plus 1 transposed conv with Tanh.

Is this also a typo?

mingyuliutw commented 6 years ago

@taki0112 Thanks for tracing the code; you are right, and I am correcting my answer here. There are 3 transposed convolutional layers: 2 are created in the for loop and 1 is created before the Tanh. I count the one before the Tanh as a transposed convolutional layer even though its stride is 1. The structure should be:

    DCONV-(N256,K3,S2), LeakyReLU
    DCONV-(N128,K3,S2), LeakyReLU
    DCONV-(N3,K1,S1), TanH

I also found a typo in the paper for the encoders; the architecture should be CONV-(N64,K7,S1).

taki0112 commented 6 years ago

    # Convolutional back-end
    for i in range(0, n_gen_front_blk-1):
      decA += [LeakyReLUConvTranspose2d(tch, tch//2, kernel_size=3, stride=2, padding=1, output_padding=1)]
      decB += [LeakyReLUConvTranspose2d(tch, tch//2, kernel_size=3, stride=2, padding=1, output_padding=1)]
      tch = tch//2
    decA += [nn.ConvTranspose2d(tch, input_dim_a, kernel_size=1, stride=1, padding=0)]
    decB += [nn.ConvTranspose2d(tch, input_dim_b, kernel_size=1, stride=1, padding=0)]
    decA += [nn.Tanh()]
    decB += [nn.Tanh()]

So in your code it is: LeakyReLUConvTranspose2d -> LeakyReLUConvTranspose2d -> ConvTranspose2d + Tanh.

However, in your paper it is: DCONV-(N256,K3,S2), LeakyReLU; DCONV-(N128,K3,S2), LeakyReLU; DCONV-(N64,K3,S2), LeakyReLU; DCONV-(N3,K1,S1), TanH.

These are different.

And one more question: why did you use output_padding?

taki0112 commented 6 years ago

For the discriminator, n_layer = 6:

  def _make_net(self, ch, input_dim, n_layer):
    model = []
    model += [LeakyReLUConv2d(input_dim, ch, kernel_size=3, stride=2, padding=1)] #16
    tch = ch
    for i in range(1, n_layer):
      model += [LeakyReLUConv2d(tch, tch * 2, kernel_size=3, stride=2, padding=1)] # 8
      tch *= 2
    model += [nn.Conv2d(tch, 1, kernel_size=1, stride=1, padding=0)]  # 1
    return nn.Sequential(*model)

In your code: LeakyReLUConv2d -> LeakyReLUConv2d × 5 -> Conv2d(kernel=1, stride=1).

However, in your paper: LeakyReLUConv2d -> LeakyReLUConv2d × 4 -> Conv2d(kernel=2, stride=1).

These are different. Is this also a typo?

I'm sorry for asking so many questions; I am very interested in your paper and would be grateful for your patience.

mingyuliutw commented 6 years ago

Why did you use output_padding?

To make sure the output image size is correct. I do not know whether TensorFlow and PyTorch handle padding in the same way.
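
Concretely, here is a minimal check, assuming PyTorch's nn.ConvTranspose2d size formula out = (in - 1) * stride - 2 * padding + kernel_size + output_padding:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 256, 128, 128)

    # Without output_padding: (128-1)*2 - 2*1 + 3 = 255, off by one.
    up = nn.ConvTranspose2d(256, 128, kernel_size=3, stride=2, padding=1)
    print(up(x).shape)  # torch.Size([1, 128, 255, 255])

    # With output_padding=1: (128-1)*2 - 2*1 + 3 + 1 = 256, an exact doubling.
    up = nn.ConvTranspose2d(256, 128, kernel_size=3, stride=2, padding=1,
                            output_padding=1)
    print(up(x).shape)  # torch.Size([1, 128, 256, 256])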

If there is a difference between the code and the paper, most likely the code is correct.

mingyuliutw commented 6 years ago

@taki0112 For the discriminator, please use COCOMsDis. This multi-scale discriminator works better in most cases.
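
As a rough illustration of the multi-scale idea (a sketch only, not COCOMsDis itself; the layer counts and channel widths here are assumptions): the same patch discriminator is applied to the image at several resolutions, obtained by average-pool downsampling.

    import torch
    import torch.nn as nn

    class MultiScaleDis(nn.Module):
        def __init__(self, input_dim=3, ch=64, n_layer=4, n_scale=3):
            super().__init__()
            self.downsample = nn.AvgPool2d(3, stride=2, padding=1,
                                           count_include_pad=False)
            # One copy of the same patch discriminator per scale.
            self.nets = nn.ModuleList(
                [self._make_net(ch, input_dim, n_layer) for _ in range(n_scale)])

        def _make_net(self, ch, input_dim, n_layer):
            model = [nn.Conv2d(input_dim, ch, 3, stride=2, padding=1),
                     nn.LeakyReLU(0.2)]
            tch = ch
            for _ in range(1, n_layer):
                model += [nn.Conv2d(tch, tch * 2, 3, stride=2, padding=1),
                          nn.LeakyReLU(0.2)]
                tch *= 2
            model += [nn.Conv2d(tch, 1, 1, stride=1, padding=0)]  # patch scores
            return nn.Sequential(*model)

        def forward(self, x):
            outs = []
            for net in self.nets:
                outs.append(net(x))     # real/fake scores at this resolution
                x = self.downsample(x)  # coarser view for the next net
            return outs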