Unable to reproduce your gta2city results using parallel GPUs

Hi,

Thank you very much for open sourcing your gta2city pre-trained model. Currently, I am working on translating images from the GTA domain to the Cityscapes one. My first goal was to reproduce your results, by training UNIT from scratch, but due to time constraints I implemented a version of UNIT which works with parallel GPUs. Unfortunately, I am not seeing the same visually appeasing results that you got.

I edited and used the same config file that I suppose you used for training your gta2city model - /UNIT/blob/master/configs/unit_gta2city_list.yaml. I have attached the exact config file that I used and also some sample images comparing the results of your model and mine.

# Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

# logger options
image_save_iter: 10000        # How often do you want to save output images during training
image_display_iter: 100       # How often do you want to display output images during training
display_size: 4               # How many images do you want to display each time
snapshot_save_iter: 10000     # How often do you want to save trained models
log_iter: 1                   # How often do you want to log the training stats

# optimization options
max_iter: 130000             # maximum number of training iterations
batch_size: 6                 # batch size
weight_decay: 0.0001          # weight decay
beta1: 0.5                    # Adam parameter
beta2: 0.999                  # Adam parameter
init: kaiming                 # initialization [gaussian/kaiming/xavier/orthogonal]
lr: 0.0001                    # initial learning rate
lr_policy: step               # learning rate scheduler
step_size: 100000             # how often to decay learning rate
gamma: 0.5                    # how much to decay learning rate
gan_w: 1                      # weight of adversarial loss
recon_x_w: 10                 # weight of image reconstruction loss
recon_h_w: 0                  # weight of hidden reconstruction loss
recon_kl_w: 0.01              # weight of KL loss for reconstruction
recon_x_cyc_w: 10             # weight of cycle consistency loss
recon_kl_cyc_w: 0.01          # weight of KL loss for cycle consistency
vgg_w: 1                      # weight of domain-invariant perceptual loss

# model options
gen:
  dim: 64                     # number of filters in the bottommost layer
  activ: relu                 # activation function [relu/lrelu/prelu/selu/tanh]
  n_downsample: 2             # number of downsampling layers in content encoder
  n_res: 4                    # number of residual blocks in content encoder/decoder
  pad_type: reflect           # padding type [zero/reflect]
dis:
  dim: 64                    # number of filters in the bottommost layer
  norm: none                  # normalization layer [none/bn/in/ln]
  activ: lrelu                # activation function [relu/lrelu/prelu/selu/tanh]
  n_layer: 4                  # number of layers in D
  gan_type: lsgan             # GAN loss [lsgan/nsgan]
  num_scales: 3               # number of scales
  pad_type: reflect           # padding type [zero/reflect]

# data options
input_dim_a: 3                              # number of image channels [1/3]
input_dim_b: 3                              # number of image channels [1/3]
num_workers: 8                              # number of data loading threads
new_size_a: 256                             # first resize the shortest image side to this size
new_size_b: 256                             # first resize the shortest image side to this size
crop_image_height: 256                      # random crop image of this height
crop_image_width: 512                       # random crop image of this width

data_folder_train_a: /home/ubuntu/gta/images
data_list_train_a: /home/ubuntu/image_list/gta_train.txt
data_folder_test_a: /home/ubuntu/gta/images
data_list_test_a: /home/ubuntu/image_list/gta_valid.txt
data_folder_train_b: /home/ubuntu/Cityscapes/leftImg8bit/train
data_list_train_b: /home/ubuntu/image_list/cityscapes_train.txt
data_folder_test_b: /home/ubuntu/Cityscapes/leftImg8bit/val
data_list_test_b: /home/ubuntu/image_list/cityscapes_valid.txt

Results (after converting gta images to cityscapes domain)

Your model	My model

As you can see my images don't seem to be semantically consistent (trees start to manifest here and there). Do you have any ideas why this would be happening?

It would be really helpful if you could let me know of the exact settings that you used like -

For how many iterations did you train your gta2city model? If I understand correctly, I am effectively training for batch_size max_iter = 6 130000 = 780000 iterations.
What were the dataset sizes that you worked with? Did you use all of gta and cityscapes dataset i.e. 25k images each, or a part of them?
What were the image size parameters in your model (like the new_size, crop_size etc.). As you can see from my config, I had to reduce the crop_image_height to 256 due to memory issues.

And I noticed that since you worked with batch size 1, there is no need of batch normalization, but I increased the batch size to 6 to work with 6 GPUs, so do I need to add it, will it affect the final results?

Thank you so much for your time :)

mingyuliutw / UNIT

Unable to reproduce your gta2city results using parallel GPUs #103