liyunsheng13 / BDL

Reproducing M0(1)[F(1)] #17

Closed yoavchai closed 4 years ago

yoavchai commented 5 years ago

Hi, thanks for a great paper and code. I was trying to reproduce the result for M0(1)[F(1)] without SSL. As far as I understand, this requires CycleGAN training and segmentation training (without SSL). I took the files you released for CycleGAN and merged them into the current code of the CycleGAN repository. I got 40.7 mIoU while I should get 42.7. Any chance you can also release the CycleGAN files you didn't change (including the command lines)? I used a patch size of 400 instead of 452 due to GPU memory limits. Do you think such a difference could explain the gap?

The command lines I used are:

CycleGAN train: python train.py --dataroot datasets/gta/ --display_id -1 --init_weights deep_lab_checkpoint/cyclegan_sem_model.pth --niter 10 --niter_decay 10 --crop_size 400 --load_size 1024 --lambda_identity 0

CycleGAN test: python test.py --dataroot datasets/gta/ --name --load_size 1024 --preprocess scale_width --num_test 10000000

BDL: python BDL.py --snapshot-dir ./snapshots/gta2city --init-weights DeepLab/DeepLab_init.pth --num-steps-stop 80000 --model DeepLab --data-dir --data-list dataset/gta5_list/train.txt --data-list-target dataset/cityscapes_list/train.txt --data-dir-target

liyunsheng13 commented 5 years ago

I think the problem is that --lambda_identity should not be 0. It is a very important parameter; please refer to the CycleGAN paper.
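(For reference, here is a minimal sketch of the identity term and how `--lambda_identity` scales it, following the formulation in the official pytorch-CycleGAN-and-pix2pix code; the function name and variables are illustrative, not the repository's exact code.)

```python
import torch
import torch.nn as nn

# Sketch of the CycleGAN identity term. G_A (A->B), G_B (B->A) and the real
# images are assumed to be defined elsewhere; names are illustrative.
criterion_idt = nn.L1Loss()

def identity_loss(G_A, G_B, real_A, real_B,
                  lambda_A=10.0, lambda_B=10.0, lambda_identity=0.5):
    if lambda_identity == 0:
        # This is what --lambda_identity 0 does: the term vanishes entirely.
        return torch.tensor(0.0)
    idt_A = G_A(real_B)   # G_A should leave target-domain images unchanged
    idt_B = G_B(real_A)   # G_B should leave source-domain images unchanged
    loss_idt_A = criterion_idt(idt_A, real_B) * lambda_B * lambda_identity
    loss_idt_B = criterion_idt(idt_B, real_A) * lambda_A * lambda_identity
    return loss_idt_A + loss_idt_B
```

With the default `--lambda_identity 0.5`, this term mainly discourages the generators from shifting colors, which matters for GTA5-to-Cityscapes translation.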

yoavchai commented 5 years ago

Thanks for the help, I will try it. I am very familiar with CycleGAN, but I don't think you mentioned this in the paper, so I assumed lambda_identity=0.

yoavchai commented 5 years ago

I added the identity loss and reduced the patch size to 368 to fit the GPU. I got an even lower result, 40.26. Could you please release the full CycleGAN code you used?

liyunsheng13 commented 5 years ago

I don't have time to clean up all the CycleGAN code right now. Furthermore, I don't think it is really necessary, because I just use the official CycleGAN code with some changes to add the perceptual loss. I think the problem might be that the parameters you use to train CycleGAN are not the same as mine. Could you post all the training parameters you use?
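(A rough sketch of what such a perceptual / semantic-consistency term can look like, using a frozen feature network `F` on top of the standard CycleGAN losses; the actual network, layers and weighting in the released code may differ, and all names below are assumptions.)

```python
import torch
import torch.nn as nn

# Sketch of a perceptual (semantic-consistency) loss added on top of CycleGAN:
# a frozen network F should produce similar features/predictions for an image
# and its translation. F, G_A and real_A are assumed to be defined elsewhere;
# lambda_semantic mirrors the option that appears in the training log below.
criterion_sem = nn.L1Loss()

def perceptual_loss(F, G_A, real_A, lambda_semantic=1.0):
    with torch.no_grad():
        feat_real = F(real_A)        # reference features, no gradient needed
    feat_fake = F(G_A(real_A))       # features of the translated image
    return criterion_sem(feat_fake, feat_real) * lambda_semantic
```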

yoavchai commented 5 years ago

Attached below is the train.txt created by the CycleGAN code. Do you see any difference compared to yours (apart from crop_size)?

----------------- Options ---------------
batch_size: 1
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
crop_size: 368 [default: 256]
dataroot: datasets/gta/ [default: None]
dataset_mode: unaligned
direction: AtoB
display_env: main
display_freq: 400
display_id: -1 [default: 1]
display_ncols: 4
display_port: 8097
display_server: http://localhost
display_winsize: 256
epoch: latest
epoch_count: 1
gan_mode: lsgan
gpu_ids: 4,5 [default: 0]
init_gain: 0.02
init_type: normal
init_weights: deep_lab_checkpoint/cyclegan_sem_model.pth [default: semantic model initialization]
input_nc: 3
isTrain: True [default: None]
lambda_A: 10.0
lambda_B: 10.0
lambda_identity: 0.5
lambda_semantic: 1
load_iter: 0 [default: 0]
load_size: 1024 [default: 286]
lr: 0.0002
lr_decay_iters: 50
lr_policy: linear
max_dataset_size: inf
model: cycle_gan
n_layers_D: 3
name: original_with_identity [default: experiment_name]
ndf: 64
netD: basic
netG: resnet_9blocks
ngf: 64
niter: 10 [default: 100]
niter_decay: 10 [default: 100]
no_dropout: True
no_flip: False
no_html: False
norm: instance
num_threads: 4
output_nc: 3
phase: train
pool_size: 50
preprocess: resize_and_crop
print_freq: 100
save_by_iter: False
save_epoch_freq: 5
save_latest_freq: 5000
serial_batches: False
suffix:
update_html_freq: 1000
verbose: False
----------------- End -------------------

liyunsheng13 commented 5 years ago

I don't see any obvious problem. Maybe I used 4 or 8 GPUs; the number of GPUs can also influence the results. You could also first remove the perceptual loss, increase the crop size of the images to 400, and see what performance you get. Could you also share the logs with me? I want to know whether the losses evolve correctly. Have you looked at the transferred images?
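(One reason the GPU count can matter even with the same `--batch_size`: the official CycleGAN code wraps its networks in `torch.nn.DataParallel`, so each optimizer step still sees `--batch_size` images in total, split across the GPUs in `--gpu_ids`. A minimal illustration, not the exact repository code.)

```python
import torch
import torch.nn as nn

# Illustrative only: a stand-in generator wrapped the way the official
# CycleGAN code wraps its networks for multi-GPU training.
gpu_ids = [0, 1, 2, 3]                        # e.g. --gpu_ids 0,1,2,3
net_G = nn.Sequential(                        # stand-in for the ResNet generator
    nn.Conv2d(3, 64, 7, padding=3), nn.ReLU(), nn.Conv2d(64, 3, 7, padding=3))
net_G = nn.DataParallel(net_G.to(gpu_ids[0]), device_ids=gpu_ids)

batch = torch.randn(4, 3, 452, 452, device=f"cuda:{gpu_ids[0]}")  # --batch_size 4
fake = net_G(batch)                           # each GPU processes one 452x452 crop
```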

yoavchai commented 5 years ago

Hi,
1) Attached is a link to the log: https://drive.google.com/file/d/1Q6voRvbIRelo3UtvR-55YaOEiNX2HFx5/view?usp=sharing Unlike the original repository, I print the average loss over the epoch instead of per batch.
2) I did look at the images. In general they look "good": the translation makes them look green like the real images.

liyunsheng13 commented 5 years ago

After checking your log, I find the magnitude of the losses is not correct: all the losses are too small. I don't know the exact reason. I think rec_sem_loss should be more than 10 at the beginning and decrease to around 1 after training is complete. You can also see that D_A is very small; based on the losses you print, I don't think the discriminator is learning well.

yoavchai commented 4 years ago

I was able to run on a machine with 4 GPUs and 16GB, so I used batch size 4 and the same crop size you used, and I was able to reproduce your result: I got 42.55 while you reported 42.7. I am closing the thread. I suggest you mention that you used 4 GPUs for training and that you used the identity loss.

Thanks for the help!

liyunsheng13 commented 4 years ago

Thanks for your suggestion. I will add the configuration to my new arXiv version.

jj0mst commented 4 years ago

@liyunsheng13 could you please confirm what batch size you used to get the final results? It is a very important hyperparameter for semantic segmentation (especially with the DeepLab architecture).

I think it is hard to get the reported results if the whole pipeline is trained with batch size 1 (which is the default option). I've always read that the statistics computed by batch norm with a small batch size are inaccurate.
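(For what it's worth, a common workaround when a segmentation network has to be trained with batch size 1 is to freeze the BatchNorm layers so they keep the pretrained running statistics instead of noisy per-batch ones. A minimal PyTorch sketch of that idea; it is not necessarily what the released BDL code does.)

```python
import torch.nn as nn

def freeze_batchnorm(model: nn.Module):
    """Keep pretrained BatchNorm running stats and affine parameters fixed,
    a common trick when fine-tuning segmentation models with batch size 1."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()                      # use running stats, not batch stats
            for p in m.parameters():
                p.requires_grad = False   # do not update gamma/beta

# Hypothetical usage: call freeze_batchnorm(model) after loading the
# pretrained weights, and re-apply it after every model.train() call,
# since train() switches BatchNorm back to batch statistics.
```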

liyunsheng13 commented 4 years ago

The batch size is 1 for all segmentation models.

jj0mst commented 4 years ago

Thank you for your answer, but what did you mean here then?

I don't see any obvious problem. Maybe I used 4 or 8 GPUs; the number of GPUs can also influence the results.

If you use batch size 1, then I think you only need 1 GPU. Or do you mean that 1 is the batch size per GPU?

liyunsheng13 commented 4 years ago

We were talking about the CycleGAN model, not the segmentation model.