mhusseinsh closed this issue 6 years ago
Could you share your training and test scripts? Could you also try running your saved model on the training set images and see if it matches the saved results?
@junyanz This is how I train and test:
python train.py --dataroot ./datasets/kitti --name kitti_cyclegan --model cycle_gan --gpu_ids=6 --display_id -1
python test.py --dataroot datasets/kitti/testA --name kitti_cyclegan --checkpoints_dir ./checkpoints/ --model test --gpu_ids=7 --resize_or_crop none
I tested on training images, and the same thing happens there too.
The images saved in the checkpoints during training are much, much better, which is very strange to me. Have a look:
I'm sure something must be wrong in testing, but I really don't know what. Even on training data the outputs look bad at test time, whereas during training they look really nice and exactly as I want.
Interesting. Does it work without --resize_or_crop none? @taesung89 @SsnL
How about running a test with the CycleGAN model directly?
python test.py --dataroot ./datasets/kitti --name kitti_cyclegan --model cycle_gan --gpu_ids=6
I think the problem might be that you scaled the images to 256x256 at training time, but at test time you used the 800x600 original resolution. This is a pretty big gap.
I recommend you first test without --resize_or_crop none to see whether this is the real problem. Then I recommend training and testing at the same scale. You can do this with
--resize_or_crop crop --fineSize 360
which loads the image at the original resolution of 800x600 and then makes a random square crop of 360x360. Please change 360 to whatever fits on your GPU.
This method does not change the scale of the image, so you can use the --resize_or_crop none option at test time.
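For intuition, here is a minimal sketch of the crop that --resize_or_crop crop --fineSize 360 performs at load time (a hypothetical helper for illustration, not the repo's actual dataloader code):

```python
import random

def random_crop_box(width, height, fine_size):
    """Pick a random fine_size x fine_size crop box inside a
    width x height image; returns (left, upper, right, lower)."""
    assert fine_size <= width and fine_size <= height
    x = random.randint(0, width - fine_size)
    y = random.randint(0, height - fine_size)
    return (x, y, x + fine_size, y + fine_size)

# An 800x600 frame with a 360px crop keeps the original pixel scale,
# so test-time inference at full resolution sees the same scale:
left, upper, right, lower = random_crop_box(800, 600, 360)
print(right - left, lower - upper)  # 360 360
```

Because the crop never resamples the image, the pixel scale seen at training matches the full-resolution test images.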
How about running a test with the CycleGAN model directly? @junyanz Same results with the CycleGAN model directly.
I recommend you first test without --resize_or_crop none to see whether this is the real problem. @taesung89 Also the same problem.
@taesung89 maybe it is a little bit better
--resize_or_crop none
without --resize_or_crop none
But this is for an image in the test set; in general it does not perform well on the test set compared to what was saved during training.
There is always a training/test gap in any ML system. The key is to reproduce the training set results with the test script.
@junyanz Sorry, I don't get your point. What do you mean by reproducing the training set results with the test script?
You can run the model on the training images and see if it is the same as the saved results during training.
@junyanz yes this is what I did, as in my previous comments
The results look like the saved ones without --resize_or_crop none?
@junyanz Which one do you mean?
Sorry. I was traveling for the past two weeks. You can produce results with the test script on the training images, and see if the results are the same as your "saved" training image results.
There is often a gap between training and test due to overfitting. So it's quite common that the test images look worse compared to the training images. But to make sure that your test script is correct, you can do a sanity check using training images as described above.
@junyanz @taesung89 @SsnL I got the same problem when I train pix2pix on a cross-view image translation task. During training, the results are quite good, as I wanted. However, when I use the same images for testing, I get very bad results.
My commands are: python train.py --dataroot ./data --name setting_1 --model pix2pix --which_model_netG unet_256 --which_direction AtoB --dataset_mode aligned --norm batch --pool_size 50 --gpu_ids 0 --batchSize 32 --loadSize 286 --fineSize 256;
python test.py --dataroot ./data --name setting_1 --model pix2pix --which_model_netG unet_256 --which_direction AtoB --dataset_mode aligned --norm batch --gpu_ids 0 --batchSize 32 --loadSize 256 --fineSize 256;
Any suggestions?
@happsky Maybe you also want to set loadSize=286 at test time. I also think there is severe overfitting during training; this is an ill-posed problem. But your training set results look identical to the ground truth images.
@junyanz I got similarly bad results after setting loadSize=286 at test time. Do you have any suggestions for avoiding overfitting during training?
To avoid overfitting: (1) increase the training set; (2) dropout; (3) more data augmentation. We haven't used a big batchSize before; we used to use batchSize=1. You can try putting the model in eval mode by calling this function in the test code.
@happsky @mhusseinsh This should be fixed in https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/commit/7dfdd06d8f7ca41735c06ea67ffbebd222a4d65e ! Because the pix2pix model uses batch norm, if we don't set it to eval mode, the running stats are not used, and the results look quite bad because the batch size is 1 at test time. Sorry about that. Could you pull the repo and test again? There is no need to re-train; just running test.py again should be fine.
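To see why eval mode matters here, consider a toy numeric sketch (pure Python, not the repo's code): with a test batch of size 1, train-mode batch norm normalizes each sample by its own batch statistics, so the variance is zero and the output collapses to zero; eval mode uses the stored running statistics instead.

```python
def batchnorm(batch, running_mean, running_var, training, eps=1e-5):
    """Normalize a 1-D batch of scalars.
    Training mode uses the batch's own statistics;
    eval mode uses the stored running statistics."""
    if training:
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
    else:
        mean, var = running_mean, running_var
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]

# Running stats accumulated over many training batches (illustrative):
running_mean, running_var = 0.5, 4.0

# Test time, batch size 1, train-mode statistics: the single sample is
# normalized by its own zero variance, collapsing the output to 0
# -- the "bad results" symptom described above.
print(batchnorm([3.0], running_mean, running_var, training=True))   # [0.0]

# Eval mode uses the running statistics, preserving the signal:
print(batchnorm([3.0], running_mean, running_var, training=False))
```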
@SsnL Thank you so much, I'm getting much better results now!
@junyanz @SsnL As I posted before, why are fake_B and real_B flipped left and right?
There is random flipping in the current data augmentation. You can add --no_flip. See here for more details.
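The flip behavior can be sketched like this (a hypothetical helper mirroring the default augmentation, with a no_flip switch to disable it; not the repo's actual code):

```python
import random

def maybe_hflip(pixel_row, no_flip=False):
    """Mirror a row of pixels left-right with probability 0.5,
    as the default augmentation does; no_flip disables it."""
    if no_flip or random.random() < 0.5:
        return pixel_row
    return pixel_row[::-1]

row = [1, 2, 3, 4]
print(maybe_hflip(row, no_flip=True))   # always unchanged
print(maybe_hflip(row))                 # unchanged or reversed at random
```

With the flag set, fake_B and real_B will always be saved in the same orientation.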
@junyanz Thank you for your quick response!
For eval mode, I added a flag that lets you use it. In the original pix2pix paper (@phillipi), we don't use eval mode at test time, as we often use batchSize=1 and we would like per-image statistics. We often get better results without eval mode. Here is a comparison of label->facades with and without eval mode.
No eval mode
Eval mode
But in your case, you have a big batchSize (32) during training and a small batchSize (=1) during test. (Note: we hard-coded batchSize=1 in our test code; I will try to relax it later.) In general, I recommend that users use instance norm for both pix2pix and CycleGAN, which guarantees identical training/test behavior and also gives per-image statistics.
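The instance-norm recommendation can be illustrated with a toy sketch (pure Python, not the repo's implementation): each image is normalized by its own statistics, so the computation is identical at train and test time and independent of batch size.

```python
def instance_norm(channel, eps=1e-5):
    """Normalize one channel of a single image using that image's
    own mean/variance; no running stats, no batch dependence."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((x - mean) ** 2 for x in channel) / n
    return [(x - mean) / (var + eps) ** 0.5 for x in channel]

# The result for one image is the same whether it arrives alone or
# inside a batch of 32 -- train/test behavior cannot diverge.
out = instance_norm([1.0, 2.0, 3.0, 4.0])
```

This is why instance norm sidesteps the batchSize-32-vs-1 mismatch entirely.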
For this research paper, unpaired image translation
Hello Sir, I am trying the cezanne2photo dataset (available), with images resized to 128x128 from 256x256 (due to limited resources). It seems that its discriminator overfits. So my questions are based on the snapshot I have included. In the generative model
In the discriminative model,
I mean, after several rounds of hyperparameter tuning and testing, it always overfits. It was so frustrating. So could you kindly show the actual architecture with more specific parameters (not from code)? I also followed the TensorFlow CycleGAN's data augmentation. The results are still not satisfying.
Thanks
You can change the network according to your application. Here is the model.
For overfitting, you can increase --lambda_identity; it partially alleviates the issue.
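For reference, the identity term that --lambda_identity scales can be sketched like this (a toy 1-D version for illustration, not the repo's implementation, which also multiplies in lambda_A/lambda_B): the generator is penalized when it changes an image that is already in the target domain.

```python
def identity_loss(gen_of_target, target, lambda_identity=0.5):
    """L1 identity term: ||G_A(B) - B||_1 scaled by lambda_identity.
    A larger lambda_identity pushes the generator toward preserving
    images that already look like the target domain."""
    n = len(target)
    l1 = sum(abs(g - t) for g, t in zip(gen_of_target, target)) / n
    return lambda_identity * l1

# If the generator already preserves the target image, the term is 0:
print(identity_loss([0.2, 0.8], [0.2, 0.8]))  # 0.0
```

Raising the weight makes unnecessary changes to target-domain images more expensive, which acts as a regularizer.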
Hello,
I am training CycleGAN on driving scenes. In the beginning the results were a little nicer, but then they became a bit weird.
This is after epoch 18
This is after epoch 92
During training, the results generated and saved in the checkpoints are much better and well translated. However, when I load the latest checkpoint on test images, I get the results above. @junyanz
Any idea why this is happening?