junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

weird results #336

Closed mhusseinsh closed 6 years ago

mhusseinsh commented 6 years ago

Hello,

I am training CycleGAN on driving scenes. In the beginning the results were a little nicer, but then they became weird.

This is after epoch 18 image

This is after epoch 92 image image

During training, the results generated and saved in the checkpoints are much better and well translated. However, when I load the latest checkpoint and run it on test images, I get the results shown above. @junyanz

Any idea why this is happening?

junyanz commented 6 years ago

Could you share your training and test scripts? Could you also try running your saved model on the training set images and see if it matches the saved results?

mhusseinsh commented 6 years ago

@junyanz This is how I train and test:

python train.py --dataroot ./datasets/kitti --name kitti_cyclegan --model cycle_gan --gpu_ids=6 --display_id -1

python test.py --dataroot datasets/kitti/testA --name kitti_cyclegan --checkpoints_dir ./checkpoints/ --model test --gpu_ids=7 --resize_or_crop none

mhusseinsh commented 6 years ago

I tested on training images, and the same thing happens:

image

mhusseinsh commented 6 years ago

The images saved in the checkpoints during training are much, much better, which is very strange to me. Have a look: image image

I am sure something must be wrong in testing; I just don't know what, and it is very odd. Even on training data the results look bad at test time, yet during training they look really nice, just as I want.

junyanz commented 6 years ago

Interesting. Does it work without --resize_or_crop none? @taesung89 @SsnL

How about running a test with the cyclegan model directly?

python test.py --dataroot ./datasets/kitti --name kitti_cyclegan --model cycle_gan --gpu_ids=6

taesungp commented 6 years ago

I think the problem might be that you scaled the image to 256px x 256px at training time, but at test time, you used the 800px x 600px original resolution. This is a pretty big gap.

I recommend you first test without --resize_or_crop none to see if this is the real problem. Then I recommend training and testing at the same scale. You can do this by

--resize_or_crop crop --fineSize 360

which loads the image at the original resolution of 800x600 and then makes a square crop of 360x360. Please change the number 360 to something that fits on your GPU.

This method does not change the scale of the image, so you can use --resize_or_crop none option at test time.

mhusseinsh commented 6 years ago

"How about running a test with the cyclegan model directly?" @junyanz Same results with the cyclegan model directly.

"I recommend you first test without --resize_or_crop none to see if this is the real problem." @taesung89 Also the same problem: image

mhusseinsh commented 6 years ago

@taesung89 With --resize_or_crop none it is maybe a little better: image

Without --resize_or_crop none: image

But this is for an image in the test set; in general, it is not performing well on the test set compared to what was saved during training.

junyanz commented 6 years ago

There is always a training/test gap in any ML system. The key is to reproduce the training set results with the test script.

mhusseinsh commented 6 years ago

@junyanz Sorry, I don't get your point. What do you mean by reproducing the training set results with the test script?

junyanz commented 6 years ago

You can run the model on the training images and see if it is the same as the saved results during training.

mhusseinsh commented 6 years ago

@junyanz Yes, this is what I did, as shown in my previous comments.

junyanz commented 6 years ago

The results without --resize_or_crop none look like the saved ones.

mhusseinsh commented 6 years ago

@junyanz Which one do you mean?

junyanz commented 6 years ago

Sorry. I was traveling for the past two weeks. You can produce results with the test script on the training images, and see if the results are the same as your "saved" training image results.

There is often a gap between training and test due to overfitting. So it's quite common that the test images look worse compared to the training images. But to make sure that your test script is correct, you can do a sanity check using training images as described above.

happsky commented 6 years ago

@junyanz @taesung89 @SsnL I got the same problem when I train pix2pix on a cross-view image translation task. During training the results are quite good, just as I wanted: image However, when I use the same images for testing, I get very bad results: image

My commands are: python train.py --dataroot ./data --name setting_1 --model pix2pix --which_model_netG unet_256 --which_direction AtoB --dataset_mode aligned --norm batch --pool_size 50 --gpu_ids 0 --batch 32 --loadSize 286 --fineSize 256;

python test.py --dataroot ./data --name setting_1 --model pix2pix --which_model_netG unet_256 --which_direction AtoB --dataset_mode aligned --norm batch --gpu_ids 0 --batchSize 32 --loadSize 256 --fineSize 256;

Any suggestions?

junyanz commented 6 years ago

@happsky Maybe you also want to set loadSize=286 during test time. I also think there is severe overfitting during training; this is an ill-posed problem. But your training set results look identical to the ground truth images.

happsky commented 6 years ago

@junyanz I got similarly bad results after setting loadSize=286 during test time. Do you have any suggestions for avoiding overfitting during training?

junyanz commented 6 years ago

To avoid overfitting: (1) increase the training set; (2) dropout; (3) more data augmentation. We haven't used a big batchSize before; we used to use batchSize=1. You can also try putting the model into eval mode by calling this function in the test code.
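
A minimal sketch of that call (the name `netG` and the layers are assumptions, not the repo's exact test code): after loading the checkpoint, switch the generator to eval mode before inference.

```python
import torch
import torch.nn as nn

# Stand-in for a loaded generator; the real one would come from the checkpoint.
netG = nn.Sequential(
    nn.Conv2d(3, 3, kernel_size=3, padding=1),
    nn.BatchNorm2d(3),
    nn.Dropout(0.5),
)

netG.eval()  # freezes batch norm to its running stats and disables dropout
with torch.no_grad():
    fake = netG(torch.randn(1, 3, 64, 64))
```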

ssnl commented 6 years ago

@happsky @mhusseinsh This should be fixed in https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/commit/7dfdd06d8f7ca41735c06ea67ffbebd222a4d65e! Because the pix2pix model uses batch norm, if we don't set it to eval mode, the running stats are not used, and the results can look quite bad because the batch size is 1 at test time. Sorry about that. Could you pull the repo and test again? There is no need to re-train; just running test.py again should be fine.
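
To see why this matters, here is a small standalone illustration (synthetic data, not the repo's code): with a batch of one, train-mode batch norm normalizes with that single image's own statistics, while eval mode uses the accumulated running statistics.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)

# Simulate training: running stats accumulate over many shifted batches.
for _ in range(100):
    bn(torch.randn(8, 3, 16, 16) * 2 + 5)

x = torch.randn(1, 3, 16, 16) * 2 + 5  # one "test" image

bn.train()
y_train = bn(x)  # normalized with this single image's own statistics
bn.eval()
y_eval = bn(x)   # normalized with the accumulated running statistics

# With batch size 1 the two outputs differ noticeably.
diff = (y_train - y_eval).abs().mean()
```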

happsky commented 6 years ago

@SsnL Thank you so much; I get much better results now!

happsky commented 6 years ago

@junyanz @SsnL As I posted before: image Why are fake_B and real_B flipped left and right?

junyanz commented 6 years ago

There is random flipping in the current data augmentation. You can add --no_flip. See here for more details.
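
For context, the horizontal flip in the augmentation is just a reversal along the width axis; a toy illustration (not the repo's transform code):

```python
import torch

# (channels, height, width) toy image; the augmentation mirrors it left-right.
img = torch.arange(12.0).reshape(1, 3, 4)
flipped = torch.flip(img, dims=[2])  # what a random horizontal flip applies

# With --no_flip this step is skipped, so inputs keep their orientation.
```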

happsky commented 6 years ago

@junyanz Thank you for your quick response!

junyanz commented 6 years ago

For the eval mode, I added a flag that lets you use eval mode. In the original pix2pix paper (@phillipi), we don't use eval mode during testing, since we often use batchSize=1 and we would like per-image statistics. We often get better results without eval mode. Here is a comparison of label->facades with and without eval mode.

No eval mode no_eval

Eval mode eval

But in your case, you have a big batchSize during training (32) and a small batchSize (=1) during testing (note: we hard-coded batchSize=1 in our test code; I will try to relax that later). In general, I recommend that users use instance norm for both pix2pix and CycleGAN, which guarantees identical training/test behavior and also gives per-image statistics.
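
The instance-norm point can be shown with a small standalone example (not the repo's code): `InstanceNorm2d` normalizes each image with its own statistics, so train and eval modes agree even at batch size 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
inorm = nn.InstanceNorm2d(3)  # track_running_stats=False by default
x = torch.randn(1, 3, 8, 8)

inorm.train()
y_train = inorm(x)
inorm.eval()
y_eval = inorm(x)

# Per-image statistics are used in both modes, so the outputs match.
same = torch.allclose(y_train, y_eval)
```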

sahilsharma884 commented 4 years ago

Regarding this research paper on unpaired image translation:

Capture

Hello Sir, I am trying the cezanne2photo dataset (available) with images resized to 128x128 from 256x256 (due to limited resources). It seems that the discriminator is overfitting. So here are some questions based on the snapshot I included. In the generative model:

  1. Can you specify in which layer I should use a dropout layer, and with what dropout rate?
  2. Did you apply reflection padding only in the 6 ResNet blocks, or in the whole network (rather than zero padding)?
  3. In the last layer, c7s1-3, did you apply padding or any activation (in the paper it is ReLU)? If yes, please specify. If not, then the image has to be normalized rather than standardized, since a ReLU activation will discard the negative values produced by standardization.

In discriminative model,

  1. What were the kernel size, padding, stride, and activation in the last layer after C512?

I mean, after several rounds of hyperparameter tuning and testing, it always overfits, which is frustrating. So could you kindly show the actual architecture with specific parameters (not from code)? I also followed the TensorFlow CycleGAN's data augmentation, but the results are still not satisfying.

Thanks

junyanz commented 4 years ago

You can change the network according to your application. Here is the model.

  1. No. The current dropout flag is applied to many layers.
  2. The entire network. I recommend reflection padding.
  3. The last layer has a reflection padding layer, according to this line.

For overfitting, you can increase --lambda_identity. It partially alleviates the issue.
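
As a sketch of that final block (the 64 input channels are an assumption, taken from the default ngf=64): reflection padding, a 7x7 convolution down to 3 channels, and a Tanh.

```python
import torch
import torch.nn as nn

# Sketch of the generator's last block: ReflectionPad2d + 7x7 conv + Tanh.
last = nn.Sequential(
    nn.ReflectionPad2d(3),
    nn.Conv2d(64, 3, kernel_size=7, padding=0),
    nn.Tanh(),  # outputs land in [-1, 1], matching the image normalization
)

y = last(torch.randn(1, 64, 128, 128))
```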