Optimal generator selected using validation data performs worse on test data

Yash-10 commented 1 year ago

Hello, thanks for making the code public! I have been using the pix2pix code for a custom application where the differences between input- and output-domain images are visually subtle. Still, there is a difference between them that I would like the model to learn.

I split the dataset into ~3600 images for training, ~460 for validation, and ~460 for testing. I saved the generator every ten epochs and used the validation set to select the best generator based on some statistical metrics. The selected model performs very well on the validation data, but when I evaluated it on the separate test set, the results were worse. Is there any intuition for why this might be happening, and are there any possible solutions?

(I also observed that validation performance fluctuated considerably across generators saved at different epochs.)
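One possible explanation, not raised in the thread itself, is selection bias: when many checkpoints are scored on the same ~460 validation images and those scores fluctuate a lot, the checkpoint with the highest validation score is partly picked for lucky noise, so its score on fresh test data tends to be lower. The sketch below simulates this under deliberately simplified, hypothetical assumptions: every checkpoint has the same true quality `true_score`, validation and test scores add independent Gaussian noise of size `noise_sd`, and 20 checkpoints are compared. None of these numbers come from the original experiment.

```python
import random


def simulate_selection_gap(n_checkpoints=20, true_score=0.80, noise_sd=0.03,
                           n_trials=2000, seed=0):
    """Average (validation - test) score of the checkpoint picked on validation.

    Hypothetical model: all checkpoints are equally good; validation and
    test scores are the true score plus independent Gaussian noise.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        val = [true_score + rng.gauss(0, noise_sd) for _ in range(n_checkpoints)]
        test = [true_score + rng.gauss(0, noise_sd) for _ in range(n_checkpoints)]
        # Pick the checkpoint with the highest validation score.
        best = max(range(n_checkpoints), key=lambda i: val[i])
        gaps.append(val[best] - test[best])
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    print(f"avg val-test gap of selected checkpoint: {simulate_selection_gap():.4f}")
    print(f"avg gap with a single checkpoint:        {simulate_selection_gap(n_checkpoints=1):.4f}")
```

With these made-up numbers, the selected checkpoint's validation score overstates its test score on average, while with a single checkpoint (`n_checkpoints=1`) the average gap is close to zero. The larger the fluctuation between checkpoints relative to the real differences between them, the larger this optimism tends to be.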

junyanz commented 1 year ago

Not sure. By "worse", do you mean that your results on the test data are worse than those on the training data, or worse than those on the validation data?

One suggestion is to use the same augmentation/preprocessing steps across training, validation, and test.
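A minimal, framework-free way to follow this suggestion is to define the preprocessing once and reuse that single function for every split, so validation and test images cannot drift apart. The sketch below assumes hypothetical 8-bit grayscale images stored as nested lists and a centre crop followed by scaling to [-1, 1]; it is only an illustration. In the pix2pix code itself, preprocessing is configured through command-line options such as `--preprocess`, `--crop_size`, and `--no_flip`, which should be checked for consistency between the runs used to produce validation and test results.

```python
def center_crop(img, size):
    """Crop a size x size square from the centre of a 2-D list image."""
    h, w = len(img), len(img[0])
    if size > h or size > w:
        raise ValueError("crop size is larger than the image")
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]


def to_unit_range(img):
    """Map 8-bit pixel values in [0, 255] to [-1, 1]."""
    return [[p / 127.5 - 1.0 for p in row] for row in img]


def preprocess(img, crop_size=256):
    """Deterministic steps shared by every split (hypothetical pipeline)."""
    return to_unit_range(center_crop(img, crop_size))


# The same function object is used for all three splits.
PREPROCESS = {"train": preprocess, "val": preprocess, "test": preprocess}
```

Keeping one shared function (rather than three copies) means a later change to the crop size or normalization automatically applies to validation and test alike.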

Yash-10 commented 1 year ago

Hello @junyanz, thank you so much for your reply!

Sorry if that was not clear. I meant that the results on the test data are worse than those on the validation data, not the training data. The test images do not seem very different from the validation images, so the performance gap between the validation and test sets is confusing.

Thank you for the suggestion about using the same augmentations throughout. I will try it soon!

junyanz commented 1 year ago

One sanity check is to manually copy the images from the test data into the validation data and see whether there is a difference.
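A statistical complement to this copy-and-compare check is a permutation test (a standard technique, not something proposed in the thread): if validation and test images come from the same distribution, randomly reshuffling the pooled per-image scores into two groups of the original sizes should often produce a gap at least as large as the observed one. The sketch assumes you already have one metric value per image for the selected generator; `val_scores` and `test_scores` are hypothetical names.

```python
import random


def permutation_p_value(val_scores, test_scores, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in mean per-image score.

    A small p-value suggests the val/test gap is larger than random
    reassignment of the same images would usually produce.
    """
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(val_scores) - mean(test_scores))
    pooled = list(val_scores) + list(test_scores)
    n_val = len(val_scores)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_val]) - mean(pooled[n_val:]))
        if gap >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)
```

A large p-value would suggest the gap is within what random splitting of similar images produces (consistent with the checkpoint selection itself being optimistic), while a very small one points to a genuine difference between the two sets.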

HassanBinHaroon commented 1 year ago

@Yash-10 Kindly swap the validation and test folders. Alternatively, merge the train, val, and test sets and split them again randomly.
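The random re-split can be sketched as below: merge all file names, shuffle them with a fixed seed, and slice off validation and test sets of the desired sizes. The function name `random_split` and the file names are hypothetical; the sizes mirror the ~460/~460 split described above.

```python
import random


def random_split(filenames, n_val, n_test, seed=0):
    """Shuffle the merged file list and split it into train/val/test lists."""
    if n_val + n_test > len(filenames):
        raise ValueError("not enough files for the requested val/test sizes")
    files = sorted(filenames)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(files)
    val = files[:n_val]
    test = files[n_val:n_val + n_test]
    train = files[n_val + n_test:]
    return train, val, test
```

Assuming the aligned-dataset layout, the three lists could then be used to copy the paired images into the `train/`, `val/`, and `test/` folders that pix2pix reads according to `--phase`. Repeating the split with a few different seeds also gives a rough idea of how much the validation/test gap varies by chance.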

junyanz / pytorch-CycleGAN-and-pix2pix, issue #1561