About the validation result when I train just 1 epoch

matsuneA212 commented 6 years ago

Hello. I have a question.

I want to train the network using the SR291 dataset. I tried changing the parameters and code. However, when I train for just 1 epoch, the validation result is too good (e.g. 37.0581). It looks as if some other model, one that I didn't train, is being used. So, when I want to build a new model with another dataset, what should I change?

yulunzhang commented 6 years ago

Hi,

Your validation result is reasonable, provided you didn't use a pretrained model. Because very deep/wide networks have much stronger representational ability, they can reach high validation results within a few training epochs. However, when the training set is small (e.g., SR291), very large networks (e.g., EDSR, MDSR, RDN) can easily overfit it, so the validation performance would start to decrease after some tens of epochs. Powerful networks should be trained with larger datasets (e.g., DIV2K).

So, when you want to train new models with new training data, what you should modify depends on the data. For a large dataset (e.g., DIV2K, Flickr2K), it is fine to keep the default parameters. For a small dataset (e.g., SR291), you can decrease the input patch size (e.g., 24x24), halfLife (e.g., 2e4 iterations, namely 20 epochs), the number of RDBs (e.g., 10), and the total number of training epochs (e.g., 100). These values are based on my own experiments training on SR291. Please be careful about overfitting on small datasets.
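To make the halfLife numbers above concrete, here is a minimal Python sketch of a step-wise halving schedule. It assumes that halfLife means "halve the learning rate every halfLife iterations" and that one epoch is 1000 iterations (implied by "2e4 iterations, namely 20 epochs"). The base rate of 1e-4 and the function names are illustrative assumptions, not values or code taken from this repository.

```python
# Sketch only: assumes the learning rate is halved every `half_life` iterations.
ITERS_PER_EPOCH = 1_000   # implied by "2e4 iterations, namely 20 epochs"
HALF_LIFE = 20_000        # suggested value for a small dataset such as SR291
TOTAL_EPOCHS = 100        # suggested total training epochs for SR291


def learning_rate(iteration, base_lr=1e-4, half_life=HALF_LIFE):
    """Return the learning rate at a given iteration.

    base_lr=1e-4 is a hypothetical starting value used for illustration.
    """
    # Integer division counts how many full half-lives have elapsed.
    return base_lr * 0.5 ** (iteration // half_life)


def learning_rate_at_epoch(epoch, base_lr=1e-4, half_life=HALF_LIFE):
    """Same schedule, indexed by the number of completed epochs."""
    return learning_rate(epoch * ITERS_PER_EPOCH, base_lr, half_life)


if __name__ == "__main__":
    for epoch in (0, 20, 40, 60, 80, TOTAL_EPOCHS):
        print(epoch, learning_rate_at_epoch(epoch))
```

Under these assumptions, 100 epochs with halfLife = 2e4 gives five halvings, so the rate at the end of training is 1/32 of its starting value.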

matsuneA212 commented 6 years ago

Hi, thank you for replying.

However, even when I don't choose the validation images from the training dataset, the validation results are still good after 1 epoch. For example: training dataset: SR291, validation dataset: 5 images from DIV2K. Can this phenomenon be considered overfitting?

yulunzhang commented 6 years ago

Hi,

It makes no difference. The validation set has no effect on the training process, no matter which validation images you use.

Because the number and size of the images in SR291 are relatively small, RDN can be trained sufficiently on SR291 within one or a few epochs. At this early stage, there is no overfitting problem.

Only when we continue to train RDN (with the parameters stated in the paper) on SR291 for many more epochs will the model overfit SR291, and the performance will then decrease noticeably.

If we use the parameters I stated above for a small dataset such as SR291 (a smaller input patch size, e.g., 24x24; halfLife, e.g., 2e4 iterations, namely 20 epochs; number of RDBs, e.g., 10; total training epochs, e.g., 100), the overfitting effect can be alleviated.
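One way to notice the point where validation performance starts to decrease is to record the validation score after every epoch and keep track of the best one. The Python sketch below is only an illustration of that idea. The function name, the `patience` parameter, and the example scores are hypothetical and are not part of this repository or of any results reported here.

```python
def best_validation_epoch(val_scores, patience=10):
    """Find the best epoch in a list of per-epoch validation scores.

    Higher scores are assumed to be better. Returns a tuple
    (best_epoch, best_score, stop_epoch) with 1-based epoch numbers.
    stop_epoch is the epoch at which the score had failed to improve
    for `patience` epochs in a row, or None if that never happened.
    """
    best_index, best_score = 0, float("-inf")
    for index, score in enumerate(val_scores):
        if score > best_score:
            best_index, best_score = index, score
        elif index - best_index >= patience:
            # No improvement for `patience` epochs: likely past the peak.
            return best_index + 1, best_score, index + 1
    return best_index + 1, best_score, None


if __name__ == "__main__":
    # Made-up numbers, purely to show the call; not measured results.
    made_up_scores = [37.0, 37.2, 37.1, 37.0, 36.9]
    print(best_validation_epoch(made_up_scores, patience=2))
```

Keeping the checkpoint from the best epoch, rather than the last one, is a common way to limit the damage from overfitting on a small training set.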

yulunzhang / RDN

About the validation result when I train just 1 epoch #4