youngjae-git opened this issue 3 years ago
Hi @YoungJaeK,
I'll admit, this looks weird and should not happen, especially since you ran it exactly the way it should be run. The only difference between our runs was the GPU (I ran it on 4 RTX 2080s), but that shouldn't matter at all.
The thing that did happen to me from time to time is that the high learning rate at the beginning can sometimes cause the training to diverge. I suspect that this is the case here. If you still have them, can you share the visdom loss graphs? That would help me determine whether that was indeed the case here.
One more thing you can try is to run again with a lower initial learning rate (you can try values like 1e-4 to 3e-4). That will stabilize the training, but the aging effect may be slightly less impressive.
I'm sorry about the bad experience and wasted time.
Hi @royorel, thank you for the quick response. I ran training without visdom, so I don't have the loss graphs. I will extract the values from log.txt and show you instead.
Question 1: Did you do any additional processing on the FFHQ dataset when constructing the training set?
Question 2: I only used about 50,000 images for training. Was that not enough? I'm thinking of training on all 70,000 images instead. Would that help?
Question 3: If the batch size is increased, does that cause problems for training? Should the initial learning rate be lowered in that case?
Last and most important question: with ~25,000 training images (one gender), how long does one epoch take? For me it takes about 40 minutes (4x RTX 2080) and 24 minutes (4x V100). Is that similar to the training speed you observed?
Thank you for reading this long message.
It was not a bad experience or wasted time at all.
Thanks again.
@YoungJaeK The log files are not that interpretable; it is very hard to infer what went on during training from them, so there's no need to send them.
As for your questions:
Thank you @royorel
I have already run "create_dataset.py". The reason I asked is that I was curious about the folder called "parsings" that is created when configuring the dataset. Is there a reason you created that separate folder? Looking at the code, there doesn't seem to be any additional processing, but if there is, please let me know.
Since the training data consists of only 50,000 images, I assumed the model was trained with 50,000 images. I will train using 69,000 images. (I'll let you know if there's a difference.)
I noticed that the reason I ended up with 50,000 training images is that I set opt.sort_order to 6 classes in training mode; the training-mode classes are '0-2,3-6,7-9,15-19,30-39,50-69'. I also trained with the additional 20-29 and 40-49 classes added, because training on only the above classes did not seem sufficient. The results of that training are almost identical for the three classes 20-29, 30-39, and 40-49.
For this reason, did you perform interpolation after training with these 6 classes ('0-2,3-6,7-9,15-19,30-39,50-69')? Presumably not all 69,000 images can be used if only those 6 classes are kept.
Summary question 4: Is there a reason for choosing this particular combination of 6 classes?
I will contact you again if I have any further questions. Thanks again for the quick reply.
I wish you good results in your future research.
@YoungJaeK,
If you ran create_dataset.py, then you were training on the exact same training set as we did. The parsings are used to mask out the background. We use them in the dataloader: https://github.com/royorel/Lifespan_Age_Transformation_Synthesis/blob/7be4534848e94d9493dfb6da8dfd5724fa5f6766/data/multiclass_unaligned_dataset.py#L116-L121
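For illustration, the masking boils down to something like the sketch below. This is just a sketch, not the exact code in the dataloader: the file handling and the assumption that background pixels are labeled 0 in the parsing map are illustrative, so check the linked lines for the actual logic.

```python
import numpy as np
from PIL import Image

def mask_background(img_path, parsing_path):
    """Illustrative sketch: zero out background pixels using a parsing map.

    Assumes (for illustration only) that the parsing map is a label image
    where 0 marks the background; see the linked dataloader lines for the
    repo's actual masking logic.
    """
    img = np.array(Image.open(img_path).convert('RGB'))        # H x W x 3
    parsing = np.array(Image.open(parsing_path).convert('L'))  # H x W labels

    mask = (parsing > 0).astype(img.dtype)   # 1 for foreground, 0 for background
    masked = img * mask[..., None]           # broadcast mask over RGB channels
    return Image.fromarray(masked)
```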
The 69,000 images are the portion of the original dataset we devote to training before pruning; that was a little unclear in my previous comment, sorry about that. To be clear, after pruning to 6 age classes, pruning bad images, and splitting into males and females (all done in create_dataset.py), we are left with ~14,000 training images for each gender.
The results for the 20-29, 30-39, 40-49 classes look similar because there's a lot of labeling noise between these classes. Evaluating a person's age is very subjective, even for humans.
We did perform interpolation after training with the 6 classes; that's how we produce the videos in the Colab demo. The main reason for using these 6 classes is, as we stated in the paper, that they serve as anchor classes to approximate the continuous aging process.
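Conceptually, the interpolation between anchor classes works roughly like the sketch below. The names `generator`, `code_young`, and `code_old` are placeholders for illustration, not the repo's actual API; the real model conditions on its own latent age representation.

```python
import torch

def age_sweep(generator, img, code_young, code_old, steps=10):
    """Hypothetical sketch of interpolating between two anchor-class age codes.

    `generator`, `code_young` and `code_old` are placeholders for
    illustration; see the repo / paper for the actual conditioning.
    """
    frames = []
    for t in torch.linspace(0.0, 1.0, steps):
        code = (1.0 - t) * code_young + t * code_old  # linear interpolation
        with torch.no_grad():
            frames.append(generator(img, code))
    return frames
```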
@royorel Thanks for your thoughtful reply. I will revise the parts you mentioned and report back on the results. Have a good day!
I have a question: what is the meaning of "use moving average network" in training? Is this parameter important?
The relevant option is:
self.parser.add_argument('--no_moving_avg', action='store_true', help='if specified, do not use moving average network')
and in the training code:
self.accumulate(self.g_running, self.netG, decay=0)
Hi @YoungJaeK,
At test time we use a model with an exponential moving average of the weights from training. This is a well-known trick for training GANs, first introduced in the "Progressive Growing of GANs" paper: https://arxiv.org/abs/1710.10196
The default setup is to use this trick in training. If for some reason you don't want to use it, run training (and testing) with the --no_moving_avg flag.
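For context, a weight moving average is typically implemented roughly like this sketch, in the spirit of the accumulate call you quoted; the exact parameter handling and decay value in the repo may differ.

```python
import torch

def accumulate_ema(avg_model, live_model, decay=0.999):
    """Sketch of an exponential moving average over generator weights.

    avg_model holds the running average used at test time; live_model is
    the generator being optimized. decay=0 simply copies the live weights
    (mirroring the quoted `accumulate(self.g_running, self.netG, decay=0)`
    initialization), while decay close to 1 averages over many steps.
    """
    avg_params = dict(avg_model.named_parameters())
    live_params = dict(live_model.named_parameters())
    with torch.no_grad():
        for name, avg_p in avg_params.items():
            avg_p.mul_(decay).add_(live_params[name], alpha=1 - decay)
```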
@royorel Thank you!
Results with the pre-trained model weights vs. results from the provided training code
Question 1: The problem is that the test results obtained with the provided pre-trained weights are not the same as the test results obtained after training with the provided code. In particular, in the results produced after training with the provided code, the face outline is not sharp, the overall image quality is lower, and the aging change of the face is also different.
Question 2: I kept all the hyper-parameters the same and trained for about 6 to 7 days on V100s. The training environment was 4 GPUs, 400 epochs, batch size 6. How long did training take in the authors' experiments?
The first row below was generated with the provided pre-trained weights, and the second row is the result of training with the provided code.
Figure 1. Paper result
Figure 2. Train code result
Thank you for writing such a good paper in the field of facial aging.