youngjae-git opened this issue 3 years ago
Hi @YoungJaeK,
I'll admit, this looks weird and should not happen, especially since you ran it exactly the way it should be run. The only difference between our runs was the GPU (I ran it on 4 RTX 2080s), but that shouldn't matter at all.
The thing that did happen to me from time to time is that the high learning rate at the beginning can sometimes cause the training to diverge. I suspect that this is the case here. If you still have them, can you share the visdom loss graphs? That would help me determine whether that was indeed the case here.
One more thing you can try is to run again with a lower initial learning rate (you can try values like 1e-4 to 3e-4). That will stabilize the training, but the aging effect may be slightly less impressive.
I'm sorry about the bad experience and wasted time.
Hi @royorel, thank you for the quick response. I ran training without visdom, so I don't have the loss graphs. I will extract the values from log.txt and show you instead.
Question 1: Did you do any additional processing on the FFHQ dataset when constructing the training set?
Question 2: I only used about 50,000 images for training. Was that not enough? I'm thinking of training on all 70,000 images instead. Would that help?
Question 3: If the batch size is increased, does that cause problems for training? Should the initial learning rate be lowered in that case?
Last and most important question: with ~25,000 training images (one gender), how long does one epoch take? For me it takes about 40 minutes (4x RTX 2080) and 24 minutes (4x V100). Is that similar to the training speed you observed?
Thank you for reading this long message.
It was not a bad experience or wasted time at all.
Thanks again.
@YoungJaeK The log files are not that interpretable; it is very hard to infer what went on during training from them, so there's no need to send them.
As for your questions:
Thank you @royorel
I have already run "create_dataset.py". The reason I asked is that I was curious about the folder called "parsings" that is created when configuring the dataset. Is there a reason you created that separate folder? Looking at the code, there doesn't seem to be any additional processing, but if there is, please let me know.
Since the training data consists of only 50,000 images, I assumed the model was trained with 50,000 images. I will train using 69,000 images. (I'll let you know if there's a difference.)
I noticed that the reason I ended up with 50,000 training images is that I set opt.sort_order to 6 classes in training mode; the training-mode classes are '0-2,3-6,7-9,15-19,30-39,50-69'. I also trained with the additional 20-29 and 40-49 classes added, because training on only the above classes did not seem sufficient. The results of that training are almost identical for the three classes 20-29, 30-39, and 40-49.
For this reason, did you perform interpolation after training with these 6 classes ('0-2,3-6,7-9,15-19,30-39,50-69')? Presumably not all 69,000 images can be used if only those 6 classes are kept.
Summary question 4: Is there a reason for choosing this particular combination of 6 classes?
I will contact you again if I have any further questions. Thanks again for the quick reply.
I wish you good results in your future research.
@YoungJaeK,
If you ran create_dataset.py, then you were training on the exact same training set as we did. The parsings are used to mask out the background. We use them in the dataloader: https://github.com/royorel/Lifespan_Age_Transformation_Synthesis/blob/7be4534848e94d9493dfb6da8dfd5724fa5f6766/data/multiclass_unaligned_dataset.py#L116-L121
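For illustration, the masking boils down to something like the sketch below. This is just a sketch, not the exact code in the dataloader: the file handling and the assumption that background pixels are labeled 0 in the parsing map are illustrative, so check the linked lines for the actual logic.

```python
import numpy as np
from PIL import Image

def mask_background(img_path, parsing_path):
    """Illustrative sketch: zero out background pixels using a parsing map.

    Assumes (for illustration only) that the parsing map is a label image
    where 0 marks the background; see the linked dataloader lines for the
    repo's actual masking logic.
    """
    img = np.array(Image.open(img_path).convert('RGB'))        # H x W x 3
    parsing = np.array(Image.open(parsing_path).convert('L'))  # H x W labels

    mask = (parsing > 0).astype(img.dtype)   # 1 for foreground, 0 for background
    masked = img * mask[..., None]           # broadcast mask over RGB channels
    return Image.fromarray(masked)
```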
The 69,000 images are the portion of the original dataset we devote to training before pruning; that was a little unclear in my previous comment, sorry about that. To be clear, after pruning to 6 age classes, pruning bad images, and splitting into males and females (all done in create_dataset.py), we are left with ~14,000 training images for each gender.
The results for the 20-29, 30-39, 40-49 classes look similar because there's a lot of labeling noise between these classes. Evaluating a person's age is very subjective, even for humans.
We did perform interpolation after training with the 6 classes; that's how we produce the videos in the Colab demo. The main reason for using these 6 classes is, as we stated in the paper, that they serve as anchor classes to approximate the continuous aging process.
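Conceptually, the interpolation between anchor classes works roughly like the sketch below. The names `generator`, `code_young`, and `code_old` are placeholders for illustration, not the repo's actual API; the real model conditions on its own latent age representation.

```python
import torch

def age_sweep(generator, img, code_young, code_old, steps=10):
    """Hypothetical sketch of interpolating between two anchor-class age codes.

    `generator`, `code_young` and `code_old` are placeholders for
    illustration; see the repo / paper for the actual conditioning.
    """
    frames = []
    for t in torch.linspace(0.0, 1.0, steps):
        code = (1.0 - t) * code_young + t * code_old  # linear interpolation
        with torch.no_grad():
            frames.append(generator(img, code))
    return frames
```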
@royorel Thanks for your thoughtful reply. I will revise the parts you mentioned and report back on the results. Have a good day!
I have a question: what is the meaning of "use moving average network" in training? Is this parameter important?
The relevant option is:
self.parser.add_argument('--no_moving_avg', action='store_true', help='if specified, do not use moving average network')
and in the training code:
self.accumulate(self.g_running, self.netG, decay=0)
Hi @YoungJaeK,
At test time we use a model with an exponential moving average of the weights from training. This is a well-known trick for training GANs, first introduced in the "Progressive Growing of GANs" paper: https://arxiv.org/abs/1710.10196
The default setup is to use this trick in training. If for some reason you don't want to use it, run training (and testing) with the --no_moving_avg flag.
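For context, a weight moving average is typically implemented roughly like this sketch, in the spirit of the accumulate call you quoted; the exact parameter handling and decay value in the repo may differ.

```python
import torch

def accumulate_ema(avg_model, live_model, decay=0.999):
    """Sketch of an exponential moving average over generator weights.

    avg_model holds the running average used at test time; live_model is
    the generator being optimized. decay=0 simply copies the live weights
    (mirroring the quoted `accumulate(self.g_running, self.netG, decay=0)`
    initialization), while decay close to 1 averages over many steps.
    """
    avg_params = dict(avg_model.named_parameters())
    live_params = dict(live_model.named_parameters())
    with torch.no_grad():
        for name, avg_p in avg_params.items():
            avg_p.mul_(decay).add_(live_params[name], alpha=1 - decay)
```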
@royorel Thank you!
Results with the pre-trained model weights vs. results from the provided training code
Question 1: The problem is that the test results obtained with the provided pre-trained weights are not the same as the test results obtained after training with the provided code. In particular, in the results produced after training with the provided code, the face outline is not sharp, the overall image quality is lower, and the aging change of the face is also different.
Question 2: I kept all the hyper-parameters the same and trained for about 6 to 7 days on V100s. The training environment was 4 GPUs, 400 epochs, batch size 6. How long did training take in the authors' experiments?
The first row below was generated with the provided pre-trained weights, and the second row is the result of training with the provided code.
Figure 1. Paper result
Figure 2. Train code result
Thank you for writing such a good paper in the field of facial aging.