yang0817manman closed this issue 5 years ago.
Hi @yang0817manman, yes, "alpha_start_value" is related to the training dataset. As for "iteration" and "alpha_start_iter", see #1. I recommend first training SphereFace (without MHE) to get a pretrained model, and then using that pretrained model to initialize your SphereFace+ model. In my experience, the loss frequently diverges when training on the MS1M dataset. This happens not only with SphereFace+ but also with SphereFace, so you need to fine-tune the models carefully. Good luck!
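For context on the "without MHE" distinction: SphereFace+ adds a minimum hyperspherical energy (MHE) term to SphereFace, which pushes the class weight vectors apart on the unit sphere. Below is a minimal pure-Python sketch of that energy, assuming the Riesz s-energy form from the MHE paper (pairwise term ||ŵ_i − ŵ_j||^(−s) on normalized vectors, averaged over pairs), with a small `eps` added here only to avoid dividing by zero. It is not the repository's Caffe layer, the function names are hypothetical, and how "alpha_start_iter" and "alpha_start_value" schedule this term in the real code is not covered here (see #1).

```python
import math
from itertools import combinations


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def hyperspherical_energy(weights, s=1.0, eps=1e-8):
    """Average pairwise Riesz s-energy of the normalized weight vectors (s > 0).

    Each unordered pair contributes (||w_i - w_j|| + eps) ** (-s), so the
    energy shrinks as the class directions spread apart on the sphere.
    """
    units = [normalize(w) for w in weights]
    pairs = list(combinations(units, 2))
    total = 0.0
    for a, b in pairs:
        dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        total += (dist + eps) ** (-s)
    return total / len(pairs)


# Two class directions: orthogonal vs. opposite.
print(hyperspherical_energy([[1.0, 0.0], [0.0, 1.0]]))   # roughly 0.7071
print(hyperspherical_energy([[1.0, 0.0], [-3.0, 0.0]]))  # roughly 0.5
```

Moving two class directions from orthogonal to opposite lowers the energy from about 0.707 to 0.5, which is the kind of spread the regularizer rewards.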
Hi wy1iu, thanks for sharing your code. When I train on the CASIA-WebFace dataset, the loss converges, but when I train on the MS1M dataset, the loss diverges. My parameter settings are as follows:

    inter_class_param {
      num_output: 85164
      type: AMONG
      iteration: 16000
      alpha_start_iter: 20000
      alpha_start_value: 5.3
    }

Should I modify "iteration", "alpha_start_iter", and "alpha_start_value"? Can you help me? Are these parameters related to the size of the training dataset? Thanks!
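On whether these settings depend on dataset size: the reply only says that "alpha_start_value" depends on the training data. Two quantities that do follow directly from the dataset are `num_output` (presumably the number of identities, 85164 here) and how many iterations make up one epoch. The sketch below computes both and, as a heuristic assumption rather than the authors' advice, rescales an iteration milestone so it falls at the same epoch on a larger dataset with an unchanged batch size. The function names and the `<path> <label>` list format (as read by Caffe's ImageData layer) are assumptions for illustration.

```python
import math


def count_identities(list_lines):
    """Count distinct labels in '<image path> <label>' lines (ImageData-style list)."""
    return len({line.rsplit(maxsplit=1)[1] for line in list_lines if line.strip()})


def iters_per_epoch(num_images, batch_size):
    """Solver iterations needed to pass over every training image once."""
    return math.ceil(num_images / batch_size)


def rescale_iteration(old_iter, old_num_images, new_num_images):
    """Move an iteration milestone so it lands at the same epoch on a new dataset.

    Heuristic only: assumes the batch size is unchanged, so iterations
    scale linearly with the number of training images.
    """
    return round(old_iter * new_num_images / old_num_images)


# Hypothetical numbers, not real dataset statistics.
lines = ["id0/a.jpg 0", "id0/b.jpg 0", "id1/c.jpg 1", ""]
print(count_identities(lines))                       # 2
print(iters_per_epoch(1000, 256))                    # 4
print(rescale_iteration(20000, 500_000, 5_000_000))  # 200000
```

Whether "alpha_start_iter" should actually move with the other milestones is the question deferred to #1; treat rescaled values as a starting point for careful fine-tuning, not as a fix for divergence.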