wy1iu / sphereface

Implementation for <SphereFace: Deep Hypersphere Embedding for Face Recognition> in CVPR'17.

how to change m #78

Open zjz5250 opened 6 years ago

zjz5250 commented 6 years ago

At the beginning, I used m as "type: SINGLE" and the loss converged; the accuracy reached about 0.98. But when I changed m to "type: QUADRUPLE", the loss grew larger and larger and the accuracy dropped too. So how do I change the type correctly? Hoping for help.
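(For reference, "type" here is a field of margin_inner_product_param in the MarginInnerProduct layer of the network prototxt, with SINGLE/DOUBLE/TRIPLE/QUADRUPLE corresponding to the margin m = 1/2/3/4. A minimal sketch of the change; the layer name is illustrative, and the annealing values are the defaults quoted elsewhere in this thread:

  layer {
    name: "fc6"                  # illustrative name; bottom/top and fillers omitted
    type: "MarginInnerProduct"
    margin_inner_product_param {
      num_output: 10572          # number of identities (10,572 for CASIA-WebFace)
      type: QUADRUPLE            # was: SINGLE -- this is the m being changed
      base: 1000
      gamma: 0.12
      power: 1
      lambda_min: 5
      iteration: 0
    }
  }
)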

zuoqing1988 commented 6 years ago

Directly using QUADRUPLE is OK for CASIA, but for other datasets (for example, subsets of MS-Celeb-1M) it is hard to converge. When finetuning QUADRUPLE from the result of SINGLE, set lr <= 0.01, base = 10, and lambda_min = 10. Try many times; it may converge.
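(A minimal sketch of that finetuning setup; only base_lr = 0.01, base = 10, and lambda_min = 10 come from the advice above, and the remaining values are placeholders:

  # solver.prototxt
  base_lr: 0.01

  # margin_inner_product_param in the network prototxt
  type: QUADRUPLE
  base: 10           # start lambda small, since the net already trained with SINGLE
  gamma: 0.12        # placeholder
  power: 1
  lambda_min: 10
  iteration: 0
)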

zjz5250 commented 6 years ago

thanks

zjz5250 commented 6 years ago

@zuoqing1988 Yes, when I set lr and base as you said, it worked; the loss clearly began to converge. Could you tell me the rule for how to change lr and base when the type changes? Thanks.

zuoqing1988 commented 6 years ago

@zjz5250 Finetuning QUADRUPLE from SINGLE with lr = 0.01, base = 10, lambda_min = 10 will be OK with very high probability, but if it diverges, try a smaller lr. Once QUADRUPLE converges with some lambda_min (10), you can also finetune with a smaller lambda_min (5, 2, or 1). You can pass the SINGLE result with the argument "-weights xxx_iter_xxx.caffemodel".

zjz5250 commented 6 years ago

@zuoqing1988 Thanks very much! I did as you said and it worked. But when I tried to train a model with a big dataset of about 5 million people, the loss could not converge again. Can you tell me how to set lr, base, and lambda_min? Much appreciated!

zuoqing1988 commented 6 years ago

@zjz5250 As far as I know, nobody in this forum has succeeded in training QUADRUPLE on such a big dataset.

zjz5250 commented 6 years ago

@zuoqing1988 So if I train SINGLE, DOUBLE, or TRIPLE, can it converge? And how should I set lr, base, and lambda_min?

zuoqing1988 commented 6 years ago

@zjz5250 I have trained on cleaned subsets of MS-Celeb-1M with around 80,000 people. SINGLE converges easily, but the accuracy on LFW is less than 99%. TRIPLE and QUADRUPLE are hard to converge.


The largest dataset I have succeeded in training with QUADRUPLE includes around 30,000 people and 2 million images.

zjz5250 commented 6 years ago

@zuoqing1988 Sorry, I made a mistake. I meant that I want to train with a dataset of about 50,000 people and 110,000 pics. I set:

  base_lr: 0.001
  momentum: 0.9
  lr_policy: "multistep"
  stepvalue: 32000
  stepvalue: 48000
  stepvalue: 60000
  gamma: 0.1
  weight_decay: 0.0005

and:

  base: 1000
  gamma: 0.12
  power: 1
  lambda_min: 10
  iteration: 0

But even SINGLE cannot converge. Can you tell me the details of these parameters from when you trained on cleaned subsets of MS-Celeb-1M?

zuoqing1988 commented 6 years ago

@zjz5250 I selected a subset of MS-Celeb-1M with 45,971 people and 3,645,724 images. It converges very fast for SINGLE (initial loss = 10.7; after 10,000 iters, loss = 7.0; after 20,000 iters, loss is less than 1.5). My parameters:

  base_lr: 0.01
  lr_policy: "multistep"
  gamma: 0.1
  stepvalue: 160000
  stepvalue: 240000
  max_iter: 50000
  display: 100
  momentum: 0.9
  weight_decay: 0.0005

  base: 1000
  gamma: 0.08
  power: 1
  lambda_min: 5
  iteration: 0
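(A back-of-the-envelope check on that annealing config, assuming the lambda schedule quoted later in this thread, lambda = max(lambda_min, base * (1 + gamma * iteration)^(-power)):

  iteration 0:      lambda = 1000
  iteration 1,000:  lambda = 1000 / (1 + 0.08 * 1000) ≈ 12.3
  iteration ~2,487: lambda hits lambda_min = 5 and stays clamped there

So with base = 1000 and gamma = 0.08, the margin term takes over within the first few thousand iterations.)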

zjz5250 commented 6 years ago

@zuoqing1988 Hi, I am training QUADRUPLE with a 10,000-person dataset. I have tried a smaller lr many times; now the accuracy is about 0.4, but the loss is hard to converge. My lr is very small now, only 1e-08. My parameters are as follows:

  base: 10
  gamma: 0.08
  power: 1
  lambda_min: 10
  iteration: 0

I have tried lambda_min: 5, but once I changed lambda_min from 10 to 5, the accuracy dropped to 0 immediately. What were your accuracy and loss in the end when you trained with CASIA?

zuoqing1988 commented 6 years ago

@zjz5250 If lambda_min = 5, the loss is less than 1.0 after convergence, and acc > 97% for SINGLE, acc > 99% for QUADRUPLE. lr = 1e-08 is too small; the smallest value I have used is 1e-05. Maybe you should provide more images for each person. In my experiments, each person has at least 50 images.

zjz5250 commented 6 years ago

@zuoqing1988 Can lambda_min be a large number? For example: base: 50, lambda_min: 50.

We tried training like this and found the loss can converge to 0.85 for DOUBLE with the MS-Celeb-1M dataset.

MengWangTHU commented 6 years ago

@zuoqing1988 I tried your proposal for base, lambda, and lr, and it does not work. SO UPSET!! In the layer "MarginInnerProduct", besides parameters like "base, gamma, lambda", there is a special parameter "iteration" whose default value is 0. Do you know what this parameter means, and does it affect the result of the finetune?

zuoqing1988 commented 6 years ago

@MengWangTHU lambda = max(lambda_min, base * (1 + gamma * iteration)^(-power))
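(For context, following the annealing strategy described in the SphereFace paper: lambda blends the ordinary softmax logit with the angular-margin logit,

  f_y = (lambda * ||x|| * cos(theta_y) + ||x|| * psi(theta_y)) / (1 + lambda)

so a large lambda behaves almost like plain softmax and is easy to train, while lambda decaying toward lambda_min leaves mostly the margin term, i.e. the actual A-Softmax objective. The "iteration" parameter appears only in the schedule above, so a nonzero value starts lambda further along its decay.)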

@zjz5250 A larger lambda_min leads to lower accuracy, but it is easier to converge.

twmht commented 6 years ago

@zuoqing1988

What do you mean by "large" for datasets?

"Large" in the number of identities, or in the number of images?