loss decline, accuracy remains zero

XULU42 commented 4 years ago

I want to use ArcFace loss for ordinary classification, but the accuracy always stays near 0. I checked the logits and they are all close to -1. Once the label is given and the margin m is added, the softmax loss is low, and this is what the network has learned! In the extreme case, if a network outputs a logit of -1 for every class, the ArcFace loss is still very low, so the network can get away with learning nothing.
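The degenerate case described above can be checked numerically. The sketch below assumes a plain cos(θ + m) margin with no fallback branch for large angles (some implementations add one, which changes the result), and the values m = 0.5 and s in {4, 30, 64} are illustrative choices, not settings taken from this thread. The function name is made up for the example.

```python
import math

def loss_when_all_cosines_are_minus_one(num_classes: int, s: float, m: float) -> float:
    """Cross-entropy for one sample when every cosine logit is -1.

    Assumes the target logit is s * cos(theta + m) with theta = arccos(-1) = pi,
    and every other logit is s * (-1).
    """
    theta = math.pi
    target = s * math.cos(theta + m)   # cos(pi + m) = -cos(m), which is greater than -1
    other = s * -1.0
    # Numerically stable log-sum-exp over 1 target logit and (num_classes - 1) others.
    top = max(target, other)
    lse = top + math.log(
        math.exp(target - top) + (num_classes - 1) * math.exp(other - top)
    )
    return lse - target

chance_level = math.log(92)  # loss of a uniform prediction over 92 classes
for s in (4, 30, 64):
    loss = loss_when_all_cosines_are_minus_one(num_classes=92, s=s, m=0.5)
    print(f"s={s:>2}: loss={loss:.4f} (uniform prediction: {chance_level:.4f})")
```

Under these assumptions the margin alone lifts the target logit above the others, and the gap grows with s, so the training loss can fall without the margin-free logits separating the classes at all.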

ronghuaiyang commented 4 years ago

How many classes does your data have? If the number of classes is small, you can try reducing the hyperparameter s.

XULU42 commented 4 years ago

Thanks for your reply. I have 92 classes. I tried setting s=4 and the accuracy went up to 83%, but that is still well short of the 94.4% that a plain softmax loss reaches. Should I keep tuning the s parameter?

ronghuaiyang commented 4 years ago

The purpose of the arc margin is to learn discriminative features, so it makes sense that classification performance is worse than with softmax. Also, if the 83% is the training-set accuracy, you can try evaluating on the test set.

XULU42 commented 4 years ago

Sorry for not mentioning the details. The 83% accuracy is on the validation set, computed by taking the argmax of the logits, which are the matrix product of the weight and the features. As far as I can see, this logit is exactly the cosine similarity between the image feature and the class center, with the weight interpreted as the class centers. Just now I tried s=16, and at about 1000 steps (batch size 64) I get a validation accuracy of 82%. I think it is going to improve this time. Maybe we should try something to make training less sensitive to s. Thanks for your reply!

changgongcheng commented 4 years ago

I have the same problem.

quanvuhust commented 4 years ago

I had the same problem. Some things to try:

For inference, you should remove the margin m; the output is cosine = F.linear(F.normalize(input), F.normalize(self.weight)) * self.s.
Try another init method for self.weight: https://pytorch.org/docs/stable/nn.init.html
Fix the embedding feature norm by L2 normalisation and re-scale it to s (see page 2 of the paper: https://arxiv.org/pdf/1801.07698.pdf). InsightFace code, line 338: https://github.com/deepinsight/insightface/blob/master/src/train.py https://ideone.com/woWcJu
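The first suggestion (margin-free logits at inference time) can be sketched as follows. This is a minimal illustration, not the repository's code: the `predict` helper and the toy tensors are made up for the example, and `weight` stands in for the layer's `self.weight` with one row per class.

```python
import torch
import torch.nn.functional as F

def predict(features: torch.Tensor, weight: torch.Tensor, s: float = 30.0) -> torch.Tensor:
    """Return predicted class indices from cosine logits with no margin applied."""
    # Rows of both tensors are L2-normalised, so F.linear yields cosine similarities
    # of shape (batch, num_classes).
    cosine = F.linear(F.normalize(features), F.normalize(weight))
    logits = cosine * s  # a positive scale s does not change the argmax
    return logits.argmax(dim=1)

# Toy data (hypothetical): 3 class centers in 2-D and 3 features, one near each center.
weight = torch.tensor([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
features = torch.tensor([[5.0, 0.1], [0.2, 3.0], [-2.0, 0.5]])
preds = predict(features, weight)
print(preds.tolist())
```

Because the margin is only applied to the labelled class during training, evaluating with it would require the labels; dropping it leaves plain cosine similarity to each class center.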

onlinehuazai commented 2 years ago

Quoting ronghuaiyang: "How many classes does your data have? If the number of classes is small, you can try reducing the hyperparameter s."

I have 100 classes with s set to 30, but the accuracy is 0.

ponykid commented 2 years ago

I am facing the same problem. How can it be fixed, please?

onlinehuazai commented 2 years ago

Set s to a small value, such as 10.

TheSeriousProgrammer commented 1 year ago

Trying different values manually can be painstaking, so I decided to use Optuna, with each trial limited to at most 2 or so epochs. Optuna tries to maximize the resulting accuracy by adjusting the s and m values through trial and error. I got good results with that.

ronghuaiyang / arcface-pytorch

loss decline, accuracy remains zero #38