Closed — xiaomingdaren123 closed this issue 5 years ago
I don't think L2 normalization has a big impact on the training.
If the loss is equal to the margin, this means that training is collapsing (all embeddings are collapsing into a single point), so you may want to reduce the learning rate.
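To see why a collapsed model sits exactly at the margin, here is a minimal sketch in plain Python (not the repo's TensorFlow code) of the standard triplet loss. When all embeddings collapse to a single point, both the positive and negative distances are zero, so the loss equals the margin:

```python
import math

def triplet_loss(anchor, positive, negative, margin=0.5):
    # Standard triplet loss: max(d(a, p) - d(a, n) + margin, 0)
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

# Collapse: every embedding is the same point, so both distances
# are 0 and the loss sits exactly at the margin.
point = [0.3] * 128
print(triplet_loss(point, point, point))  # -> 0.5
```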
@omoindrot Thank you for your reply. The learning rate is set to 0.0001, but training still collapses. If I set a smaller learning rate, the loss struggles to fall below the margin. However, adding an L2 normalization layer to the output of the network avoids both problems. (I did not add batch normalization to the network, and I use the batch hard training strategy.)
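For context, L2 normalization simply rescales each embedding to unit norm, which bounds all pairwise Euclidean distances to [0, 2] and makes the margin easier to pick on a fixed scale. A minimal sketch in plain Python (the `l2_normalize` helper is illustrative, not the repo's code):

```python
import math

def l2_normalize(embedding, eps=1e-12):
    # Scale the vector to unit Euclidean norm (eps avoids division by zero).
    norm = math.sqrt(sum(x * x for x in embedding))
    return [x / (norm + eps) for x in embedding]

a = l2_normalize([3.0, 4.0])    # -> roughly [0.6, 0.8]
b = l2_normalize([-3.0, -4.0])  # diametrically opposite unit vector
print(math.dist(a, b))          # distance between unit vectors is at most 2.0
```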
Ok, good to know!
Hi @omoindrot, thanks for your code. I ran into a problem: the loss value stays approximately at the margin, and I found that the distances are close to 0. I don't know what causes this. Does the output of the network need to be L2 normalized? What is the role of L2 normalization? And how should I set the value of the margin if I don't use L2 normalization? Thanks.