wy1iu / LargeMargin_Softmax_Loss

Implementation for "Large-Margin Softmax Loss for Convolutional Neural Networks" in ICML'16.

Difficult to train with LargeMargin_Softmax_Loss on cifar10 #10

Closed qianxinchun closed 7 years ago

qianxinchun commented 7 years ago

I have tried to train examples/cifar10/model/cifar_train_test.prototxt with different settings (DOUBLE/TRIPLE/QUADRUPLE), but it always goes like this:

```
I0327 02:22:00.515635 16177 solver.cpp:228] Iteration 12000, loss = 87.3365
I0327 02:22:00.515707 16177 solver.cpp:244]     Train net output #0: lambda = 0.0624753
I0327 02:22:00.515720 16177 solver.cpp:244]     Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:22:00.586127 16177 solver.cpp:244]     Train net output #2: mean_length = inf
I0327 02:22:00.586163 16177 sgd_solver.cpp:106] Iteration 12000, lr = 0.001
I0327 02:26:54.401607 16177 solver.cpp:228] Iteration 12200, loss = 87.3365
I0327 02:26:54.401752 16177 solver.cpp:244]     Train net output #0: lambda = 0.0540467
I0327 02:26:54.401765 16177 solver.cpp:244]     Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:26:54.471928 16177 solver.cpp:244]     Train net output #2: mean_length = inf
I0327 02:26:54.471937 16177 sgd_solver.cpp:106] Iteration 12200, lr = 0.001
I0327 02:31:48.234402 16177 solver.cpp:228] Iteration 12400, loss = 87.3365
I0327 02:31:48.234601 16177 solver.cpp:244]     Train net output #0: lambda = 0.0467769
I0327 02:31:48.234617 16177 solver.cpp:244]     Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:31:48.304947 16177 solver.cpp:244]     Train net output #2: mean_length = inf
I0327 02:31:48.304958 16177 sgd_solver.cpp:106] Iteration 12400, lr = 0.001
I0327 02:36:42.063432 16177 solver.cpp:228] Iteration 12600, loss = 87.3365
I0327 02:36:42.063588 16177 solver.cpp:244]     Train net output #0: lambda = 0.0405035
I0327 02:36:42.063603 16177 solver.cpp:244]     Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:36:42.134166 16177 solver.cpp:244]     Train net output #2: mean_length = inf
```

How can I tackle this problem?
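For context, a loss stuck at exactly 87.3365 together with mean_length = inf is the usual signature of divergence rather than an ordinary plateau: Caffe's SoftmaxWithLoss clips the predicted probability of the true class at FLT_MIN, and -log(1.17549e-38) ≈ 87.3365, so this value means the network is assigning essentially zero probability to every correct label, typically after the weights or activations have overflowed.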

xqpinitial commented 7 years ago

http://blog.csdn.net/yan_joy/article/details/53608519

qianxinchun commented 7 years ago

I tried "clip_gradients" the solver.prototxt, but it still ended up with 87.3365.

xqpinitial commented 7 years ago

First, please change the display interval from 200 to 10 to see how the loss changes. Second, please reduce base_lr to 0.0001, or even lr = 0.000001, and watch the loss. Third:

1. Check whether there are abnormal samples or abnormal labels in the data that break data loading.
2. Reduce the initialization weights so that the features fed into the softmax are as small as possible.
3. Lower the learning rate; this narrows how much the weight parameters can fluctuate and so reduces the chance of the weights blowing up. This is also the fix most often suggested online.
4. If there are BN (batch normalization) layers, it is best not to freeze the BN parameters when fine-tuning; otherwise a mismatch in data distribution can easily make the outputs very large.
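A minimal sketch of the first two suggestions in Caffe terms (the numbers are the ones proposed above; the weight_filler line belongs in the net prototxt and its std value is only illustrative):

```
# solver.prototxt (sketch)
display: 10         # log the loss every 10 iterations instead of every 200
base_lr: 0.0001     # drop further (e.g. 0.000001) if the loss still hits 87.3365

# in the net prototxt, shrink the initial weights so the softmax inputs stay small, e.g.
# weight_filler { type: "gaussian" std: 0.001 }
```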

wy1iu commented 7 years ago

For CIFAR10, it should be easy to train. If the network diverges, consider decreasing lambda more smoothly, or simply lower the difficulty of the loss, i.e., set a smaller m.
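For concreteness, both knobs live in the LargeMarginInnerProduct layer of the train/test prototxt. A rough sketch follows; the field names are my reading of this repo's example prototxts and the numbers are illustrative (the base/gamma/power values were back-solved to be consistent with the lambda values printed in the log above), so treat the exact layout as an assumption to be checked against examples/cifar10/model/cifar_train_test.prototxt:

```
layer {
  name: "ip2"
  type: "LargeMarginInnerProduct"
  bottom: "ip1"
  bottom: "label"
  top: "ip2"
  top: "lambda"
  largemargin_inner_product_param {
    num_output: 10
    type: DOUBLE       # m = 2: an easier margin than TRIPLE/QUADRUPLE
    # lambda is annealed roughly as base * (1 + gamma * iter)^(-power), floored at lambda_min;
    # to decrease lambda more smoothly, use a smaller gamma (or raise lambda_min)
    base: 1000
    gamma: 0.00002
    power: 45
    lambda_min: 0
    weight_filler { type: "gaussian" std: 0.01 }
  }
}
```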

shenmanmiao commented 7 years ago

Same problem as @qianxinchun: the network diverges even if I set lambda_min=0.5 and m=2. @wy1iu Could you please share your training log (m=4)?

wy1iu commented 7 years ago

I believe you could train it using PReLU. Using ReLU may need more parameter tuning. @shenmanmiao
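For anyone trying the same switch, it is a per-activation change in the net prototxt; a minimal sketch (the layer and blob names here are illustrative):

```
layer {
  name: "relu1"
  type: "PReLU"      # was: type: "ReLU"
  bottom: "conv1"
  top: "conv1"
  # optional: prelu_param { filler { type: "constant" value: 0.25 } }
}
```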

shenmanmiao commented 7 years ago

PReLU works well on Cifar10, thanks @wy1iu for your reply.

billhyde commented 7 years ago

Hi, thank you for sharing. I trained a model on CASIA-WebFace with A-Softmax (the SphereFace paper). The model converged and the accuracy on LFW is 97.5%. It is really hard to push the accuracy above 99%; I would be really grateful if you could provide any suggestions. My QQ is 729512518.

yfllllll commented 6 years ago

@shenmanmiao Have you reproduced the result on cifar10? Can you share the train_val.prototxt?