qianxinchun closed this issue 7 years ago
I tried setting "clip_gradients" in the solver.prototxt, but the loss still ended up at 87.3365.
First, please change the display interval from 200 to 10 to see how the loss changes. Second, please reduce base_lr to 0.0001, or even 0.000001, and watch the loss. Beyond that:

1. Check whether the data contains abnormal samples or abnormal labels that break data loading.
2. Use a smaller weight initialization, so that the features fed into the softmax are as small as possible.
3. Lower the learning rate; this narrows the range over which the weights fluctuate and makes it less likely that they grow too large. This is also the fix most often suggested online.
4. If there are BN (batch normalization) layers, it is best not to freeze the BN parameters when fine-tuning; otherwise, a mismatch in data distributions can easily make the output values very large.
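A minimal sketch of those solver.prototxt changes (these are stock Caffe solver fields; the values are illustrative, not this repo's defaults):

```
# solver.prototxt (illustrative values)
display: 10          # log the loss every 10 iterations instead of every 200
base_lr: 0.0001      # lowered learning rate; try 0.000001 if it still diverges
clip_gradients: 10   # cap the global L2 norm of the gradients to limit blow-ups
```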
For CIFAR10, it should be easy to train. If the network diverges, consider decaying lambda more smoothly, or simply lower the difficulty of the loss, i.e. set a smaller m.
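For concreteness, both knobs live in the MarginInnerProduct layer of the prototxt. A sketch of the relevant settings (field names follow this repo's margin_inner_product_param; the layer/blob names and values are illustrative):

```
layer {
  name: "fc6"
  type: "MarginInnerProduct"
  bottom: "fc5"
  bottom: "label"
  top: "fc6"
  top: "lambda"
  margin_inner_product_param {
    num_output: 10   # CIFAR10 classes
    type: DOUBLE     # m = 2; QUADRUPLE (m = 4) is a harder loss to optimize
    base: 1000       # lambda decays as max(lambda_min, base * (1 + gamma * iter)^(-power))
    gamma: 0.0001    # a smaller gamma makes lambda decrease more smoothly
    power: 1
    lambda_min: 5    # floor on lambda, so the annealed loss never becomes fully margin-only
    iteration: 0
    weight_filler { type: "xavier" }
  }
}
```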
Same problem as @qianxinchun: the network diverges even if I set lambda_min=0.5 and m=2. @wy1iu could you please share your training log (m=4)?
I believe you could train it using PReLU. Using ReLU may need more parameter tuning. @shenmanmiao
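A minimal sketch of that swap in the train prototxt (the layer/blob names here are hypothetical):

```
# Before: a plain ReLU activation
# layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" }

# After: PReLU, whose learnable negative slope is often more forgiving here
layer {
  name: "relu1"
  type: "PReLU"
  bottom: "conv1"
  top: "conv1"
  prelu_param {
    filler { type: "constant" value: 0.25 }  # initial negative slope
  }
}
```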
PReLU works well on Cifar10, thanks @wy1iu for your reply.
Hi, thank you for sharing. I trained a model on CASIA-WebFace with the A-Softmax loss (SphereFace paper). The model converged and its accuracy on LFW is 97.5%, but it is really hard to push the accuracy above 99%. I would be very grateful for any suggestions. My QQ is 729512518.
@shenmanmiao have you reproduced the result on cifar10? Can you share the train_val.prototxt?
I have tried to train the model in examples/cifar10/model/cifar_train_test.prototxt with different settings (DOUBLE/TRIPLE/QUADRUPLE), but it always goes like this:
```
I0327 02:22:00.515635 16177 solver.cpp:228] Iteration 12000, loss = 87.3365
I0327 02:22:00.515707 16177 solver.cpp:244]     Train net output #0: lambda = 0.0624753
I0327 02:22:00.515720 16177 solver.cpp:244]     Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:22:00.586127 16177 solver.cpp:244]     Train net output #2: mean_length = inf
I0327 02:22:00.586163 16177 sgd_solver.cpp:106] Iteration 12000, lr = 0.001
I0327 02:26:54.401607 16177 solver.cpp:228] Iteration 12200, loss = 87.3365
I0327 02:26:54.401752 16177 solver.cpp:244]     Train net output #0: lambda = 0.0540467
I0327 02:26:54.401765 16177 solver.cpp:244]     Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:26:54.471928 16177 solver.cpp:244]     Train net output #2: mean_length = inf
I0327 02:26:54.471937 16177 sgd_solver.cpp:106] Iteration 12200, lr = 0.001
I0327 02:31:48.234402 16177 solver.cpp:228] Iteration 12400, loss = 87.3365
I0327 02:31:48.234601 16177 solver.cpp:244]     Train net output #0: lambda = 0.0467769
I0327 02:31:48.234617 16177 solver.cpp:244]     Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:31:48.304947 16177 solver.cpp:244]     Train net output #2: mean_length = inf
I0327 02:31:48.304958 16177 sgd_solver.cpp:106] Iteration 12400, lr = 0.001
I0327 02:36:42.063432 16177 solver.cpp:228] Iteration 12600, loss = 87.3365
I0327 02:36:42.063588 16177 solver.cpp:244]     Train net output #0: lambda = 0.0405035
I0327 02:36:42.063603 16177 solver.cpp:244]     Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:36:42.134166 16177 solver.cpp:244]     Train net output #2: mean_length = inf
```
How can I tackle this problem?
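One detail worth noting about that log (a general fact about Caffe's SoftmaxWithLoss, not specific to this repo): the loss saturates at exactly 87.3365 because Caffe clamps the predicted probability at FLT_MIN before taking the log:

-log(FLT_MIN) = -log(1.17549435e-38) ≈ 87.3365

So a constant loss of 87.3365, together with mean_length = inf, means the predicted probability of the true class has underflowed to zero, i.e. the network has already diverged. Lowering base_lr or decaying lambda more smoothly, as suggested above, is the usual remedy.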