xialuxi / arcface-caffe

MIT License
279 stars, 124 forks

How well does the trained model perform? #7

Open ZHAIXINGZHAIYUE opened 5 years ago

ZHAIXINGZHAIYUE commented 5 years ago


xialuxi commented 5 years ago


ZHAIXINGZHAIYUE commented 5 years ago


xuguozhi commented 5 years ago

What LFW accuracy does this version get with insightface r100?

ZHAIXINGZHAIYUE commented 5 years ago


xialuxi commented 5 years ago

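1. Your learning rate is too high, which causes exploding gradients; please lower it. 2. This training data is hard to converge on; please first train a plain softmax model and use it as the pretrained model.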

shiyuanyin commented 5 years ago

@xialuxi Hello, author. What is the backbone network of the model?

xialuxi commented 5 years ago


shiyuanyin commented 5 years ago

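@xialuxi Hello, author. When training on MS-Celeb-1M, I get cos_theta > 1. Should I first train with softmax, then add the ArcFace loss layer and fine-tune, similar to the MXNet training procedure on GitHub? Also, for the last fully connected layer fc6, is the purpose of the l2 setting normalize: true to normalize the norm of W?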

shiyuanyin commented 5 years ago

@xialuxi Thank you for the quick reply.

shiyuanyin commented 5 years ago

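@ZHAIXINGZHAIYUE Hello, did you ever solve this problem? I used a softmax-pretrained model as the initial weights and then added the ArcFace loss layer, but cos_theta > 1 still appears, even after lowering the base learning rate to 0.001.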

shiyuanyin commented 5 years ago

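@xialuxi Hello, author. When adding the loss function, I replaced the focal loss with a softmax loss; does that make a difference? During training the loss decreases while oscillating, and outputs like these appear:
cos _theta >1 ...... 1.50123
cos _theta >1 ...... 4.09701
cos _theta >1 ...... 5.7246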

xialuxi commented 5 years ago

You don't need the focal loss; just use SoftmaxWithLoss. Once cos_theta > 1 appears, the training has failed. Post your learning rate and I'll take a look.

shiyuanyin commented 5 years ago

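Hello, author. I compared the normalize.cpp that ships with my Caffe against the one used in this model, and they are different. After swapping in the model's normalize implementation and retraining, training converges as shown below (a small experiment: about 50,000 images, 241 output classes). Without changing normalize, I get the output shown at the very bottom.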

Iteration 120200, loss = 0.0135795
I1207 09:11:43.073189 39485 solver.cpp:259] Train net output #0: accuracy = 1
I1207 09:11:43.073204 39485 solver.cpp:259] Train net output #1: accuracy-t = 1
I1207 09:11:43.073215 39485 solver.cpp:259] Train net output #2: softmax_loss = 0.0135796 ( 1 = 0.0135796 loss)
I1207 09:11:43.073226 39485 sgd_solver.cpp:138] Iteration 120200, lr = 0.001
I1207 09:12:04.167598 39485 solver.cpp:358] Iteration 120250, Testing net (#0)
I1207 09:12:04.168140 39485 net.cpp:713] Ignoring source layer fc6_fc6_l2_0_split
I1207 09:12:04.168164 39485 net.cpp:713] Ignoring source layer fc6_margin_scale_fc6_margin_scale_0_split
I1207 09:12:04.168169 39485 net.cpp:713] Ignoring source layer accuracy-t
I1207 09:12:04.168174 39485 net.cpp:713] Ignoring source layer accuracy
I1207 09:12:27.587586 39485 solver.cpp:425] Test net output #0: softmax_loss = 19.0434 ( 1 = 19.0434 loss)
I1207 09:12:49.055404 39485 solver.cpp:243] Iteration 120300, loss = 0.0020926
I1207 09:12:49.055451 39485 solver.cpp:259] Train net output #0: accuracy = 1
I1207 09:12:49.055461 39485 solver.cpp:259] Train net output #1: accuracy-t = 1
I1207 09:12:49.055481 39485 solver.cpp:259] Train net output #2: softmax_loss = 0.00209277 ( 1 = 0.00209277 loss)
I1207 09:12:49.055502 39485 sgd_solver.cpp:138] Iteration 120300, lr = 0.001
I1207 09:13:32.064842 39485 solver.cpp:243] Iteration 120400, loss = 0.00121898
I1207 09:13:32.065098 39485 solver.cpp:259] Train net output #0: accuracy = 1
I1207 09:13:32.065114 39485 solver.cpp:259] Train net output #1: accuracy-t = 1
I1207 09:13:32.065126 39485 solver.cpp:259] Train net output #2: softmax_loss = 0.00121913 ( 1 = 0.00121913 loss)
I1207 09:13:32.065137 39485 sgd_solver.cpp:138] Iteration 120400, lr = 0.001
I1207 09:14:15.054962 39485 solver.cpp:243] Iteration 120500, loss = 0.284675
I1207 09:14:15.055132 39485 solver.cpp:259] Train net output #0: accuracy = 0.984375
I1207 09:14:15.055146 39485 solver.cpp:259] Train net output #1: accuracy-t = 1
I1207 09:14:15.055158 39485 solver.cpp:259] Train net output #2: softmax_loss = 0.284675 ( 1 = 0.284675 loss)
I1207 09:14:15.055166 39485 sgd_solver.cpp:138] Iteration 120500, lr = 0.001
I1207 09:14:58.101903 39485 solver.cpp:243] Iteration 120600, loss = 0.000106618
I1207 09:14:58.102113 39485 solver.cpp:259] Train net output #0: accuracy = 1
I1207 09:14:58.102126 39485 solver.cpp:259] Train net output #1: accuracy-t = 1
I1207 09:14:58.102138 39485 solver.cpp:259] Train net output #2: softmax_loss = 0.0001068 ( 1 = 0.0001068 loss)

Without modifying normalize:
I1207 09:22:27.955615 48074 solver.cpp:243] Iteration 0, loss = 46.6276
I1207 09:22:27.955641 48074 solver.cpp:259] Train net output #0: accuracy = 0
I1207 09:22:27.955648 48074 solver.cpp:259] Train net output #1: accuracy-t = 0
I1207 09:22:27.955668 48074 solver.cpp:259] Train net output #2: softmax_loss = 46.6276 ( 1 = 46.6276 loss)
I1207 09:22:27.955688 48074 sgd_solver.cpp:138] Iteration 0, lr = 0.01
I1207 09:23:08.346362 48074 solver.cpp:243] Iteration 100, loss = 36.1687
I1207 09:23:08.346518 48074 solver.cpp:259] Train net output #0: accuracy = 0
I1207 09:23:08.346529 48074 solver.cpp:259] Train net output #1: accuracy-t = 0
I1207 09:23:08.346537 48074 solver.cpp:259] Train net output #2: softmax_loss = 36.1687 ( 1 = 36.1687 loss)
I1207 09:23:08.346546 48074 sgd_solver.cpp:138] Iteration 100, lr = 0.01
I1207 09:23:51.725335 48074 solver.cpp:243] Iteration 200, loss = 34.4766
I1207 09:23:51.725507 48074 solver.cpp:259] Train net output #0: accuracy = 0
I1207 09:23:51.725520 48074 solver.cpp:259] Train net output #1: accuracy-t = 0.28125
I1207 09:23:51.725529 48074 solver.cpp:259] Train net output #2: softmax_loss = 34.4766 ( 1 = 34.4766 loss)
I1207 09:23:51.725538 48074 sgd_solver.cpp:138] Iteration 200, lr = 0.01
I1207 09:24:35.116875 48074 solver.cpp:243] Iteration 300, loss = 34.9371
I1207 09:24:35.117105 48074 solver.cpp:259] Train net output #0: accuracy = 0
I1207 09:24:35.117120 48074 solver.cpp:259] Train net output #1: accuracy-t = 0.328125
I1207 09:24:35.117130 48074 solver.cpp:259] Train net output #2: softmax_loss = 34.9371 ( 1 = 34.9371 loss)
I1207 09:24:35.117137 48074 sgd_solver.cpp:138] Iteration 300, lr = 0.01
I1207 09:24:58.669173 48074 cosin_add_m_layer.cpp:52] cos_theta > 1 ** 1.03431
I1207 09:24:59.103677 48074 cosin_add_m_layer.cpp:52] cos_theta > 1 ** 1.25812
I1207 09:24:59.536674 48074 cosin_add_m_layer.cpp:52] cos_theta > 1 ** 1.14654
I1207 09:24:59.536708 48074 cosin_add_m_layer.cpp:52] cos_theta > 1 ** 1.13745
I1207 09:24:59.536715 48074 cosin_add_m_layer.cpp:52] cos_theta > 1 ** 1.14452
I1207 09:24:59.536717 48074 cosin_add_m_layer.cpp:52] cos_theta > 1 ** 1.10017

xialuxi commented 5 years ago

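In the classification layer fc7, a normalize is applied before the output is fed into the cos_add_m layer, so cos_theta is exactly the normalized fc7 output. If cos_theta falls outside [-1, 1] (in practice, if it exceeds 1), training has gone wrong. I suggest starting with a learning rate of 0.001.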
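The point that cos_theta is just the normalized fc7 output can be illustrated with a small NumPy sketch. It assumes the standard ArcFace formulation from the paper (target-class logit s·cos(θ + m), other classes s·cos θ, with the commonly used s = 64, m = 0.5); it is not the repository's cosin_add_m layer, and it omits the special handling real implementations add when θ + m exceeds π. Once both the features and the class weights are L2-normalized, their dot product is a cosine and cannot exceed 1 beyond rounding error, so values like 1.03 or 1.25 in the log mean the normalization step is not behaving as intended.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm; eps guards against all-zero rows."""
    norm = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norm, eps)


def arcface_logits(features, weights, labels, s=64.0, m=0.5):
    """Simplified ArcFace logits (sketch, not the repository's cosin_add_m layer).

    features: (N, D) embeddings; weights: (C, D) class weights, one row per class;
    labels: (N,) integer class ids.
    Returns (s * margin-adjusted cosines, raw cos_theta).
    """
    f = l2_normalize(np.asarray(features, dtype=np.float64))
    w = l2_normalize(np.asarray(weights, dtype=np.float64))
    cos_theta = f @ w.T  # (N, C): cosine between each feature and each class weight

    # For unit vectors a cosine lies in [-1, 1]; anything clearly outside that
    # range means the normalization is not doing what we expect.
    if np.any(np.abs(cos_theta) > 1.0 + 1e-6):
        raise ValueError(f"cos_theta out of range: max |cos| = {np.abs(cos_theta).max():.6f}")
    cos_theta = np.clip(cos_theta, -1.0, 1.0)  # remove tiny rounding overshoot before arccos

    # Add the angular margin m only to each sample's target class.
    target = np.zeros(cos_theta.shape, dtype=bool)
    target[np.arange(len(labels)), labels] = True
    margin_cos = np.cos(np.arccos(cos_theta) + m)
    logits = np.where(target, margin_cos, cos_theta)
    return s * logits, cos_theta
```

The scaled logits would then go into an ordinary softmax cross-entropy, which matches the advice above to use plain SoftmaxWithLoss after the margin layer.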

shiyuanyin commented 5 years ago


Thanks, xialuxi. I'll try training with a learning rate of 0.001. Many thanks for your reply.

shiyuanyin commented 5 years ago


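Hello, author. I'm training on the VGGFace2 training set with 8,631 identity classes. Convergence is rather slow, and the loss keeps oscillating around 24 without decreasing further (it oscillated for 5,000 iterations before I stopped it); lowering the learning rate didn't help. Previously, training on a few tens of thousands of images from the same data brought the loss down to a very small value (0.0001). Is this behavior abnormal?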

xialuxi commented 5 years ago

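1. Training this way in Caffe does run into convergence difficulties; the exact cause hasn't been tracked down yet. 2. One approach that does converge: adjust m, training step by step from 0.1 up to 0.5. 3. Increase iter_size, which indirectly increases the batch size.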

shiyuanyin commented 5 years ago


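Thanks for your reply. When you say to train m from 0.1 to 0.5, do you mean training with m = 0.1 for the full number of iterations, then changing it to 0.2 and fine-tuning, and so on? Or starting at 0.1 and increasing m (then fine-tuning) whenever the loss has oscillated for a long time without decreasing?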

xialuxi commented 5 years ago


shiyuanyin commented 5 years ago

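"Train each stage to completion, then fine-tune; you can first train a softmax model as the pretrained model." OK. I trained that softmax model with the two normalization layers removed.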
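The staged-margin advice in this exchange (start from a softmax-pretrained model, raise m from 0.1 to 0.5, train each stage to completion and fine-tune the next stage from it) and the iter_size tip can be sketched as plain bookkeeping. The helper names and .caffemodel file names below are hypothetical placeholders, not anything from the repository; the only Caffe fact relied on is that the solver accumulates gradients over iter_size passes of batch_size samples before each update.

```python
def margin_stages(m_start=0.1, m_end=0.5, step=0.1):
    """Margins for stage-wise training, e.g. [0.1, 0.2, 0.3, 0.4, 0.5]."""
    n_steps = int(round((m_end - m_start) / step))
    return [round(m_start + i * step, 4) for i in range(n_steps + 1)]


def effective_batch_size(batch_size, iter_size):
    """Caffe accumulates gradients over iter_size passes of batch_size samples per update."""
    return batch_size * iter_size


def staged_plan(margins, pretrained="softmax_pretrained.caffemodel"):
    """Hypothetical plan: each stage is trained to completion and initializes the next.

    File names are placeholders, not names used by the repository.
    Returns a list of (m, init_weights, output_weights) tuples.
    """
    plan = []
    init = pretrained
    for m in margins:
        out = f"arcface_m{m:.1f}.caffemodel"
        plan.append((m, init, out))
        init = out  # the next stage fine-tunes from this stage's result
    return plan


if __name__ == "__main__":
    for m, init, out in staged_plan(margin_stages()):
        print(f"m = {m:.1f}: fine-tune from {init} -> {out}")
    print("effective batch:", effective_batch_size(batch_size=64, iter_size=4))
```

Each stage would correspond to one solver run with the margin in the prototxt set to that stage's m and the weights initialized from the previous stage's snapshot.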

zhangxiaopang88 commented 5 years ago

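@shiyuanyin Hello, is the normalize layer you use not the one that ships with the am-softmax Caffe? My situation is the same as yours. I compared the normalize layers in other Caffe versions and they really are different, and I don't know which one I should use. Could you point me to one?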
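One way to tell which normalize layer behaves as intended is to dump its input and output blobs (for example the arrays in pycaffe's net.blobs[name].data after a forward pass; the blob names depend on your prototxt) and compare them with a plain per-sample L2 normalization. The sketch below assumes that is the intended behavior: each sample scaled to unit norm, with no learnable scale. Implementations can differ in the epsilon, in which axes are normalized, and in whether a scale factor is applied afterwards; a row norm far from 1 in the layer output points to the last case.

```python
import numpy as np


def reference_l2_normalize(x, eps=1e-10):
    """Reference: scale each sample (flattened row of an (N, ...) blob) to unit L2 norm."""
    x = np.asarray(x, dtype=np.float64)
    x = x.reshape(len(x), -1)  # treat each sample as one vector, as for an fc output
    norms = np.sqrt((x * x).sum(axis=1, keepdims=True) + eps)
    return x / norms


def compare_normalize(layer_input, layer_output, atol=1e-5):
    """Compare a normalize layer's dumped output with the reference.

    Returns (max_abs_diff, per-sample norms of the layer output, matches_reference).
    """
    ref = reference_l2_normalize(layer_input)
    out = np.asarray(layer_output, dtype=np.float64).reshape(ref.shape)
    diff = float(np.max(np.abs(out - ref)))
    row_norms = np.linalg.norm(out, axis=1)
    return diff, row_norms, diff <= atol
```

If neither candidate layer reproduces the reference, that is a strong hint the layer is not producing the unit vectors the cosine-margin layer expects.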

wavelet2008 commented 5 years ago

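Begging the experts for convergence advice: currently, with 50k identities and m = 0.35, I get loss = 0.390078 and acc = 0.9918, which is still rather poor compared with the 0.998 others get from their training.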

Catosine commented 5 years ago



shiyuanyin commented 5 years ago

You can preprocess the face data with insightface's detection and alignment method; the m setting also seems to be different.
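The thread does not spell out the alignment step. As a hedged sketch: insightface-style preprocessing warps each face so that five detected landmarks land on a fixed 112×112 template, using a least-squares similarity transform (Umeyama's method). The template coordinates below are the values commonly quoted from insightface and are an assumption to verify against its source; landmark detection itself is out of scope here.

```python
import numpy as np

# Assumed 5-point template for 112x112 crops (left eye, right eye, nose tip,
# left mouth corner, right mouth corner); verify against the insightface source.
ARCFACE_TEMPLATE_112 = np.array([
    [38.2946, 51.6963],
    [73.5318, 51.5014],
    [56.0252, 71.7366],
    [41.5493, 92.3655],
    [70.7299, 92.2041],
], dtype=np.float64)


def estimate_similarity(src, dst):
    """Least-squares similarity transform (Umeyama) mapping src points onto dst.

    Returns a 2x3 matrix M such that dst ~= M @ [x, y, 1]^T for each src point.
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    n = src.shape[0]
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s_c, d_c = src - mu_s, dst - mu_d
    cov = d_c.T @ s_c / n                   # 2x2 cross-covariance of dst and src
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[1, 1] = -1.0                      # avoid a reflection
    R = U @ D @ Vt                          # best rotation
    var_s = (s_c ** 2).sum() / n
    scale = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - scale * R @ mu_s
    return np.hstack([scale * R, t[:, None]])


def apply_transform(M, pts):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=np.float64)
    return pts @ M[:, :2].T + M[:, 2]
```

With detected landmarks as src and the template as dst, the resulting matrix can be passed to an affine warp such as OpenCV's cv2.warpAffine(img, M, (112, 112)) to produce the aligned crop.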


xiakj commented 4 years ago

@xialuxi I ran ArcFace recognition under Caffe, but the results are inaccurate, far worse than the Python version. Do you have any suggestions?