Model training fails to converge

yjzst commented 4 years ago

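I trained MobileFaceNet, but the results are poor. I split the dataset exactly as you described, yet the loss only converges to around 4, and the model does not match the 24.pth checkpoint you provide. Is there anything I might have missed?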

siriusdemon commented 4 years ago

How long did you train for?

yjzst commented 4 years ago

150 epochs.

siriusdemon commented 4 years ago

You could try adding weight initialization, using Kaiming or Xavier. 150 epochs is definitely enough.
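
For readers unfamiliar with the two schemes, here is a minimal sketch of the scale each one assigns to a layer's weights. The helper names and the 3x3 convolution sizes are hypothetical and chosen only for illustration; they are not taken from this repository. In PyTorch the corresponding built-ins are `torch.nn.init.kaiming_normal_` and `torch.nn.init.xavier_uniform_`.

```python
import math


def conv_fans(in_channels: int, out_channels: int, kernel_size: int) -> tuple:
    """Return (fan_in, fan_out) for a square 2D convolution kernel."""
    receptive_field = kernel_size * kernel_size
    return in_channels * receptive_field, out_channels * receptive_field


def kaiming_normal_std(fan_in: int, gain: float = math.sqrt(2.0)) -> float:
    """Std of Kaiming (He) normal init in fan-in mode: gain / sqrt(fan_in).

    The default gain sqrt(2) is the usual choice for ReLU-like activations.
    """
    return gain / math.sqrt(fan_in)


def xavier_uniform_bound(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Bound a of Xavier (Glorot) uniform init, with weights drawn from U(-a, a)."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


# Hypothetical layer: 3x3 convolution, 64 input channels, 128 output channels.
fan_in, fan_out = conv_fans(64, 128, 3)
print("fan_in, fan_out:", fan_in, fan_out)
print("Kaiming normal std:", kaiming_normal_std(fan_in))
print("Xavier uniform bound:", xavier_uniform_bound(fan_in, fan_out))
```

Both schemes shrink the weights as the layer gets wider, which keeps activation magnitudes roughly stable from layer to layer at the start of training.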

yjzst commented 4 years ago

OK, thank you.

yjzst commented 4 years ago

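Hello, sorry to bother you. I added initialization, but it still does not work. Do you have a newer version of the code that you could share with me?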

siriusdemon commented 4 years ago

I have not changed the code in this repository since then. For reference, how accurate is your model right now?

yjzst commented 4 years ago

When training from scratch, the accuracy is below 20%. I will try transfer learning next.

siriusdemon commented 4 years ago

Did you change any of the training parameters? I still do not know where the problem is.

yjzst commented 4 years ago

I did not change the model's parameters; I trained facemobilenet directly.

siriusdemon commented 4 years ago

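This problem is a bit strange. The model does not use weight initialization, so that could be the cause, but judging from your feedback it does not seem to be. You could try increasing batch_size. I am retraining with the default configuration now and will check later whether anything is wrong. I will also see what other users report.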

yjzst commented 4 years ago

OK, thank you for your hard work.

Linsongrong commented 4 years ago

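Hi, I am also seeing a failure to converge. I trained for 150 epochs; the loss started at 11 and kept oscillating between 8 and 9 without converging. Could it be the learning rate? Your code uses lr=0.1. Could that be too large?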

yjzst commented 4 years ago

But the learning rate is decayed during training.
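
To make the point about decay concrete, here is a minimal sketch of a step-decay schedule. Only the starting value lr=0.1 comes from this thread; `step_size=10` and `gamma=0.5` are hypothetical values picked for illustration, and the repository's actual schedule may differ. The formula is the same one implemented by PyTorch's `torch.optim.lr_scheduler.StepLR`.

```python
def step_decay_lr(base_lr: float, epoch: int, step_size: int, gamma: float) -> float:
    """Learning rate at a given epoch when it is multiplied by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# base_lr=0.1 is the value mentioned in the thread; the other two are assumptions.
BASE_LR = 0.1
STEP_SIZE = 10   # hypothetical: decay every 10 epochs
GAMMA = 0.5      # hypothetical: halve the learning rate at each step

for epoch in (0, 9, 10, 50, 149):
    print(f"epoch {epoch:3d}: lr = {step_decay_lr(BASE_LR, epoch, STEP_SIZE, GAMMA):.8f}")
```

With a schedule like this, the initial 0.1 only applies to the first few epochs, so a loss that is still oscillating after 150 epochs is unlikely to be explained by the starting learning rate alone.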

yjzst commented 4 years ago

Could you share your contact details so that we can exchange notes? Thanks.

Linsongrong commented 4 years ago

Hello, my email address is linsongrong1@Outlook.com. You can reach me there.

siriusdemon commented 4 years ago

I trained with the default configuration, and the accuracy was already around 80% at the end of epoch 0.

Test Model: checkpoints/0.pth
Accuracy: 0.829
Threshold: 0.481

Test Model: checkpoints/3.pth
Accuracy: 0.884
Threshold: 0.451

Test Model: checkpoints/5.pth
Accuracy: 0.912
Threshold: 0.400
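
For anyone wondering how an Accuracy/Threshold pair like the ones above can be obtained, here is a minimal sketch. It assumes the test scores each image pair by cosine similarity and reports the threshold that classifies the most pairs correctly; the function name and the toy data are hypothetical, and this is not a copy of the repository's test script.

```python
def best_threshold(similarities, labels):
    """Return (accuracy, threshold) for the threshold that maximizes pair accuracy.

    similarities: one similarity score per image pair.
    labels: 1 if the pair shows the same identity, 0 otherwise.
    A pair is predicted "same" when its similarity is >= the threshold.
    """
    best_acc, best_th = 0.0, 0.0
    # Only the observed scores need to be tried as candidate thresholds.
    for th in sorted(set(similarities)):
        correct = sum((s >= th) == bool(y) for s, y in zip(similarities, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_acc, best_th = acc, th
    return best_acc, best_th


# Toy data, made up for illustration only.
sims = [0.9, 0.7, 0.55, 0.4, 0.3, 0.1]
same = [1, 1, 0, 1, 0, 0]
acc, th = best_threshold(sims, same)
print(f"Accuracy: {acc:.3f}")
print(f"Threshold: {th:.3f}")
```

Under this assumption the reported accuracy depends only on how well the embeddings separate matching from non-matching pairs, so it is not directly comparable to the value of the training loss.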

Loss at epoch 6:

Epoch 6/150, Loss: 10.049524307250977

The 24.pth provided in the project is the weight file saved after 24 epochs of training.

yjzst commented 4 years ago

May I ask what the loss is?

siriusdemon commented 4 years ago

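If you have enough GPU capacity, I suggest increasing batch_size and training for about 20 epochs. If you want better results, I suggest:

Use a stronger model
Train for longer (batch_size and the learning rate need to be considered together)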
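
One common heuristic for weighing batch_size against the learning rate is the linear scaling rule: scale the learning rate by the same factor as the batch size. The sketch below illustrates that heuristic only; it is not something this repository is confirmed to do. The base learning rate 0.1 is the value mentioned in this thread, while the base batch size of 64 is a hypothetical placeholder.

```python
def scaled_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Linear scaling rule: the learning rate grows in proportion to the batch size."""
    return base_lr * batch_size / base_batch_size


BASE_LR = 0.1          # value mentioned in the thread
BASE_BATCH_SIZE = 64   # hypothetical reference batch size

for bs in (64, 128, 256, 512):
    print(f"batch_size={bs:3d} -> lr={scaled_lr(BASE_LR, BASE_BATCH_SIZE, bs):.3f}")
```

Large scaled learning rates are often paired with a few warm-up epochs, so treat the numbers as a starting point for tuning, not a guarantee of convergence.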

yjzst commented 4 years ago

OK, thanks for your hard work.

Comedian1926 commented 4 years ago

I ran into a similar problem. After trying bs=128, 256, and 512 with learning rates of 0.1, 0.01, and 0.001 respectively, the best accuracy I got on LFW was 94.5.

yjzst commented 4 years ago

OK, thank you for your hard work.

siriusdemon / Build-Your-Own-Face-Model

Model training fails to converge #6