Knowledge distillation part converges very slowly

zma-c-137 / VarGFaceNet


Knowledge distillation part converges very slowly #10

Open 406747925 opened 5 years ago

406747925 commented 5 years ago

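Hello, I reimplemented the model in Keras and trained it, using the insightface ResNet100 model as the teacher to extract features. The loss combines softmax cross-entropy with an L2 loss on the embeddings. The cross-entropy loss is around 12 and the L2 loss is around 0.0038, so I multiplied the L2 loss by 2000. After 10 epochs of training, the L2 loss had decreased very slowly, only down to 0.0028.

How many epochs did you train for, how far did the loss converge, and how did you set the weight of the knowledge distillation loss, the learning rate, and so on?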
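For readers following the numbers in this post, here is a minimal sketch of the combined objective it describes: softmax cross-entropy plus a weighted L2 term on the embeddings. It is plain Python rather than the poster's Keras code, the function names are hypothetical, and the L2 term is assumed to be a mean over embedding dimensions, which the post does not state.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample from raw logits (log-sum-exp for stability)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def embedding_mse(student, teacher):
    """Mean squared difference between two embeddings (assumed reduction)."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)

def total_loss(ce, l2, l2_weight):
    """Weighted sum of the classification and distillation terms."""
    return ce + l2_weight * l2

# Values quoted in the post: cross-entropy ~12, L2 ~0.0038, weight 2000.
start = total_loss(12.0, 0.0038, 2000.0)
# After 10 epochs the L2 term is 0.0028; the cross-entropy is held at 12
# here only to isolate the change in the distillation term.
later = total_loss(12.0, 0.0028, 2000.0)
print(start, later)
```

With a weight of 2000, the distillation term contributes roughly 7.6 and then 5.6 to the total, so it is on the same order as the cross-entropy term rather than negligible.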

yuskey commented 4 years ago

Hey, could you explain the knowledge distillation process to me? Do I run the images through the teacher network and the student network in parallel and then compute the loss from their outputs? If so, do I pre-train the teacher network on the data, or do I use pre-trained weights (like ImageNet)? Are the weights frozen during this process?
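The arrangement asked about here can be illustrated with a toy sketch. It shows one common setup, not one confirmed by the repository's authors: the same batch goes through a fixed teacher and a trainable student, and only the student is updated to match the teacher's embeddings. The 2x2 linear "networks", `TEACHER_W`, and `distill_step` are all hypothetical stand-ins chosen so the example runs without any framework.

```python
def matvec(w, x):
    """Multiply a matrix (sequence of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

# Frozen teacher: stored as tuples, so it cannot be updated in place.
TEACHER_W = ((1.0, -0.5), (0.25, 2.0))

def distill_step(student_w, batch, lr):
    """One gradient-descent step on the mean squared embedding error.

    Returns the loss measured before the update.
    """
    n, dim = len(batch), len(student_w)
    grad = [[0.0] * len(batch[0]) for _ in range(dim)]
    loss = 0.0
    for x in batch:
        t = matvec(TEACHER_W, x)   # teacher forward pass, used as a fixed target
        s = matvec(student_w, x)   # student forward pass on the same input
        for i in range(dim):
            diff = s[i] - t[i]
            loss += diff * diff / (n * dim)
            for j, xj in enumerate(x):
                grad[i][j] += 2.0 * diff * xj / (n * dim)
    for i in range(dim):
        for j in range(len(grad[i])):
            student_w[i][j] -= lr * grad[i][j]   # only the student changes
    return loss

student_w = [[0.0, 0.0], [0.0, 0.0]]
batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
losses = [distill_step(student_w, batch, lr=0.5) for _ in range(200)]
print(losses[0], losses[-1])
```

In a real framework the same idea is usually expressed by running the teacher in inference mode and excluding its parameters from the optimizer. Whether the teacher is first trained on the target data or loaded from existing weights is a separate choice that this sketch does not address.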

ainnn commented 4 years ago

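@406747925, what is the best accuracy you managed to reproduce? My L2 loss also stops decreasing once it reaches a certain value, with cfp_fp accuracy at around 95%.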

weilanShi commented 2 years ago

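> Hello, I reimplemented the model in Keras and trained it, using the insightface ResNet100 model as the teacher to extract features. The loss combines softmax cross-entropy with an L2 loss on the embeddings. The cross-entropy loss is around 12 and the L2 loss is around 0.0038, so I multiplied the L2 loss by 2000. After 10 epochs of training, the L2 loss had decreased very slowly, only down to 0.0028.
>
> How many epochs did you train for, how far did the loss converge, and how did you set the weight of the knowledge distillation loss, the learning rate, and so on?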

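Hello, how did you write this part of the code? I am using an angular distillation loss to distill between the teacher and the student, and the loss stays almost constant. I don't know where the problem is and would appreciate your advice.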
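As a reference point for the angular loss mentioned here, the sketch below shows two closely related forms: a squared `1 - cos` penalty, and the squared distance between L2-normalized embeddings, which equals `2 - 2*cos`. Whether either matches the code this comment refers to is unknown. The function names are hypothetical and the example is plain Python, not Keras.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def angular_distill_loss(student, teacher):
    """Squared (1 - cosine similarity) between student and teacher embeddings."""
    return (1.0 - cosine(student, teacher)) ** 2

def normalized_l2_loss(student, teacher):
    """Squared distance between unit-normalized embeddings (equals 2 - 2*cos)."""
    ns = math.sqrt(sum(x * x for x in student))
    nt = math.sqrt(sum(y * y for y in teacher))
    return sum((s / ns - t / nt) ** 2 for s, t in zip(student, teacher))

student = [0.6, 0.8, 0.0]
teacher = [1.0, 0.0, 0.0]
print(cosine(student, teacher))
print(angular_distill_loss(student, teacher))
print(normalized_l2_loss(student, teacher))
```

Both forms depend only on the direction of the embeddings, so rescaling the student's output leaves them unchanged.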