That is a good point to bring up, thank you. I hadn't noticed it until now, but it should be the KL divergence. I'm a bit baffled, because I don't know how I managed to write that loss in the paper.
To clarify: I've always used the KL divergence for the unsupervised loss, so the codebase is correct.
Hi, thanks for your great work! I have a question about the consistency loss between the teacher and the student on the unlabeled points. In the code you use the KL divergence, but in the paper (formula 3) it is something different. To me, formula 3 looks like a soft version of cross-entropy, but the minus sign is missing. Or should it be the KL divergence (https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html) and part of it was left out? Or am I missing something?
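For anyone landing here later, below is a minimal sketch of how a KL-based teacher-student consistency loss is typically written with `torch.nn.functional.kl_div` (the function and variable names are illustrative, not taken from this repository):

```python
import torch
import torch.nn.functional as F

def consistency_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) on unlabeled points (illustrative sketch)."""
    # PyTorch's KLDivLoss expects log-probabilities as input and probabilities as target.
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # Detach the teacher so no gradient flows back through it.
    teacher_probs = F.softmax(teacher_logits.detach(), dim=-1)
    # 'batchmean' is the mathematically correct reduction for KL divergence.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```

Note that minimizing KL(teacher || student) differs from the soft cross-entropy only by the teacher's entropy, which is constant with respect to the student, so the two objectives give the same gradients for the student.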