Open · jerett opened this issue 1 year ago
Hi jerett - we need the inputs of KLDivLoss to be in log space, hence we need to apply log(). The -inf issue arises because there are zeros in the tensor, so the log() applied to the predict tensor creates an issue with LabelSmoothing().
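For illustration, a minimal sketch of the failure, assuming the example predict row from the notebook:

```python
import torch

# The example row contains exact zeros, so log() yields -inf in those
# positions, which then propagates through KLDivLoss to an inf loss.
predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0]])
print(predict.log())
# tensor([[   -inf, -1.6094, -0.3567, -2.3026,    -inf]])
```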
Hence I propose to use log_softmax() instead of log().
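A sketch of that substitution (note log_softmax treats its input as logits, so it re-normalizes the row):

```python
import torch
import torch.nn.functional as F

predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0]])
# log_softmax is finite for any finite input, so no -inf appears
print(F.log_softmax(predict, dim=-1))
```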
I also raised a PR.
Thanks Satya
Same problem here, but I don't think we should use log_softmax instead of log, because predict is already defined as probabilities.
Rather, I changed the predict tensor to `predict = torch.FloatTensor([[1e-9, x / d - 1e-9, 1 / d, 1 / d, 1 / d]])` to avoid the -inf from log(0).
The result I get is the same as in the example provided.
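A runnable sketch of this workaround, with nn.KLDivLoss(reduction="sum") standing in for the notebook's LabelSmoothing criterion and d = x + 3 as in the notebook's loss() example:

```python
import torch
import torch.nn as nn

# Stand-in for the LabelSmoothing criterion defined in the notebook.
crit = nn.KLDivLoss(reduction="sum")

def loss(x):
    d = x + 3
    # Shift a tiny epsilon off the zero entry: no entry is exactly 0,
    # the row still sums to 1, and log() stays finite.
    predict = torch.FloatTensor([[1e-9, x / d - 1e-9, 1 / d, 1 / d, 1 / d]])
    # One-hot target distribution on index 1 (KLDivLoss takes log-probs
    # as input and a probability distribution as target).
    target = torch.zeros_like(predict)
    target[0, 1] = 1.0
    return crit(predict.log(), target).item()

print(loss(3.0))  # a finite value instead of inf
```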
Related, on the same problem: #109, #115, #117, https://github.com/harvardnlp/annotated-transformer/pull/115#issuecomment-1729723671.
When running the label smoothing section, I found that the code `crit(x=predict, target=torch.LongTensor([2, 1, 0, 3, 3]))` returns inf. I think the var predict shouldn't have log() applied, since log(0) is -inf, and as a result the loss section draws nothing.
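A sketch of the failing call, assuming (per the comments above) that the notebook applies predict.log() before passing it to crit:

```python
import torch

# The example rows from the label-smoothing section; the zero columns
# become -inf under log(), and the summed KL divergence comes out inf.
predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0]] * 5)

# With the notebook's criterion, e.g. crit = LabelSmoothing(5, 0, 0.4):
# crit(x=predict.log(), target=torch.LongTensor([2, 1, 0, 3, 3]))  # -> inf
```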