harvardnlp / annotated-transformer

An annotated implementation of the Transformer paper.
http://nlp.seas.harvard.edu/annotated-transformer
MIT License

label smoothing inf err #109

Open jerett opened 1 year ago

jerett commented 1 year ago

When running the label smoothing section, I found that the code `crit(x=predict, target=torch.LongTensor([2, 1, 0, 3, 3]))` returns inf. I think the `predict` variable shouldn't have log applied to it, since log(0) is -inf; as a result, the loss section draws nothing.

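For context, a minimal sketch that reproduces the inf (the `LabelSmoothing` class is copied from the notebook so the snippet is self-contained; the `predict` values follow the notebook's example):

```python
import torch
import torch.nn as nn


class LabelSmoothing(nn.Module):
    "Label smoothing via KL divergence, as defined in the notebook."

    def __init__(self, size, padding_idx, smoothing=0.0):
        super().__init__()
        self.criterion = nn.KLDivLoss(reduction="sum")
        self.padding_idx = padding_idx
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.size = size

    def forward(self, x, target):
        assert x.size(1) == self.size
        # build the smoothed target distribution
        true_dist = x.data.clone()
        true_dist.fill_(self.smoothing / (self.size - 2))
        true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        true_dist[:, self.padding_idx] = 0
        mask = torch.nonzero(target.data == self.padding_idx)
        if mask.dim() > 0:
            true_dist.index_fill_(0, mask.squeeze(), 0.0)
        return self.criterion(x, true_dist.clone().detach())


crit = LabelSmoothing(size=5, padding_idx=0, smoothing=0.4)
predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0]] * 5)

# predict contains exact zeros, so predict.log() has -inf entries;
# wherever the smoothed target is positive and the input is -inf,
# KLDivLoss contributes +inf, so the summed loss is inf:
print(predict.log())
print(crit(x=predict.log(), target=torch.LongTensor([2, 1, 0, 3, 3])))
```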

satya400 commented 1 year ago

Hi jerett - we need the inputs of KLDivLoss to be in log space, hence the log(). The -inf issue arises because there are zeros in the tensor, so applying log() to the predict tensor is what breaks LabelSmoothing().

Hence I propose using log_softmax() instead of log().
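For reference, a small sketch of the numerical difference between the two calls (hypothetical values; note the caveat raised in the next comment):

```python
import torch

p = torch.tensor([[0.0, 0.2, 0.7, 0.1, 0.0]])
print(p.log())                # -inf at the zero entries
print(p.log_softmax(dim=-1))  # finite everywhere, but treats p as logits
                              # and re-normalizes, changing the values
```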

I also raised a PR.

Thanks Satya

alaneuler commented 1 year ago

Same problem here, but I don't think we should use log_softmax instead of log, because predict is already defined as probabilities.

Rather, I changed the predict tensor to:

predict = torch.FloatTensor([[1e-9, x/d - 1e-9, 1/d, 1/d, 1/d]])

to avoid the inf.

The result I get is the same as in the example provided.
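For illustration, a sketch of how this change would slot into the notebook's penalization example (assuming `loss` and `crit` as defined there, with `crit` being a LabelSmoothing instance):

```python
import torch


def loss(x, crit):
    d = x + 3 * 1
    # exact zero replaced by 1e-9, and x/d reduced by the same amount so
    # the row remains a valid probability distribution; log() stays finite
    predict = torch.FloatTensor([[1e-9, x / d - 1e-9, 1 / d, 1 / d, 1 / d]])
    return crit(predict.log(), torch.LongTensor([1])).data


# e.g. loss(1, crit) ... loss(100, crit) now plot without inf values
```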

kuraga commented 2 months ago

On the same topic: #109, #115, #117, https://github.com/harvardnlp/annotated-transformer/pull/115#issuecomment-1729723671.