Open BStudent opened 12 months ago
(INFORMATIONAL) Note for users of PyTorch 2.x: this example function works with PyTorch 1.11 but returns `nan` loss values under PyTorch 2.1.

UPDATED: the `penalization_visualization()` demo function is affected; the `example_simple_model()` demo function appears to work correctly.

Root cause: `nan` in `predict.log()` propagates to `nan` smoothed values when returned as `crit(predict.log(), torch.LongTensor([1])).data`.

Workaround: replace `0` with `1.0e-10` or a similar epsilon value for plotting (not a real solution due to masking).
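The propagation itself is plain IEEE-754 arithmetic and can be sketched without torch. The names below are illustrative; the assumption is that the criterion (a KLDivLoss-style smoothed target) multiplies a zero target weight by the `-inf` log-probability:

```python
import math

# A probability of exactly 0 has log(0) = -inf; a smoothed target that
# assigns weight 0.0 to that column then computes 0.0 * -inf, which is
# nan under IEEE-754, and the reduction sums it into the final loss.
log_prob = -math.inf             # stands in for predict[0, 0].log()
target_weight = 0.0              # smoothed target mass on that column
term = target_weight * log_prob  # nan, not 0.0
print(math.isnan(term))          # → True
```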
```python
# NOTE: return value broken WRT PyTorch 2.1, SEE CODE:
def loss(x, crit):
    """
    This function follows the text (by A-T Maintainers):

    > Label smoothing actually starts to penalize the model if it gets
    > very confident about a given choice.
    """
    d = x + 3 * 1
    # predict = torch.FloatTensor([[0, x / d, 1 / d, 1 / d, 1 / d]])
    predict = torch.FloatTensor([[1.0e-10, x / d, 1 / d, 1 / d, 1 / d]])  # <-- workaround
    # >>> crit(predict.log(), torch.LongTensor([1])).data
    # Out: tensor(nan)     # if torch.__version__ == 2.1
    # Out: tensor(0.9514)  # if torch.__version__ == 1.11
    return crit(predict.log(), torch.LongTensor([1])).data
```
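An alternative to hard-coding the epsilon into the tensor literal is clamping just before the log. This is a sketch in plain Python (`safe_log` is a hypothetical helper, mirroring what `predict.clamp(min=eps).log()` would do in torch); it only avoids the `-inf`, it does not address the masking caveat above:

```python
import math

EPS = 1.0e-10  # same illustrative epsilon as the workaround above

def safe_log(p, eps=EPS):
    # Clamp the probability away from zero before taking the log so the
    # result is always finite (no -inf to poison the smoothed loss).
    return math.log(max(p, eps))

print(safe_log(0.0))  # ≈ -23.03, finite rather than -inf
```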
Related issues and PRs: #109, #115, #117, https://github.com/harvardnlp/annotated-transformer/pull/115#issuecomment-1729723671.