zhaiyukun closed this issue 2 years ago.
criterion = torch.nn.CrossEntropyLoss(ignore_index=0).to(device) # ignore [GO] token = ignore index 0
Why do you ignore the [GO] token when setting up the loss?
Thank you.
The [GO] token does not contribute anything to the correctness of the prediction, so there is no reason to include it in the loss. This technique is used in most attention-based models, and the model converges faster with ignore_index=0.
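A minimal sketch of what `ignore_index=0` does, assuming (as the code comment above says) that [GO] is index 0 in the label vocabulary. The vocabulary size, logits and targets below are made up for illustration. Positions whose target is 0 add nothing to the loss or to the gradients, and the mean is taken over the remaining positions only:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical vocabulary: index 0 = [GO], indices 1..4 = other tokens.
num_classes = 5
logits = torch.randn(6, num_classes, requires_grad=True)  # 6 decoding steps (batch flattened)
target = torch.tensor([3, 1, 4, 0, 0, 2])                 # steps with target 0 are ignored

criterion = torch.nn.CrossEntropyLoss(ignore_index=0)
loss_ignored = criterion(logits, target)

# Manual equivalent: per-step loss without ignoring, then average over non-zero targets only.
per_step = F.cross_entropy(logits, target, reduction="none")
mask = target != 0
loss_manual = per_step[mask].mean()

# Backprop through the ignore_index loss: rows for ignored steps receive zero gradient.
loss_ignored.backward()

print(loss_ignored.item(), loss_manual.item())
print(logits.grad[~mask])  # all zeros
```

Because ignored steps get no gradient, the model is never pushed to predict [GO] at those positions and all of the training signal goes to the tokens that actually determine the output.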
Thank you for your reply!