Closed MichaelFu1998-create closed 2 years ago
This ignore_index=-1
has no effect, because we use active_loss
to mask the padding tokens in the loss calculation here: https://github.com/microsoft/CodeXGLUE/blob/d1e2f6ce4ea7d7280a0d21178f36cde5ee830929/Code-Code/code-to-code-trans/code/model.py#L64
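The point above can be sketched in a minimal, self-contained example (tensor shapes, the pad id of 1, and the random logits are hypothetical; only the active_loss expression and the shift follow the linked model.py):

```python
import torch
import torch.nn as nn

# Hypothetical tiny setup: batch of 1, sequence of 5, vocab of 10.
vocab_size = 10
lm_logits = torch.randn(1, 5, vocab_size)
target_ids = torch.tensor([[4, 7, 2, 1, 1]])   # 1 = pad token id (assumption)
target_mask = torch.tensor([[1, 1, 1, 0, 0]])  # 0 marks padding positions

# Shift so that position t predicts token t+1, as in model.py.
active_loss = target_mask[..., 1:].ne(0).view(-1) == 1
shift_logits = lm_logits[..., :-1, :].contiguous()
shift_labels = target_ids[..., 1:].contiguous()

# Only positions where active_loss is True ever reach the loss, so no
# padded label is passed in and ignore_index=-1 is never triggered.
loss_fct = nn.CrossEntropyLoss(ignore_index=-1)
loss = loss_fct(shift_logits.view(-1, vocab_size)[active_loss],
                shift_labels.view(-1)[active_loss])
```

Here shift_labels.view(-1)[active_loss] contains only real token ids, which is why the ignore_index value is irrelevant.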
Thanks for your swift response! @celbree
Just to confirm: in
active_loss = target_mask[..., 1:].ne(0).view(-1) == 1
the "0" in .ne() is the pad token id, am I right?
Actually the 0 in target_mask is not the pad token id; it marks the positions of pad tokens in target_ids. Please refer to these lines:
https://github.com/microsoft/CodeXGLUE/blob/d1e2f6ce4ea7d7280a0d21178f36cde5ee830929/Code-Code/code-to-code-trans/code/run.py#L133
https://github.com/microsoft/CodeXGLUE/blob/d1e2f6ce4ea7d7280a0d21178f36cde5ee830929/Code-Code/code-to-code-trans/code/run.py#L134
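The linked run.py lines build target_mask alongside target_ids during preprocessing. A plausible minimal sketch of that idea (the function name and the pad id of 1 are assumptions, not the repo's exact code):

```python
pad_token_id = 1  # CodeBERT/RoBERTa pad token id, per the thread

def build_target(token_ids, max_length):
    """Pad token ids to max_length and record a 0/1 mask of real positions.

    target_mask holds 1 where target_ids holds a real token and 0 where
    target_ids holds padding, so the mask mirrors positions, not id values.
    """
    n_pad = max_length - len(token_ids)
    target_ids = token_ids + [pad_token_id] * n_pad
    target_mask = [1] * len(token_ids) + [0] * n_pad
    return target_ids, target_mask

ids, mask = build_target([4, 7, 2], 5)
```

This makes the earlier answer concrete: the 0 entries in target_mask line up with the pad positions in target_ids, regardless of what the pad token id itself is.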
I see @celbree,
so the result of active_loss should be something like [True, True, True, False, False]
if I have a target like [token, token, token, <pad>, <pad>]
, is this correct?
That's true!
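As confirmed above, nonzero mask entries become True. One nuance worth noting from the quoted expression itself: the [..., 1:] shift drops the first position, so for a length-5 target the resulting mask has 4 entries. A quick check (the example tensor is hypothetical):

```python
import torch

# Mask for a target like [token, token, token, <pad>, <pad>].
target_mask = torch.tensor([[1, 1, 1, 0, 0]])

# Same expression as in model.py; the [..., 1:] shift drops position 0.
active_loss = target_mask[..., 1:].ne(0).view(-1) == 1
print(active_loss.tolist())  # [True, True, False, False]
```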
Thank you very much :) !
Dear CodeXGLUE team,
Thanks for providing such an amazing benchmark! Given that the pad token id of the CodeBERT tokenizer is 1, I'm wondering why the ignore_index param of CrossEntropyLoss is set to -1 in the following code: https://github.com/microsoft/CodeXGLUE/blob/d1e2f6ce4ea7d7280a0d21178f36cde5ee830929/Code-Code/code-to-code-trans/code/model.py#L68
Kind regards, Michael