
Seq2Seq CodeBERT - ignore index of CE loss = -1 #122

Closed. MichaelFu1998-create closed this issue 2 years ago.

MichaelFu1998-create commented 2 years ago

Dear CodeXGLUE team, thanks for providing such an amazing benchmark! Given that the pad token id of the CodeBERT tokenizer is 1, I'm wondering why the ignore_index parameter of CrossEntropyLoss is set to -1 in the following code: https://github.com/microsoft/CodeXGLUE/blob/d1e2f6ce4ea7d7280a0d21178f36cde5ee830929/Code-Code/code-to-code-trans/code/model.py#L68
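For reference, here is a minimal sketch (toy values, not repository code) of how ignore_index in CrossEntropyLoss usually masks padding, which is why I expected it to match the pad id:

```python
# A minimal sketch of ignore_index behaviour (toy values, not repository code).
import torch
import torch.nn as nn

pad_token_id = 1                                   # CodeBERT/RoBERTa pad id
logits = torch.randn(4, 10)                        # 4 positions, vocab size 10
labels = torch.tensor([3, 7, pad_token_id, pad_token_id])

# With ignore_index=pad_token_id, the two padded positions contribute nothing:
loss_ignoring_pad = nn.CrossEntropyLoss(ignore_index=pad_token_id)(logits, labels)

# With ignore_index=-1 (as in model.py), no label here equals -1, so nothing
# is ignored by the loss itself; padding must be filtered out some other way.
loss_ignoring_nothing = nn.CrossEntropyLoss(ignore_index=-1)(logits, labels)
```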

Kind regards, Michael

celbree commented 2 years ago

This ignore_index=-1 has no effect, because we use active_loss to mask the padding tokens in the loss calculation here: https://github.com/microsoft/CodeXGLUE/blob/d1e2f6ce4ea7d7280a0d21178f36cde5ee830929/Code-Code/code-to-code-trans/code/model.py#L64
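Roughly, the relevant part of model.py looks like the following sketch (paraphrased with toy tensors, not copied verbatim from the repository):

```python
import torch
import torch.nn as nn

# Toy shapes: batch=1, seq_len=5, vocab=10 (illustrative only).
lm_logits = torch.randn(1, 5, 10)
target_ids = torch.tensor([[0, 5, 8, 1, 1]])      # 1 is the pad id
target_mask = torch.tensor([[1, 1, 1, 0, 0]])     # 0 marks pad positions

# Roughly what model.py does around the linked lines:
active_loss = target_mask[..., 1:].ne(0).view(-1) == 1
shift_logits = lm_logits[..., :-1, :].contiguous()
shift_labels = target_ids[..., 1:].contiguous()

loss_fct = nn.CrossEntropyLoss(ignore_index=-1)
# Padding positions are removed by the boolean index before the loss is
# computed, so ignore_index=-1 never matches any remaining label.
loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1))[active_loss],
                shift_labels.view(-1)[active_loss])
```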

MichaelFu1998-create commented 2 years ago

Thanks for your swift response, @celbree! Just to confirm: in active_loss = target_mask[..., 1:].ne(0).view(-1) == 1, the "0" in .ne() is the pad token id, am I right?

celbree commented 2 years ago

Actually, the 0 in target_mask is not the pad token id; it indicates which positions in target_ids hold pad tokens. Please refer to these lines: https://github.com/microsoft/CodeXGLUE/blob/d1e2f6ce4ea7d7280a0d21178f36cde5ee830929/Code-Code/code-to-code-trans/code/run.py#L133 https://github.com/microsoft/CodeXGLUE/blob/d1e2f6ce4ea7d7280a0d21178f36cde5ee830929/Code-Code/code-to-code-trans/code/run.py#L134
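In other words, target_mask is built alongside target_ids roughly like this (a paraphrased sketch with toy values standing in for the tokenizer output; the exact repository code may differ slightly):

```python
pad_token_id = 1                        # RoBERTa/CodeBERT pad id
max_target_length = 6

target_ids = [0, 52, 713, 2]            # hypothetical token ids for the target
target_mask = [1] * len(target_ids)     # 1 for every real token

padding_length = max_target_length - len(target_ids)
target_ids += [pad_token_id] * padding_length    # pad ids are 1 ...
target_mask += [0] * padding_length              # ... but the mask marks them with 0

# target_ids  -> [0, 52, 713, 2, 1, 1]
# target_mask -> [1, 1, 1, 1, 0, 0]
```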

MichaelFu1998-create commented 2 years ago

I see, @celbree. So the result of active_loss should be something like [True, True, True, False, False] if I have a target like [token, token, token, <pad>, <pad>], is this correct?
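A quick sanity check of this understanding (a toy sketch, not repository code):

```python
import torch

target_mask = torch.tensor([1, 1, 1, 0, 0])   # 1 = real token, 0 = <pad> position
active = target_mask.ne(0)
print(active)                                  # tensor([ True,  True,  True, False, False])

# Note: in model.py the mask is additionally shifted by one position
# (target_mask[..., 1:]) so it lines up with the shifted labels used for
# next-token prediction, which drops the first position.
```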

celbree commented 2 years ago

That's true!

MichaelFu1998-create commented 2 years ago

Thank you very much! :)