imamnurby opened this issue 1 year ago
`tgt_mask` refers to the attention mask matrix of the target translation, denoted as A. This `tgt_mask` is added to the target attention scores. Thus, A_ij = -inf means the i-th token does not attend to the j-th token.
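For concreteness, here is a minimal sketch (not this repository's code; the target length is an assumption for illustration) of how such an additive -inf mask is typically built in PyTorch:

```python
import torch

# Hypothetical target length, just for illustration.
tgt_len = 5

# Additive causal mask: positions above the diagonal are -inf,
# so token i cannot attend to any future token j > i.
tgt_mask = torch.triu(torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)

# Equivalent helper in recent PyTorch versions:
# tgt_mask = torch.nn.Transformer.generate_square_subsequent_mask(tgt_len)

print(tgt_mask)
# tgt_mask[i, j] == -inf -> the i-th target token does not attend to the j-th token
# tgt_mask[i, j] == 0.0  -> attention is allowed
```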
`memory_key_padding_mask` controls how tokens in the decoder can attend to tokens in the encoder output. 0 indicates that a decoder token can attend to that encoder token, 1 indicates that it cannot.
Yes, we allow the decoder tokens to attend to the node tokens in the encoder output.
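Below is a hedged sketch of how a boolean `memory_key_padding_mask` following this 0/1 convention could be derived from padded source ids and passed to `nn.TransformerDecoder`. The shapes, `pad_id`, and layer sizes are illustrative assumptions, not values from this repository:

```python
import torch
import torch.nn as nn

# Hypothetical shapes / ids, purely for illustration.
batch_size, src_len, tgt_len, d_model, pad_id = 2, 7, 5, 16, 0

decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=4)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)

# Encoder output (memory) and target embeddings: (seq_len, batch, d_model).
memory = torch.randn(src_len, batch_size, d_model)
tgt = torch.randn(tgt_len, batch_size, d_model)

# Fake source token ids to derive the padding mask from.
src_ids = torch.randint(1, 100, (batch_size, src_len))
src_ids[:, -2:] = pad_id  # pretend the last two source positions are padding

# memory_key_padding_mask: (batch, src_len). True/1 marks padding positions,
# i.e. encoder tokens that decoder tokens must NOT attend to; 0/False means
# the position can be attended, matching the convention described above.
memory_key_padding_mask = src_ids.eq(pad_id)

# Additive causal mask for the target, as in the previous sketch.
tgt_mask = torch.triu(torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)

out = decoder(
    tgt,
    memory,
    tgt_mask=tgt_mask,
    memory_key_padding_mask=memory_key_padding_mask,
)
print(out.shape)  # torch.Size([5, 2, 16])
```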
Dear authors,
In the following line, you compute the output of the decoder by supplying 4 params. Please clarify if my understanding is correct:
- `tgt_mask` refers to the mask of the target translation. 1 indicates a non-padding token, 0 otherwise.
- `memory_key_padding_mask` controls how tokens in the decoder can attend to tokens in the encoder. 1 indicates that a decoder token can attend to an encoder token, 0 otherwise.

Regarding the `memory_key_padding_mask`, is it intended that you allow the decoder tokens to attend to the node tokens in the encoder output?

Here is the link for the snippet above.
Edit: fix some typos