Hi,
I have variable-length sequences for my task, and those sequences are padded to a pre-specified maximum length.
How can I ensure that the padded part of each sequence does not contribute to the attention computation? There is an argument called key_padding_mask in https://github.com/yaohungt/Multimodal-Transformer/blob/master/modules/multihead_attention.py. Any leads on how to use this argument?
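For context, here is a minimal sketch of how such a mask is commonly built, assuming the fairseq-style convention that this module appears to follow: key_padding_mask has shape (batch, src_len) and is True at padded positions, which the attention then fills with -inf before the softmax. The helper name make_key_padding_mask and the example lengths are my own illustration, not from the repo.

```python
import torch

def make_key_padding_mask(lengths, max_len):
    # lengths: LongTensor of shape (batch,) with the true length of each sequence
    positions = torch.arange(max_len).unsqueeze(0)   # (1, max_len)
    mask = positions >= lengths.unsqueeze(1)         # (batch, max_len), True = padding
    return mask

lengths = torch.tensor([5, 3, 7])                    # example true lengths
key_padding_mask = make_key_padding_mask(lengths, max_len=10)

# Then pass it alongside the (seq_len, batch, embed_dim) inputs, e.g.:
# attn_output, attn_weights = multihead_attn(query, key, value,
#                                            key_padding_mask=key_padding_mask)
```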