yaohungt / Multimodal-Transformer

[ACL'19] [PyTorch] Multimodal Transformer

Padding mask #34

Open digbose92 opened 3 years ago

digbose92 commented 3 years ago

Hi, I have variable-length sequences for my task, and they are padded to a pre-specified maximum length. How can I ensure that the padded part of a sequence does not contribute to the attention computation? There is an argument called key_padding_mask in https://github.com/yaohungt/Multimodal-Transformer/blob/master/modules/multihead_attention.py. Any leads on how to use this argument?
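
A minimal sketch of one way to build such a mask, assuming the repo's `MultiheadAttention` follows the usual PyTorch/fairseq convention for `key_padding_mask` (shape `(batch, seq_len)`, `True` at padded positions). The example uses `torch.nn.MultiheadAttention` only to illustrate the convention; the variable names (`lengths`, `max_len`, etc.) are hypothetical, not from the repo:

```python
import torch
import torch.nn as nn

batch_size, max_len, embed_dim, num_heads = 2, 6, 8, 2
lengths = torch.tensor([4, 6])  # true sequence lengths before padding

# True where the position is padding and should be ignored by attention
# shape: (batch, max_len)
key_padding_mask = torch.arange(max_len)[None, :] >= lengths[:, None]

# Demonstration with torch.nn.MultiheadAttention, which uses the same
# argument name and masking convention
attn = nn.MultiheadAttention(embed_dim, num_heads)
x = torch.randn(max_len, batch_size, embed_dim)  # (seq_len, batch, embed_dim)
out, weights = attn(x, x, x, key_padding_mask=key_padding_mask)

# Attention weights assigned to padded key positions are zero
print(weights[0, :, lengths[0]:])  # all zeros for the first sample
```

If the repo's module accepts the same argument, the mask built this way (one row per sample, `True` beyond each sample's true length) should be what gets passed alongside the query/key/value tensors, so that padded keys receive zero attention weight.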