Junjieli0704 closed this issue 6 years ago
Thanks for your interest in this repo!
In my current implementation all source sentences in a batch have the same length, so there is no need to apply masking on source sentences :) Nevertheless, I implemented the `dot_prod_attention` function in a rather general fashion to allow for source masking, so you can easily modify the codebase to support batches with source sentences of different lengths.
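To illustrate, here is a minimal sketch of what such a masked dot-product attention function might look like in PyTorch. This is not the repo's exact code, just an assumed implementation of the same idea: the optional `mask` marks padding positions, which are set to `-inf` before the softmax so they receive (near-)zero attention weight.

```python
import torch
import torch.nn.functional as F

def dot_prod_attention(h_t, src_encoding, src_encoding_att_linear, mask=None):
    """Dot-product attention with optional source masking (illustrative sketch).

    h_t: (batch, hidden) decoder state
    src_encoding: (batch, src_len, hidden) encoder outputs
    src_encoding_att_linear: (batch, src_len, hidden) linearly projected encoder outputs
    mask: (batch, src_len) bool tensor, True at padding positions
    """
    # Unnormalized attention scores: batched dot product, shape (batch, src_len)
    att_weight = torch.bmm(src_encoding_att_linear, h_t.unsqueeze(2)).squeeze(2)
    if mask is not None:
        # Padded positions get -inf, so softmax assigns them zero weight
        att_weight = att_weight.masked_fill(mask, -float('inf'))
    att_weight = F.softmax(att_weight, dim=-1)
    # Context vector: weighted sum of encoder outputs, shape (batch, hidden)
    ctx_vec = torch.bmm(att_weight.unsqueeze(1), src_encoding).squeeze(1)
    return ctx_vec, att_weight
```

With equal-length sentences in every batch, calling this with `mask=None` is safe; with variable lengths, you would build the mask from the source lengths and pass it in.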
Sorry for the confusion!
Thank you very much!
Hi, this is a nice repository for learning NMT and seq2seq models! I have a question about masking when computing attention values.
The function in nmt.py that computes attention values is: `def dot_prod_attention(self, h_t, src_encoding, src_encoding_att_linear, mask=None)`
When you call `dot_prod_attention` to compute attention values, you always use the default mask value (`None`). Shouldn't the mask be set when computing attention values?
Thank you !