rosinality / mac-network-pytorch

Memory, Attention and Composition (MAC) Network for CLEVR implemented in PyTorch
MIT License

Padded tokens are not masked when calculating attention in the control unit #7

Closed: zhmd closed this issue 4 years ago

zhmd commented 4 years ago

Thanks for sharing the implementation, it's really nice.

I noticed that when the mac unit is called with the LSTM output and the image representation, question_len is not passed in, so the attention calculation in the control unit seems unaware of the padding tokens. Am I missing something here?

Specifically I'm referring to this line: https://github.com/rosinality/mac-network-pytorch/blob/564ca5bbb39bb2bff45e9fa84c898fadd8145d94/model.py#L38

With sentences of varying lengths, the attention should be restricted to the actual sentence length rather than covering the padding tokens, as in the sketch below. Is that right?
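For illustration, a minimal sketch of what length-based masking could look like here. The `masked_attention` helper and its exact shapes are my own assumptions, not code from this repo:

```python
import torch
import torch.nn.functional as F

def masked_attention(logits, question_len):
    """Restrict attention to real tokens (hypothetical helper, not from the repo).

    logits: (batch, seq_len) unnormalized attention scores over question words
    question_len: (batch,) actual length of each question
    """
    batch, seq_len = logits.shape
    # pad_mask[b, t] is True for padding positions (t >= question_len[b])
    positions = torch.arange(seq_len, device=logits.device).unsqueeze(0)
    pad_mask = positions >= question_len.unsqueeze(1)
    # Send padded logits to -inf so softmax assigns them zero weight
    logits = logits.masked_fill(pad_mask, float('-inf'))
    return F.softmax(logits, dim=1)

# Usage sketch: attn = masked_attention(scores, question_len)
# instead of attn = F.softmax(scores, dim=1) on the raw scores.
```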

rosinality commented 4 years ago

Yes, I didn't use attention masking. But I don't know whether it will affect performance. (I suspect it will not make a large difference.)

zhmd commented 4 years ago

Thanks a lot for the quick response and the clarification! Yep, the difference is not very significant.