Yes, I didn't use attention masking, but I don't know whether it will affect the performance. (I suspect it will not make a large difference.)
Thanks a lot for the quick response and for the clarification! Yep, the difference is not very significant.
Thanks for sharing the implementation, it's really nice.
I noticed that when you call the MAC unit with the LSTM output and the image representation, `question_len` is not passed in, and the attention calculation in the control unit seems unaware of the "padding tokens". Am I missing something here? Specifically, I'm referring to this line: https://github.com/rosinality/mac-network-pytorch/blob/564ca5bbb39bb2bff45e9fa84c898fadd8145d94/model.py#L38
With sentences of varying lengths, the attention should be restricted to the actual sentence length rather than spread over the padding tokens. Is that right?
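For reference, a minimal sketch of what such masking could look like, assuming the control unit produces per-word attention logits of shape `(batch, seq_len)` and that `question_len` holds the true lengths. The helper name and signature are hypothetical, not the repo's actual API:

```python
import torch
import torch.nn.functional as F

def masked_question_attention(logits, question_len):
    """Zero out attention on padding tokens before the softmax.

    logits: (batch, seq_len) raw attention scores over question words
    question_len: (batch,) actual length of each question
    """
    batch, seq_len = logits.shape
    # Positions at or beyond the true length are padding.
    positions = torch.arange(seq_len, device=logits.device).unsqueeze(0)
    pad_mask = positions >= question_len.unsqueeze(1)
    # Setting padded logits to -inf gives them zero weight after softmax.
    logits = logits.masked_fill(pad_mask, float('-inf'))
    return F.softmax(logits, dim=1)
```

With this, the control vector would only attend over real words; whether that changes accuracy in practice is the question discussed above.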