rosinality / mac-network-pytorch

Memory, Attention and Composition (MAC) Network for CLEVR implemented in PyTorch
MIT License

Padded tokens are not masked when calculating attention in the control unit #7

Closed: zhmd closed this issue 4 years ago

zhmd commented 4 years ago

Thanks for sharing the implementation, it's really nice.

I noticed that when the mac unit is called with the LSTM output and the image representation, question_len is not passed in, so the attention calculation in the control unit seems unaware of the padding tokens. Am I missing something here?

Specifically I'm referring to this line: https://github.com/rosinality/mac-network-pytorch/blob/564ca5bbb39bb2bff45e9fa84c898fadd8145d94/model.py#L38

With sentences of varying lengths, the attention should be restricted to the actual sentence length rather than covering the padding tokens, as in the sketch below. Is that right?
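For illustration, a minimal sketch of what length-based masking could look like here. The `masked_attention` helper and its exact shapes are my own assumptions, not code from this repo:

```python
import torch
import torch.nn.functional as F

def masked_attention(logits, question_len):
    """Restrict attention to real tokens (hypothetical helper, not from the repo).

    logits: (batch, seq_len) unnormalized attention scores over question words
    question_len: (batch,) actual length of each question
    """
    batch, seq_len = logits.shape
    # pad_mask[b, t] is True for padding positions (t >= question_len[b])
    positions = torch.arange(seq_len, device=logits.device).unsqueeze(0)
    pad_mask = positions >= question_len.unsqueeze(1)
    # Send padded logits to -inf so softmax assigns them zero weight
    logits = logits.masked_fill(pad_mask, float('-inf'))
    return F.softmax(logits, dim=1)

# Usage sketch: attn = masked_attention(scores, question_len)
# instead of attn = F.softmax(scores, dim=1) on the raw scores.
```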

rosinality commented 4 years ago

Yes, I didn't use attention masking. But I don't know whether it will affect performance. (I suspect it will not make a large difference.)

zhmd commented 4 years ago

Thanks a lot for the quick response and the clarification! Yep, the difference is not very significant.