Hello @MaximilianPi,
Good catch! Would you like to propose a PR? (In addition to your patch, edit the DESCRIPTION to add your name as a contributor, and add a line in the NEWS to mention the fix as well.)
Yes, I can do that
Hi @dfalbel,
I think there is a small bug in `nnf_multi_head_attention_forward` related to the padding mask. From the padding mask documentation: "(N, S), where N is the batch size and S is the source sequence length. If a ByteTensor is provided, the non-zero positions will be ignored while the positions with the value of zero will be unchanged. If a BoolTensor is provided, the positions with the value of True will be ignored while the positions with the value of False will be unchanged."
Both masks (the src mask and the key padding mask) can be passed either as boolean tensors or as float tensors with -Inf at the positions to be masked, for example:
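A minimal sketch of the two equivalent mask forms (the shapes, batch size 1 and source length 5, are chosen purely for illustration):

```r
library(torch)

# Boolean key padding mask: TRUE marks source positions to be ignored.
# Shape (N, S) with N = 1, S = 5 (illustrative values).
bool_mask <- torch_tensor(matrix(c(FALSE, FALSE, FALSE, TRUE, TRUE), nrow = 1))

# The equivalent float mask: -Inf at masked positions, 0 elsewhere.
float_mask <- torch_zeros(1, 5)$masked_fill(bool_mask, -Inf)
```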
Currently only the src mask (`attn_mask`) is checked for its dtype, not the key padding mask. As a result, float key padding masks are ignored.
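A hedged sketch of how the discrepancy can be observed, going through the higher-level `nn_multihead_attention` module (which calls `nnf_multi_head_attention_forward` internally); all shapes and values here are illustrative, not taken from the report:

```r
library(torch)

mha <- nn_multihead_attention(embed_dim = 8, num_heads = 2)
q <- torch_randn(5, 1, 8)  # (S, N, E): source length 5, batch size 1

# Same masks as above: boolean mask and its float (-Inf / 0) equivalent.
bool_mask  <- torch_tensor(matrix(c(FALSE, FALSE, FALSE, TRUE, TRUE), nrow = 1))
float_mask <- torch_zeros(1, 5)$masked_fill(bool_mask, -Inf)

res_bool  <- mha(q, q, q, key_padding_mask = bool_mask)
res_float <- mha(q, q, q, key_padding_mask = float_mask)

# If the float mask were handled, both calls would produce identical
# attention weights, with zero weight on the last two source positions.
res_bool[[2]]   # attention weights with the boolean mask
res_float[[2]]  # attention weights with the float mask
```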
So the code (https://github.com/mlverse/torch/blob/d7c49776f331167733c96fb150143cf6f103c005/R/nnf-activation.R#L729C1-L740C4) should probably be changed to handle the dtype of the key padding mask in the same way.
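A minimal sketch of the kind of change meant here, modelled on the dtype handling already applied to `attn_mask`; the variable names (`attn_output_weights`, `bsz`, `num_heads`, `tgt_len`, `src_len`) follow the surrounding function, and this is not the actual patch text:

```r
if (!is.null(key_padding_mask)) {
  # uint8 masks are treated like boolean masks (non-zero = ignore)
  if (key_padding_mask$dtype == torch_uint8()) {
    key_padding_mask <- key_padding_mask$to(dtype = torch_bool())
  }
  attn_output_weights <- attn_output_weights$view(c(bsz, num_heads, tgt_len, src_len))
  mask <- key_padding_mask$unsqueeze(2)$unsqueeze(3)  # (N, 1, 1, S)
  if (key_padding_mask$dtype == torch_bool()) {
    # boolean mask: fill masked positions with -Inf
    attn_output_weights <- attn_output_weights$masked_fill(mask, -Inf)
  } else {
    # float mask: add it (masked positions already hold -Inf)
    attn_output_weights <- attn_output_weights + mask
  }
  attn_output_weights <- attn_output_weights$view(c(bsz * num_heads, tgt_len, src_len))
}
```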