Is this correct? I looked up the explanation of bmm in the official documentation, and it says that
batch1 and batch2 must be 3-D tensors, each containing the same number of matrices.
But as you defined it earlier, attn_weights is a 2-D tensor, so I think there may be a mistake here.
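
To make the concern concrete, here is a minimal sketch of the shape requirement. The sizes, the encoder_outputs tensor, and the unsqueeze(0) fix are assumptions for illustration only and are not taken from the code being discussed.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
max_length, hidden_size = 10, 256

# Both tensors are 2-D here, as described in the question.
attn_weights = torch.softmax(torch.randn(1, max_length), dim=1)  # (1, max_length)
encoder_outputs = torch.randn(max_length, hidden_size)           # (max_length, hidden_size)

# torch.bmm rejects 2-D inputs: both arguments must be 3-D.
try:
    torch.bmm(attn_weights, encoder_outputs)
    bmm_accepts_2d = True
except RuntimeError:
    bmm_accepts_2d = False

# Adding a leading batch dimension of size 1 makes both tensors 3-D,
# so the batched matrix product is well defined.
attn_applied = torch.bmm(attn_weights.unsqueeze(0), encoder_outputs.unsqueeze(0))

print(bmm_accepts_2d)
print(attn_applied.shape)
```

If the original code already calls unsqueeze(0) on both arguments before bmm, then the call is valid even though attn_weights itself is defined as 2-D.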