jadore801120 / attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer model in "Attention is All You Need".

MultiHeadAttention input shape #179

Open · Superklez opened this issue 3 years ago

Superklez commented 3 years ago

Is the input shape of MultiHeadAttention [batch_size, sequence_length, embedding_size]? Or is it the same as nn.MultiheadAttention, where the input shape must be [sequence_length, batch_size, embedding_size]?
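
For reference, here is a minimal sketch contrasting the two conventions with torch.nn.MultiheadAttention (the `batch_first` flag requires PyTorch >= 1.9). The claim that this repo's MultiHeadAttention (in transformer/SubLayers.py) expects batch-first input is an assumption based on how it is called from the encoder/decoder layers, not something stated in this thread:

```python
import torch
import torch.nn as nn

batch_size, seq_len, embed_dim, num_heads = 2, 10, 512, 8

# torch.nn.MultiheadAttention defaults to [seq_len, batch_size, embed_dim]:
mha = nn.MultiheadAttention(embed_dim, num_heads)
x_sbd = torch.randn(seq_len, batch_size, embed_dim)
out, _ = mha(x_sbd, x_sbd, x_sbd)
print(out.shape)  # torch.Size([10, 2, 512])

# With batch_first=True (PyTorch >= 1.9) it accepts [batch_size, seq_len, embed_dim],
# which matches the batch-first convention this repo's MultiHeadAttention
# appears to use (assumption, based on transformer/SubLayers.py):
mha_bf = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
x_bsd = torch.randn(batch_size, seq_len, embed_dim)
out_bf, _ = mha_bf(x_bsd, x_bsd, x_bsd)
print(out_bf.shape)  # torch.Size([2, 10, 512])
```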