In the `MetaMultiheadAttention` class, the pre-transpose code for the `batch_first` case is missing. Additionally, since the `batch_first` argument has been available in `nn.MultiheadAttention` from PyTorch 1.9, I updated the README.md accordingly.
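For context, a minimal sketch of the pre-/post-transpose pattern this fix refers to (the class and wiring here are hypothetical, not the actual Torchmeta implementation): when wrapping an attention module whose underlying call expects `(seq, batch, embed)`, a `batch_first=True` input of shape `(batch, seq, embed)` must be transposed before the attention call and transposed back afterwards.

```python
import torch
import torch.nn as nn


class BatchFirstAttentionSketch(nn.Module):
    """Hypothetical wrapper illustrating batch_first handling around an
    attention module that expects (seq, batch, embed) inputs."""

    def __init__(self, embed_dim, num_heads, batch_first=False):
        super().__init__()
        self.batch_first = batch_first
        # Inner attention uses the pre-1.9 convention: (seq, batch, embed).
        self.attn = nn.MultiheadAttention(embed_dim, num_heads)

    def forward(self, query, key, value):
        if self.batch_first:
            # Pre-transpose: (batch, seq, embed) -> (seq, batch, embed).
            # This is the step the patch adds for the batch_first case.
            query, key, value = (x.transpose(0, 1) for x in (query, key, value))
        out, weights = self.attn(query, key, value)
        if self.batch_first:
            # Post-transpose back to (batch, seq, embed) for the caller.
            out = out.transpose(0, 1)
        return out, weights


x = torch.randn(2, 5, 8)  # (batch=2, seq=5, embed=8)
mha = BatchFirstAttentionSketch(embed_dim=8, num_heads=2, batch_first=True)
out, _ = mha(x, x, x)
print(out.shape)  # torch.Size([2, 5, 8])
```

Without the pre-transpose, the inner module would silently treat the batch dimension as the sequence dimension, producing shape-consistent but incorrect attention.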
By the way, I sincerely appreciate your releasing this repository. I published a book about meta-learning in Korea and used the Torchmeta library for the book's example code.
Reference
PyTorch 1.9, PyTorch 2.0