Implementation of Memorizing Transformers (ICLR 2022), an attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in PyTorch
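The core idea above — attention over local context plus top-k memories retrieved by (approximate) nearest-neighbor lookup — can be sketched as follows. This is a minimal illustration, not the repo's actual API; `knn_memory_attention` and all shapes are hypothetical, and exact nearest neighbors stand in for the ANN index used in the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def knn_memory_attention(q, local_k, local_v, mem_k, mem_v, top_k=2):
    # Hypothetical sketch: retrieve top_k memory entries per query by
    # inner-product similarity (a stand-in for the approximate nearest
    # neighbor index), then attend over local keys plus retrieved keys.
    sims = q @ mem_k.T                            # (n_queries, n_memories)
    idx = np.argsort(-sims, axis=-1)[:, :top_k]   # (n_queries, top_k)

    out = np.empty_like(q)
    for i, query in enumerate(q):
        k = np.concatenate([local_k, mem_k[idx[i]]], axis=0)
        v = np.concatenate([local_v, mem_v[idx[i]]], axis=0)
        attn = softmax(query @ k.T / np.sqrt(q.shape[-1]))
        out[i] = attn @ v
    return out

rng = np.random.default_rng(0)
d = 8
q = rng.normal(size=(4, d))                       # 4 queries
local_k, local_v = rng.normal(size=(6, d)), rng.normal(size=(6, d))
mem_k, mem_v = rng.normal(size=(32, d)), rng.normal(size=(32, d))
out = knn_memory_attention(q, local_k, local_v, mem_k, mem_v)
print(out.shape)  # (4, 8)
```

In the paper, the memory keys/values come from past segments of the same document and are queried through an ANN index rather than the exact search shown here.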
MIT License
is it a t5 arch or decoder only gpt style arch? #14
T5 is an encoder-decoder architecture; the paper uses a decoder-only transformer, which this memorizing transformer implementation also seems to follow.