lucidrains / memorizing-transformers-pytorch

Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch
MIT License

Is it a T5 arch or a decoder-only GPT-style arch? #14

Open brando90 opened 1 year ago

Jayant1234 commented 1 year ago

T5 is an encoder-decoder architecture, not a decoder-only one. The paper uses a decoder-only (GPT-style) transformer, and this implementation of the memorizing transformer appears to be decoder-only as well.