Implementation of Memorizing Transformers (ICLR 2022), an attention network augmented with indexing and retrieval of memories using approximate nearest neighbors, in PyTorch
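To make the retrieval idea in the description concrete, here is a minimal sketch, not taken from this repository's code. It assumes a flat list of stored key/value vectors and uses exact brute-force search in place of an approximate nearest-neighbor index. The function name `knn_memory_attention` and the `top_k` parameter are hypothetical, chosen only for illustration.

```python
import math


def knn_memory_attention(query, memory_keys, memory_values, top_k=2):
    """Attend over only the top_k stored (key, value) pairs most similar to the query.

    Assumption: exact dot-product search stands in for an approximate
    nearest-neighbor index; a real system would query such an index instead.
    """
    if not memory_keys:
        raise ValueError("memory is empty")

    # Similarity of the query to every stored key (brute force).
    scores = [sum(q * k for q, k in zip(query, key)) for key in memory_keys]

    # Indices of the top_k most similar memories, best first.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]

    # Scaled softmax over the retrieved memories only.
    scale = 1.0 / math.sqrt(len(query))
    logits = [scores[i] * scale for i in top]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the retrieved values.
    dim = len(memory_values[0])
    output = [
        sum(w * memory_values[i][d] for w, i in zip(weights, top))
        for d in range(dim)
    ]
    return top, weights, output


# Illustrative data only: four stored memories in a 2-dimensional space.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [5.0, 5.0]]
top, weights, output = knn_memory_attention([1.0, 0.0], keys, values, top_k=2)
```

Because the softmax runs over the retrieved subset rather than the whole memory, the cost of the attention step depends on `top_k`, not on how many memories are stored. The search step is what an approximate index would speed up.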
MIT License
620 stars
46 forks
Update train.py to correct implementation of val loss calculation #8