Hugging Face-compatible implementation of RetNet (Retentive Network, https://arxiv.org/pdf/2307.08621.pdf), including the parallel, recurrent, and chunkwise forward passes.
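For intuition, here is a minimal single-head sketch of the parallel and recurrent retention forms and why they compute the same output. This is an illustration, not the repository's implementation: it omits the paper's multi-scale (per-head) decay, xPos rotation, and group normalization. The chunkwise form interpolates between the two by running the parallel form within fixed-size chunks and carrying the recurrent state across chunk boundaries.

```python
import torch

def retention_parallel(q, k, v, gamma):
    """Parallel form: O(n^2) attention-like matmul, used for training.
    q, k: (seq_len, d_k); v: (seq_len, d_v); gamma: scalar decay in (0, 1)."""
    seq_len = q.shape[0]
    idx = torch.arange(seq_len)
    exponent = (idx[:, None] - idx[None, :]).clamp(min=0).float()
    decay_mask = torch.tril(gamma ** exponent)  # D[n, m] = gamma^(n-m) for m <= n, else 0
    return (q @ k.T * decay_mask) @ v

def retention_recurrent(q, k, v, gamma):
    """Recurrent form: O(1) state per step, used for inference.
    Maintains S_n = gamma * S_{n-1} + k_n^T v_n and emits output_n = q_n S_n."""
    state = torch.zeros(q.shape[1], v.shape[1])
    out = []
    for n in range(q.shape[0]):
        state = gamma * state + k[n, :, None] * v[n, None, :]  # outer product k_n^T v_n
        out.append(q[n] @ state)
    return torch.stack(out)

# Both forms produce the same result (up to floating-point error):
q, k, v = torch.randn(3, 10, 8).unbind(0)
print(torch.allclose(retention_parallel(q, k, v, 0.9),
                     retention_recurrent(q, k, v, 0.9), atol=1e-5))
```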
Is it possible to train a RetNet using Transformers?

Yes. This implementation is built on the HF transformers `PreTrainedModel` class and is compatible with the Trainer API. You can find examples in `train.py` and in the README's training example section.
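As a rough sketch of what such a Trainer run might look like: the class names `RetNetConfig` and `RetNetForCausalLM`, their import paths, and the config fields below are assumptions for illustration only; `train.py` is the authoritative entry point.

```python
import torch
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments

from retnet.configuration_retnet import RetNetConfig  # hypothetical import path
from retnet.modeling_retnet import RetNetForCausalLM  # hypothetical import path

class DummyLMDataset(Dataset):
    """Tiny random-token dataset so the sketch is end-to-end; replace with real data."""
    def __init__(self, n=64, seq_len=128, vocab=32000):
        self.data = torch.randint(0, vocab, (n, seq_len))
    def __len__(self):
        return len(self.data)
    def __getitem__(self, i):
        return {"input_ids": self.data[i], "labels": self.data[i]}

config = RetNetConfig(vocab_size=32000, hidden_size=512, num_layers=6)  # hypothetical fields
model = RetNetForCausalLM(config)

args = TrainingArguments(
    output_dir="retnet-out",
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

trainer = Trainer(model=model, args=args, train_dataset=DummyLMDataset())
trainer.train()
```

Because the model subclasses `PreTrainedModel`, the usual `save_pretrained` / `from_pretrained` round trip should also work once training finishes.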