syncdoth / RetNet

Huggingface-compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf), including parallel, recurrent, and chunkwise forward passes.
MIT License

Training using HF Transformers #3

Closed nebulatgs closed 1 year ago

nebulatgs commented 1 year ago

Is it possible to train a RetNet using Transformers?

syncdoth commented 1 year ago

Yes. This implementation is built on the HF transformers `PreTrainedModel` class, so it is compatible with the Trainer API. You can find examples in train.py and in the training example section of the README.