huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

RetNet model support #25243

Open yoinked-h opened 1 year ago

yoinked-h commented 1 year ago

Model description

RetNet (Retentive Network) is a new model architecture released by Microsoft; the research paper is here. As of now, there is one RetNet model, made by me, which is undertrained (loss=8!), and I am trying to train a second model on a larger architecture.

Open source status

Provide useful links for the implementation

commit that has retnet training
@donglixp was the main author for commit and cited on the paper
all code is licensed under MIT, including model weights

amyeroberts commented 1 year ago

cc @ArthurZucker @younesbelkada

yoinked-h commented 1 year ago

p.s. if google offered any bigger TPUs for TRC, i could train retnet-3b (the point at which retnet is better than regular transformers), but as of now there's retnet_base (small) and retnet_medium (i'll upload it when it gets good)

ydshieh commented 1 year ago

I am wondering if the original authors released the trained models?

yoinked-h commented 1 year ago

as far as i know, no official pretrained models were released by microsoft; but the training code is on the torchscale repo, so that's how i am training the models

ArthurZucker commented 1 year ago

Cool model! But as long as we don't have official/ very good pretraining checkpoints, not really anything we can do!

yoinked-h commented 1 year ago

ah, understood, i'll try to get a good checkpoint; but for now, i assume i can close this and reopen when it finishes training

yoinked-h commented 1 year ago

oops

yoinked-h commented 1 year ago

https://huggingface.co/parsee-mizuhashi/retnet/tree/main trained it for 1M steps, loss is around 4.2, hope this is good enough for some inference code
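For a rough sense of scale, the two loss values quoted in this thread (8 and about 4.2) can be converted to perplexity. This is only a sketch: it assumes the reported loss is a mean per-token cross-entropy in nats, which the thread does not confirm, and the `perplexity` helper is a name made up for illustration.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    # Assumption: the loss is measured in nats (natural log), as is the
    # default for most cross-entropy implementations.
    return math.exp(loss_nats)


# Loss values quoted in the thread; the labels are descriptive only.
for label, loss in [("first checkpoint", 8.0), ("1M-step checkpoint", 4.2)]:
    print(f"{label}: loss={loss} -> perplexity ~ {perplexity(loss):.1f}")
```

Under that assumption, a loss of 8 corresponds to a perplexity of roughly 2981 and a loss of 4.2 to roughly 67. The values are only comparable across runs that share a tokenizer and evaluation data.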

ArthurZucker commented 1 year ago

My recommendation would be to put the model on the hub following this tutorial, which will help you have working code without going through the hassle of the whole review process! Then if the model is highly requested, has a lot of usage, or has officially released checkpoints, we'll add it to transformers! Does that make sense for you @yoinked-h? 🤗

flozi00 commented 1 year ago

If you implement it or link some useful code for training we could provide some computing power

yoinked-h commented 1 year ago

> My recommendation would be to put the model on the hub following this tutorial, which will help you have working code without going through the hassle of the whole review process! Then if the model is highly requested, has a lot of usage, or has officially released checkpoints, we'll add it to transformers! Does that make sense for you @yoinked-h? 🤗

yeah, i'll try to make the custom model scripts and push them to the hub

> If you implement it or link some useful code for training we could provide some computing power

the training code is kind of buggy (doesn't work with TPU accelerate) but here, i also have a shell script which does most of the work for setup->training

flozi00 commented 1 year ago

I started training a small (around 300M params) model on German data. It's HF compatible, and I should push the code to the hub too.

flozi00 commented 1 year ago

300M and 1300M models are training. After finding a bug in learning rate scheduling, the loss is decreasing again. The text is grammatically okay but doesn't make sense right now. Looking forward to the new run 😁 Will push the weights and code to the hub on Friday, I think.
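The thread does not say what the scheduling bug was or which schedule was used. As a point of reference only, here is a minimal sketch of one common choice, linear warmup followed by inverse-square-root decay. The function name `lr_at_step` and the default values for `peak_lr` and `warmup_steps` are hypothetical and not taken from the training code.

```python
def lr_at_step(step: int, peak_lr: float = 1e-3, warmup_steps: int = 1000) -> float:
    """Learning rate at a given optimizer step (hypothetical defaults).

    Linear warmup from 0 to peak_lr over warmup_steps, then decay
    proportional to 1 / sqrt(step).
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < warmup_steps:
        # Warmup phase: scale linearly with the step count.
        return peak_lr * step / warmup_steps
    # Decay phase: equals peak_lr exactly at step == warmup_steps.
    return peak_lr * (warmup_steps / step) ** 0.5


# Print a few points to eyeball the shape before launching a long run.
for s in (0, 500, 1000, 4000, 16000):
    print(s, lr_at_step(s))
```

Printing or plotting the schedule over the planned number of steps before training is a cheap way to catch mistakes such as a rate that never decays or that drops to zero too early.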

flozi00 commented 1 year ago

https://huggingface.co/flozi00/RetNet-300m-German

Maybe I'll find some time to train larger models, for example 7B, when I am not ill anymore

flozi00 commented 1 year ago

https://huggingface.co/papers/2307.08621#64bff688661694889faecdb2

Will be waiting for the release from Microsoft

zzczzc20 commented 1 year ago

Hello everyone, is there any better pre-trained model available now?

risedangel commented 1 year ago

hey @yoinked-h, can you help me understand how you managed to train a retnet model? I can't seem to manage it. If possible, can you share a python file or notebook? Thank you so much in advance

wac81 commented 5 months ago

I published a RetNet model for study; you can try it: https://huggingface.co/wac81/toy_retnet_1.3b_pretrain

> Hello everyone, is there any better pre-trained model available now?