Open yoinked-h opened 1 year ago
cc @ArthurZucker @younesbelkada
P.S. If Google offered any bigger TPUs through TRC, I could train retnet-3b (the point at which RetNet beats regular transformers), but as of now there's retnet_base (small) and retnet_medium (I'll upload it when it gets good).
I am wondering whether the original authors released the trained models?
As far as I know, no official pretrained models were released by Microsoft, but the training code is in the torchscale repo, so that's how I am training the models.
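For context, instantiating a RetNet through torchscale looks roughly like this (a minimal sketch following the example in the torchscale README; the training loop and data pipeline are omitted, and the sizes are library defaults, not the retnet_base/retnet_medium settings):

```python
# Minimal sketch of building a RetNet decoder with microsoft/torchscale,
# following the example in the torchscale README.
from torchscale.architecture.config import RetNetConfig
from torchscale.architecture.retnet import RetNetDecoder

config = RetNetConfig(vocab_size=64000)
retnet = RetNetDecoder(config)
print(sum(p.numel() for p in retnet.parameters()))  # parameter count
```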
Cool model! But as long as we don't have official or very good pretrained checkpoints, there's not really anything we can do!
Ah, understood. I'll try to get a good checkpoint; for now I assume I can close this and reopen it when training finishes.
oops
https://huggingface.co/parsee-mizuhashi/retnet/tree/main
Trained it for 1M steps; loss is around 4.2. Hope this is good enough for some inference code.
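A minimal sketch of what inference could look like, assuming the repo ships HF-compatible custom modeling code (that assumption, and the prompt, are illustrative; `trust_remote_code=True` is the standard transformers path for custom architectures):

```python
# Hypothetical inference sketch: assumes the hub repo provides custom
# modeling code alongside the weights, which is an assumption here.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "parsee-mizuhashi/retnet"  # checkpoint linked above
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```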
My recommendation would be to put the model on the Hub following this tutorial, which will give you working code without going through the hassle of the whole review process! Then, if the model is highly requested, gets a lot of usage, or has officially released checkpoints, we'll add it to transformers! Does that make sense to you @yoinked-h? 🤗
If you implement it or link some useful code for training, we could provide some computing power.
> My recommendation would be to put the model on the Hub following this tutorial, which will give you working code without going through the hassle of the whole review process! Then, if the model is highly requested, gets a lot of usage, or has officially released checkpoints, we'll add it to transformers! Does that make sense to you @yoinked-h? 🤗
Yeah, I'll try to make the custom model scripts and push them to the Hub.
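For reference, the scripts from that tutorial boil down to a skeleton like the one below; the class and attribute names here are illustrative, and the real code would wrap the actual retention layers rather than the stub used in this sketch:

```python
# Skeleton of hub-compatible custom model code, following the
# transformers "sharing custom models" tutorial. The RetNet internals
# are stubbed out; hidden_size/num_layers are illustrative names.
import torch
from transformers import PretrainedConfig, PreTrainedModel

class RetNetConfig(PretrainedConfig):
    model_type = "retnet"

    def __init__(self, vocab_size=32000, hidden_size=1024, num_layers=12, **kwargs):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        super().__init__(**kwargs)

class RetNetModel(PreTrainedModel):
    config_class = RetNetConfig

    def __init__(self, config):
        super().__init__(config)
        # Stand-ins for the real embedding + retention blocks.
        self.embed = torch.nn.Embedding(config.vocab_size, config.hidden_size)
        self.layers = torch.nn.ModuleList(
            torch.nn.Linear(config.hidden_size, config.hidden_size)
            for _ in range(config.num_layers)
        )

    def forward(self, input_ids):
        h = self.embed(input_ids)
        for layer in self.layers:
            h = layer(h)
        return h

# Register so AutoConfig/AutoModel can resolve the classes from the repo,
# then push both the code and the weights.
RetNetConfig.register_for_auto_class()
RetNetModel.register_for_auto_class("AutoModel")
model = RetNetModel(RetNetConfig())
model.push_to_hub("parsee-mizuhashi/retnet")  # hypothetical target repo
```

After the push, `AutoModel.from_pretrained(repo, trust_remote_code=True)` can load the model without it being merged into transformers.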
> If you implement it or link some useful code for training, we could provide some computing power.
The training code is kind of buggy (it doesn't work with TPU accelerate), but here it is; I also have a shell script which does most of the work for setup → training.
I started training a small (around 300M params) model on German data. It's HF-compatible, and I should push the code to the Hub too.
300M and 1300M models are training. After finding a bug in the learning rate scheduling, the loss is decreasing again. The generated text is grammatically okay but doesn't make sense right now. Looking forward to the new run 😁 I will push the weights and code to the Hub on Friday, I think.
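(For anyone hitting similar LR-schedule bugs: below is a typical warmup-plus-cosine setup using transformers' built-in helper, with made-up hyperparameters, not the settings from this run. A scheduler that is never stepped, or is stepped per epoch instead of per batch, will quietly stall the loss.)

```python
# Illustrative warmup + cosine-decay schedule via transformers' helper;
# the model, optimizer, and step counts are made-up examples.
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=1_000, num_training_steps=100_000
)

for step in range(5):
    optimizer.step()                      # forward/backward omitted in sketch
    scheduler.step()                      # advance the schedule every batch
    print(step, scheduler.get_last_lr())
```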
https://huggingface.co/flozi00/RetNet-300m-German
Maybe I'll find some time to train larger models, for example 7B, when I am not ill anymore.
https://huggingface.co/papers/2307.08621#64bff688661694889faecdb2
Will be waiting for the release from Microsoft
Hello everyone, is there any better pre-trained model available now?
Hey @yoinked-h, can you give me more help with how you managed to train a RetNet model? I can't seem to manage it. If possible, can you share a Python file or notebook? Thank you so much in advance.
I published a RetNet model for study; you can try it: https://huggingface.co/wac81/toy_retnet_1.3b_pretrain
Model description
RetNet / Retentive Networks is a new model architecture released by Microsoft; the research paper is here. As of now, there is one model for RetNet, made by me, which is undertrained (`loss=8`!), and I am trying to make a second model on a larger architecture.
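For readers new to the architecture: the core of RetNet is the retention operator, whose parallel form in the paper is Retention(X) = (QK^T ⊙ D)V, with decay matrix D[n][m] = γ^(n−m) for n ≥ m and 0 otherwise. A minimal single-head sketch (the xPos-style rotations on Q and K and the per-head multi-scale γ are omitted; shapes are illustrative):

```python
# Minimal sketch of single-head parallel retention from the RetNet paper
# (https://huggingface.co/papers/2307.08621): Retention(X) = (Q K^T ⊙ D) V.
import torch

def retention_parallel(q, k, v, gamma=0.9):
    """q, k, v: (seq_len, dim) projections of X; gamma: decay in (0, 1)."""
    seq_len = q.shape[0]
    idx = torch.arange(seq_len)
    # Causal decay matrix: D[n, m] = gamma**(n - m) for n >= m, else 0.
    decay = torch.tril(gamma ** (idx[:, None] - idx[None, :]).float())
    return (q @ k.transpose(-1, -2) * decay) @ v

x = torch.randn(16, 64)            # toy (seq_len, dim) input
out = retention_parallel(x, x, x)  # self-retention on raw x, no projections
print(out.shape)                   # torch.Size([16, 64])
```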
Open source status

Provide useful links for the implementation
The commit that has RetNet training. @donglixp was the main author of the commit and is cited on the paper. All code is licensed under MIT, including model weights.