kyegomez / LongNet

Implementation of plug-and-play attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
Apache License 2.0

Basemodel usage #2

Closed by PriNova 1 year ago

PriNova commented 1 year ago

Hey kyegomez,

I'm interested in trying out the implementation. Is it already possible to use a base model with it?

kyegomez commented 1 year ago

Hey @PriNova, yes, I am working on this now with the torchscale repository!

https://github.com/kyegomez/LongNet/blob/main/LongNet/torchscale/torchscale/architecture/decoder.py

I need help integrating it and providing usage examples!

kyegomez commented 1 year ago

@PriNova I implemented the model architecture and a training script here:

https://github.com/kyegomez/LongNet/blob/0.0.3/LongNet/model.py