microsoft / torchscale

Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License
2.98k stars, 201 forks

How to use retention in RetNet for cross-attention? #101

Open. yxchng opened this issue 4 months ago