microsoft / torchscale

Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License
2.98k stars 201 forks

Where is the offset implemented in multi-head dilated attention? #104

Open AshStuff opened 3 months ago