Where is the offset implemented in Multi-head dilated attention?

microsoft / torchscale

Foundation Architecture for (M)LLMs

https://aka.ms/GeneralAI

MIT License

2.98k stars, 201 forks

Where is the offset implemented in Multi-head dilated attention? #104

Open. AshStuff opened this issue 3 months ago.