microsoft/torchscale
Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License
2.98k stars, 201 forks
Where is the offset implemented in multi-head dilated attention?
#104 (Open), opened by AshStuff 3 months ago