lucidrains/local-attention
An implementation of local windowed attention for language modeling
MIT License
383 stars, 40 forks
replace shaws with rotary embeddings #7
Closed: lucidrains closed this issue 3 years ago.