lucidrains / local-attention

An implementation of local windowed attention for language modeling
MIT License
383 stars 40 forks source link

replace shaws with rotary embeddings #7

Closed lucidrains closed 3 years ago