lucidrains / local-attention

An implementation of local windowed attention for language modeling
MIT License

Wrong shape for attention bias vs sim tensor #15

Closed: inspirit closed 1 year ago

inspirit commented 1 year ago

https://github.com/lucidrains/local-attention/blob/5ecafbbf13ac44d61c58b6915b1ba1b54a694a72/local_attention/local_attention.py#L167

The dynamic attention bias has shape [h, i, j], while the attention sim is computed from q/k tensors of shape [(b h), w, n, d], giving sim of shape [(b h), w, n, n]. Since the head dimension is merged into the batch dimension, adding the bias to sim raises a shape error here.
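
For illustration, a minimal sketch of the mismatch, assuming arbitrary example sizes and standalone tensors rather than the library's actual code; the broadcast at the end is just one plausible way to reconcile the shapes, not necessarily the fix that was committed.

```python
import torch
from einops import repeat

# hypothetical sizes: batch, heads, windows, window length, head dim
b, h, w, n, d = 2, 8, 4, 64, 32

# q/k with the head dim merged into the batch dim, as described above
q = torch.randn(b * h, w, n, d)
k = torch.randn(b * h, w, n, d)

sim = torch.einsum('bwid,bwjd->bwij', q, k)  # [(b h), w, n, n]

attn_bias = torch.randn(h, n, n)             # [h, i, j]
# sim + attn_bias                            # fails: shapes do not broadcast

# one plausible reconciliation: repeat the bias over the batch and add a
# singleton window axis so it broadcasts over the window dimension
attn_bias = repeat(attn_bias, 'h i j -> (b h) 1 i j', b=b)
out = sim + attn_bias                        # [(b h), w, n, n]
print(out.shape)                             # torch.Size([16, 4, 64, 64])
```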

lucidrains commented 1 year ago

@inspirit oh yes, should be fixed!