Issue (closed), opened by inspirit:
https://github.com/lucidrains/local-attention/blob/5ecafbbf13ac44d61c58b6915b1ba1b54a694a72/local_attention/local_attention.py#L167
The dynamic attention bias has shape [h, i, j], but the attention similarity sim is computed from q/k of shape [(b h), w, n, d], which gives sim of shape [(b h), w, i, j]. Since the head dimension is merged into the batch dimension, adding the [h, i, j] bias to sim fails here (a minimal sketch of the mismatch follows).
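A minimal sketch of the shape mismatch, with assumed tensor names and dimensions (not the library's exact code), plus one possible way to make the bias broadcastable by tiling it over the batch:

```python
import torch
from einops import repeat

b, h, w, n, d = 2, 4, 3, 16, 32                # batch, heads, windows, window size, head dim

q = torch.randn(b * h, w, n, d)                # [(b h), w, n, d] -- heads folded into batch
k = torch.randn(b * h, w, n, d)

# similarity per window: [(b h), w, i, j]
sim = torch.einsum('b w i d, b w j d -> b w i j', q, k)

bias = torch.randn(h, n, n)                    # dynamic attention bias: [h, i, j]

# sim + bias raises a broadcasting error:
#   [(b h), w, n, n] is not broadcastable with [h, n, n]

# one possible fix: repeat the bias over the batch so its leading dim matches (b h)
bias = repeat(bias, 'h i j -> (b h) 1 i j', b=b)   # [(b h), 1, n, n]
out = sim + bias                                   # broadcasts over the window dim
print(out.shape)                                   # torch.Size([8, 3, 16, 16])
```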
@inspirit oh yes, should be fixed!