sustcsonglin / flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
MIT License · 1.24k stars · 66 forks
#35 fix: calculate du on different batch
Closed · uniartisan closed this 2 months ago

**uniartisan** commented 2 months ago:
```python
du += du_i.sum(0)
```
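A minimal sketch of why the per-batch contributions are summed before accumulating into `du`: when a parameter is shared across the batch, its gradient is the sum of the per-sample gradients over the batch dimension. The shapes and names below (`B`, `D`, `du_i`) are hypothetical illustrations, not the library's actual tensors; NumPy is used here just to show the `.sum(0)` semantics.

```python
import numpy as np

# Hypothetical shapes: per-sample gradients du_i of shape (B, D)
# for a parameter u of shape (D,) that is shared across the batch.
B, D = 4, 3
du_i = np.arange(B * D, dtype=np.float64).reshape(B, D)

# Accumulate into the parameter gradient: sum over the batch
# dimension (axis 0) so the result matches u's shape (D,).
du = np.zeros(D)
du += du_i.sum(0)

print(du.tolist())  # column-wise sums over the batch
```

Without the `.sum(0)`, adding a `(B, D)` tensor into a `(D,)` buffer would either broadcast incorrectly or fail, which is the shape mismatch this fix addresses.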
**yzhangcs** commented 2 months ago:
@uniartisan Thanks for your contributions!