sustcsonglin / flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
MIT License

fix: calculate du on different batch #35

Closed · uniartisan closed this pull request 2 months ago

uniartisan commented 2 months ago

When the batch size is greater than 1, the gradient `du` receives a contribution from every batch element, so the per-batch gradient has to be summed over the batch dimension before being accumulated:

du += du_i.sum(0)
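
For readers outside the PR diff, here is a minimal sketch of the pattern being fixed, assuming `u` is a parameter shared across the batch and `du_i` holds per-sample gradient contributions (the names and shapes are illustrative assumptions, not the library's actual kernel code):

```python
import torch

B, H, K = 4, 2, 8          # hypothetical batch, head, and key dimensions
u = torch.randn(H, K)      # parameter shared by every batch element
du = torch.zeros_like(u)   # gradient accumulator for `u`

# Per-sample gradient contribution with an extra leading batch dimension,
# e.g. produced block by block inside the backward pass.
du_i = torch.randn(B, H, K)

# `du += du_i` would fail with a shape mismatch ([H, K] vs. [B, H, K]);
# the contribution must first be reduced over the batch dimension (dim 0).
du += du_i.sum(0)
```

Because `u` is shared, every batch element produces its own gradient, and summing over dim 0 is the standard way to combine those contributions before accumulating.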

yzhangcs commented 2 months ago

@uniartisan Thanks for your contributions!