haoliuhl / ringattention

Transformers with Arbitrarily Large Context
Apache License 2.0

This work doesn't change the kernel, but exploits row independence to compute whole rows? #20

Open ziyuhuang123 opened 3 months ago

ziyuhuang123 commented 3 months ago

Your idea is excellent, and I have starred your repo. I want to check whether my understanding is correct:

This paper does not modify the kernel implementation. Instead, it exploits the fact that different rows (query blocks) along the sequence dimension of Q are independent. It can therefore compute each block all the way from attention through the FFN in one pass, consuming the intermediate results immediately rather than materializing them for the full sequence, which allows much larger sequence lengths.
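For concreteness, here is a minimal sketch of what I mean, not code from this repo: the function name `blockwise_attn_ffn`, the weights `w1`/`w2`, and `block_size` are my own placeholders, and I've omitted multi-head attention, causal masking, residuals, and layer norm for simplicity.

```python
import jax
import jax.numpy as jnp

def blockwise_attn_ffn(q, k, v, w1, w2, block_size):
    """Hypothetical sketch: process query blocks independently,
    running attention and then the FFN per block so full-sequence
    intermediate activations are never materialized."""
    seq_len, d = q.shape
    outputs = []
    for start in range(0, seq_len, block_size):
        q_blk = q[start:start + block_size]           # one query block
        scores = q_blk @ k.T / jnp.sqrt(d)            # attend to full K
        attn = jax.nn.softmax(scores, axis=-1) @ v    # block attention output
        ffn = jax.nn.gelu(attn @ w1) @ w2             # FFN consumes it immediately
        outputs.append(ffn)                           # only block-sized buffers live
    return jnp.concatenate(outputs, axis=0)
```

Since each query block's result depends only on that block (plus the shared K/V), the loop iterations are independent, which is what I understand lets the memory footprint stay proportional to the block size rather than the full sequence.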

Is this correct?