lucidrains / FLASH-pytorch

Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
MIT License
344 stars, 24 forks

About the "/n" #13

Closed kj01239876 closed 7 months ago

kj01239876 commented 7 months ago

Hi @lucidrains, thanks for your excellent work. I have a small question: why is there a division by n ("/ n") in the line below? It does not seem to appear in the paper.

Line 374: "lin_kv = einsum(f'b g n d, b g n e -> {context_einsum_eq}', lin_k, v) / n"

Looking forward to your reply.
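
For readers following along, here is a minimal NumPy sketch of what the quoted line computes. It is not the repository's code: the shapes, the random inputs, the name lin_q, and the value chosen for context_einsum_eq are all assumptions made for illustration, since the issue does not show them. The sketch only checks that dividing lin_kv by n rescales the resulting linear-attention output by a constant factor of 1/n. It does not settle why the division is there.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: batch, groups, group length n, key dim, value dim.
b, g, n, d, e = 2, 3, 4, 5, 6

lin_q = rng.standard_normal((b, g, n, d))  # assumed query tensor, not shown in the issue
lin_k = rng.standard_normal((b, g, n, d))
v = rng.standard_normal((b, g, n, e))

# Assumption: context_einsum_eq keeps the group axis ('b g d e'),
# written here without spaces for NumPy.
context_einsum_eq = 'bgde'

# The quoted line, with and without the division by n.
lin_kv_scaled = np.einsum(f'bgnd,bgne->{context_einsum_eq}', lin_k, v) / n
lin_kv_unscaled = np.einsum(f'bgnd,bgne->{context_einsum_eq}', lin_k, v)

# Apply the (assumed) queries to the key-value summary.
out_scaled = np.einsum('bgnd,bgde->bgne', lin_q, lin_kv_scaled)
out_unscaled = np.einsum('bgnd,bgde->bgne', lin_q, lin_kv_unscaled)

# The division only rescales the output by 1 / n.
assert np.allclose(out_scaled, out_unscaled / n)
```

Because the sum over the n positions in the group grows with n, a 1/n factor keeps the magnitude of lin_kv roughly independent of the group length. Whether that is the author's motivation is not stated in this thread.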