junior-zsy opened this issue 1 year ago
I have the same wishes: PagedAttention, fused multi-head attention, FlashAttention...
I discussed this with the author of flm, and he said PagedAttention will be added in the future.
We can also track this issue: https://github.com/ztxz16/fastllm/issues/150
Or we could develop it ourselves.
I hope this message finds you well. First off, thank you for providing such an incredible project for large-model inference. I've been using it extensively, and it has been instrumental for many of my tasks.
However, I have recently been working with two attention mechanisms: FlashAttention and Multi-Query Attention. Both have been shown to be highly efficient and effective across a range of tasks, further enhancing the capability of transformer models.
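For context, here is a minimal PyTorch sketch (not from this project's codebase; all names and shapes are illustrative assumptions) contrasting standard multi-head attention with Multi-Query Attention: MQA shares a single K/V head across all query heads, which shrinks the KV cache and speeds up decoding. On recent GPUs, PyTorch's scaled_dot_product_attention may also dispatch to a FlashAttention kernel when the inputs allow.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes (assumptions, not taken from the project).
batch, seq_len, n_heads, head_dim = 2, 16, 8, 64

# Standard multi-head attention: one K/V head per query head.
q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn(batch, n_heads, seq_len, head_dim)
v = torch.randn(batch, n_heads, seq_len, head_dim)
mha_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Multi-Query Attention: a single shared K/V head, broadcast to every query head.
k_shared = torch.randn(batch, 1, seq_len, head_dim)
v_shared = torch.randn(batch, 1, seq_len, head_dim)
mqa_out = F.scaled_dot_product_attention(
    q,
    k_shared.expand(-1, n_heads, -1, -1),
    v_shared.expand(-1, n_heads, -1, -1),
    is_causal=True,
)

# Both outputs have shape (2, 8, 16, 64), but MQA stores 8x less K/V state.
print(mha_out.shape, mqa_out.shape)
```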