Correct. The position embedding is applied inside the attention kernel. Punica just calls FlashInfer. https://github.com/punica-ai/punica/blob/591b59899f0a20760821785d06b331c8a2e5cb86/csrc/flashinfer_adapter/flashinfer_all.cu#L96
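For reference, here is a minimal PyTorch sketch of roughly the math such a fused decode kernel applies to q and k on the fly. The function name, arguments, and the interleaved pairing convention are illustrative assumptions for a single token, not punica's or FlashInfer's actual API:

```python
import torch

def apply_rope(x: torch.Tensor, pos: int, theta: float = 10000.0) -> torch.Tensor:
    """Reference rotary position embedding for one token at position `pos`.

    x: [num_heads, head_dim]. Plain PyTorch sketch of the rotation that the
    fused attention kernel performs on q/k; hypothetical helper, not punica code.
    """
    head_dim = x.shape[-1]
    # Per-pair rotation frequencies: theta^(-2i/d) for i in [0, d/2)
    freqs = theta ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    angles = pos * freqs                   # [head_dim // 2]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]    # split dimensions into rotation pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin   # 2-D rotation of each (x1, x2) pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

In punica this rotation happens inside the CUDA kernel (the flashinfer_all.cu file linked above) rather than in Python, which is why there is no standalone rotary-embedding function in the Python code.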
Thank you for your prompt answer! By the way, Punica is very impressive work.
Thanks!
I have a question about the Rotary_pos_emb function in LLaMA. I could not find this function in the code, and I can only guess that it is implemented inside the _kernels.batch_decode call, i.e. in the batch_decode function? So if I want to find Rotary_pos_emb, do I need to check FlashInfer's code?