punica-ai / punica

Serving multiple LoRA finetuned LLM as one
https://arxiv.org/abs/2310.18547
Apache License 2.0

Rotary_pos_emb missing? #50

Closed chenhongyu2048 closed 2 months ago

chenhongyu2048 commented 2 months ago

I have a question about the rotary position embedding (`rotary_pos_emb`) in the llama model. I could not find this function in the code, so my guess is that it is implemented inside `_kernels.batch_decode(...)`, which is called from the `batch_decode` function. If I want to see how `rotary_pos_emb` works, do I need to look at FlashInfer's code?

abcdabcd987 commented 2 months ago

Correct. Position embedding is applied inside the attention kernel; Punica just calls FlashInfer. https://github.com/punica-ai/punica/blob/591b59899f0a20760821785d06b331c8a2e5cb86/csrc/flashinfer_adapter/flashinfer_all.cu#L96
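For reference, here is a rough PyTorch sketch of what the rotary transform computes for a single query/key head vector. This is illustrative only, not Punica's or FlashInfer's actual code, and it assumes the common "rotate_half" convention used by llama-style models; the fused kernel may lay out the rotation differently.

```python
import torch

def apply_rope(x: torch.Tensor, pos: int, base: float = 10000.0) -> torch.Tensor:
    """Illustrative RoPE for one head vector x of shape (head_dim,) at token position `pos`.

    In Punica there is no separate rotary_pos_emb call in the Python model code:
    this rotation happens inside the FlashInfer batch-decode attention kernel.
    """
    head_dim = x.shape[-1]
    half = head_dim // 2
    # Per-pair rotation frequencies: theta_i = base^(-2i / head_dim)
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = pos * inv_freq                      # (half,)
    cos, sin = torch.cos(angles), torch.sin(angles)
    x1, x2 = x[:half], x[half:]                  # the two halves being rotated together
    return torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin])
```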

chenhongyu2048 commented 2 months ago

Thank you for the prompt answer! By the way, Punica is really great work.

abcdabcd987 commented 2 months ago

Thanks!