punica-ai / punica

Serving multiple LoRA-finetuned LLMs as one
https://arxiv.org/abs/2310.18547
Apache License 2.0

Confusion about offset += 8 #41

Closed carryyu closed 4 months ago

carryyu commented 4 months ago

Thanks for your great work!

While reading the sgmv_flashinfer implementation, I was confused by `offset += 8`. In my understanding, shouldn't it be `offset += 4`?

https://github.com/punica-ai/punica/blob/master/csrc/sgmv_flashinfer/sgmv_flashinfer.cuh#L69
https://github.com/punica-ai/punica/blob/master/csrc/sgmv_flashinfer/sgmv_flashinfer.cuh#L116

Looking forward to your answer.

yzh119 commented 4 months ago

Hi @carryyu, this is because of the crosswise xor-permuted shared-memory layout used in the kernel:

https://github.com/flashinfer-ai/flashinfer/blob/89a761fc31b795f6d6dd021498dced8bc75db44d/include/flashinfer/permuted_smem.cuh#L59-L61

You can check these slides for more details (pages 39-43).
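
For intuition, here is a minimal sketch of the idea (my own illustration, not the actual flashinfer code; the 16-byte cell size and 8-cell row width are assumptions):

```cuda
#include <cstdint>

// Minimal sketch of an xor-permuted ("swizzled") shared-memory layout,
// not the actual flashinfer code. Assumptions for illustration: the tile
// is stored as 16-byte cells with 8 cells per row, so the permutation
// period is 8 columns.
__device__ __forceinline__ uint32_t permuted_offset(uint32_t row, uint32_t col) {
  // Row `row` stores logical column `col` at physical column `col ^ (row % 8)`,
  // so a warp accessing one logical column across 8 rows touches 8 distinct
  // physical columns (and thus distinct bank groups) instead of conflicting
  // on a single one.
  return row * 8u + (col ^ (row % 8u));
}
```

Loosely speaking, because the swizzle pattern repeats with a period of 8 cells, offsets inside the kernel have to advance in steps consistent with that period, hence `offset += 8` rather than `offset += 4`.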

carryyu commented 4 months ago

> Hi @carryyu, this is because of the crosswise xor-permuted shared-memory layout used in the kernel:
>
> https://github.com/flashinfer-ai/flashinfer/blob/89a761fc31b795f6d6dd021498dced8bc75db44d/include/flashinfer/permuted_smem.cuh#L59-L61
>
> You can check these slides for more details (pages 39-43).

Thanks very much for your reply!