carryyu closed this 4 months ago
Hi @carryyu, this is because of the crosswise XOR-permuted shared memory layout used in the kernel.
You can check this slide for more details (pages 39-43).
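To make the idea concrete, here is a minimal host-side sketch of one crosswise XOR-permuted layout. It is an illustration under stated assumptions, not code copied from the kernel: `permuted_offset`, `column_is_conflict_free`, and `check_layout` are hypothetical names, one cell is assumed to be 16 bytes, and `stride` (cells per row) is assumed to be a multiple of 4. In this sketch two adjacent rows are interleaved into one 8-cell (128-byte) group, so stepping 4 columns to the right moves to the next group and the offset grows by 8 rather than 4.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (an assumption, not the kernel's actual function):
// maps (row i, column j) of a tile to a cell offset in shared memory.
// One cell is assumed to be 16 bytes; stride is the number of cells per row
// and is assumed to be a multiple of 4.
constexpr uint32_t permuted_offset(uint32_t i, uint32_t j, uint32_t stride) {
  return (i / 2) * stride * 2          // each pair of rows owns 2 * stride cells
         + (j / 4) * 8                 // each 4-column group of a row pair spans 8 cells
         + (i % 2) * 4                 // the odd row uses the upper 4 cells of the group
         + ((j % 4) ^ ((i / 2) % 4));  // XOR permutation inside the 4 cells
}

// Stepping 4 columns to the right advances the offset by 8 cells, not 4.
static_assert(permuted_offset(0, 4, 8) - permuted_offset(0, 0, 8) == 8, "");
static_assert(permuted_offset(5, 6, 8) - permuted_offset(5, 2, 8) == 8, "");

// Returns true if 8 consecutive rows of column j fall into 8 distinct
// 16-byte slots modulo 128 bytes (8 cells).
inline bool column_is_conflict_free(uint32_t row0, uint32_t j, uint32_t stride) {
  uint32_t seen = 0;  // bitmask of the slots already used
  for (uint32_t r = 0; r < 8; ++r) {
    uint32_t slot = permuted_offset(row0 + r, j, stride) % 8;
    if (seen & (1u << slot)) return false;
    seen |= 1u << slot;
  }
  return true;
}

// Small usage example: walk one row in steps of 4 columns and check
// that every step adds 8 to the offset.
inline void check_layout() {
  const uint32_t stride = 16;
  for (uint32_t i = 0; i < 16; ++i) {
    for (uint32_t j = 0; j + 4 < stride; ++j) {
      assert(permuted_offset(i, j + 4, stride) - permuted_offset(i, j, stride) == 8);
    }
  }
  assert(column_is_conflict_free(0, 0, stride));
}
```

With a plain row-major layout the same step would add 4, which is probably where the expectation of `offset += 4` comes from. The XOR term only reorders cells inside a 4-cell block, so it does not change the size of the step, but it does spread the rows of one column over different 16-byte slots.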
Thanks very much for your reply!
Thanks for your great work!
When I looked at the sgmv_flashinfer implementation, I was confused by offset += 8. As I understand it, shouldn't it be offset += 4?
https://github.com/punica-ai/punica/blob/master/csrc/sgmv_flashinfer/sgmv_flashinfer.cuh#L69
https://github.com/punica-ai/punica/blob/master/csrc/sgmv_flashinfer/sgmv_flashinfer.cuh#L116
Looking forward to your answer.