punica-ai / punica

Serving multiple LoRA-finetuned LLMs as one
https://arxiv.org/abs/2310.18547
Apache License 2.0
883 stars, 40 forks

BGMV performs better than SGMV? #40

Closed: ghost closed this issue 4 months ago

ghost commented 5 months ago

I benchmarked various kernels on an A100 using the benchmark script, and the BGMV kernel appears to outperform the SGMV kernels for individual requests (the BGMV scenario). Is this expected?

[Screenshot of the benchmark results, taken 2024-01-31 at 4:28:27 PM]
yzh119 commented 4 months ago

Hi @jsheng-jian, thanks for running the benchmark. Yes, this is somewhat expected, since the current SGMV implementation is not optimized for individual requests. A better SGMV implementation (which we are integrating into flashinfer) may perform similarly to BGMV, but I don't expect SGMV to be faster in this case.
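To make the distinction concrete, here is a minimal pure-Python sketch of what the two operators compute. It is not the punica kernel code. It assumes the row-vector convention y = x @ W, with each adapter weight stored as a d_in × d_out matrix (the real kernels may use a different layout), and all function and parameter names are hypothetical. BGMV gathers a separate adapter index for every row, while SGMV assigns one adapter to each contiguous segment of rows. When every request contributes a single row, every segment has length 1 and SGMV computes exactly the same result as BGMV, so any performance difference in that scenario comes from how each kernel is scheduled on the GPU, not from the math.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # assumed layout: d_in rows x d_out columns


def matvec(x: Vector, w: Matrix) -> Vector:
    """Compute x @ w for a single row vector x of length d_in."""
    d_out = len(w[0])
    return [sum(x[k] * w[k][j] for k in range(len(x))) for j in range(d_out)]


def bgmv_ref(xs: List[Vector], weights: List[Matrix], indices: List[int]) -> List[Vector]:
    """Reference BGMV semantics: row i uses its own adapter weights[indices[i]]."""
    assert len(xs) == len(indices)
    return [matvec(x, weights[idx]) for x, idx in zip(xs, indices)]


def sgmv_ref(
    xs: List[Vector],
    weights: List[Matrix],
    seg_starts: List[int],   # hypothetical: segment s covers rows seg_starts[s]..seg_starts[s+1]-1
    seg_indices: List[int],  # hypothetical: adapter index shared by all rows of segment s
) -> List[Vector]:
    """Reference SGMV semantics: all rows in one segment share a single adapter."""
    assert len(seg_starts) == len(seg_indices) + 1
    assert seg_starts[0] == 0 and seg_starts[-1] == len(xs)
    out: List[Vector] = []
    for s, idx in enumerate(seg_indices):
        for i in range(seg_starts[s], seg_starts[s + 1]):
            out.append(matvec(xs[i], weights[idx]))
    return out
```

For example, with one row per request, `sgmv_ref(xs, weights, [0, 1, 2, 3], indices)` returns the same rows as `bgmv_ref(xs, weights, indices)`. SGMV only groups work differently when several consecutive rows share an adapter, e.g. `seg_starts=[0, 2, 3]` with `seg_indices=[1, 0]`.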