Hi @jsheng-jian, thanks for running the benchmark. Yes, this is somewhat expected, because the current SGMV implementation is not optimized for individual requests. A better SGMV implementation (we are integrating one into flashinfer) may perform similarly to BGMV, but I don't expect SGMV to be faster in this case.
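To make the comparison concrete, here is a minimal pure-Python sketch of what the two operations compute. It is not the CUDA kernels, and it assumes the usual reading of the names: BGMV gives every row its own adapter index, while SGMV groups contiguous rows that share an adapter into segments. The function names (`matvec`, `bgmv_ref`, `sgmv_ref`) and the segment layout are illustrative, not the library's API. With one request per adapter, every segment has length 1, so the two produce the same result and segmentation has nothing to share across rows.

```python
# Reference semantics only; all names below are hypothetical helpers.

def matvec(x, w):
    """Return x @ w for a vector x (length h_in) and a matrix w (h_in x h_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]

def bgmv_ref(xs, weights, indices):
    """BGMV: each row of xs carries its own adapter index."""
    return [matvec(x, weights[idx]) for x, idx in zip(xs, indices)]

def sgmv_ref(xs, weights, seg_starts, seg_indices):
    """SGMV: contiguous rows are grouped into segments that share one adapter.

    Segment k covers rows seg_starts[k]:seg_starts[k + 1] and uses
    adapter seg_indices[k] (assumed layout for this sketch).
    """
    out = []
    for k, idx in enumerate(seg_indices):
        for x in xs[seg_starts[k]:seg_starts[k + 1]]:
            out.append(matvec(x, weights[idx]))
    return out

# Three individual requests, each its own segment of length 1.
xs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights = [
    [[1.0, 0.0], [0.0, 1.0]],  # adapter 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # adapter 1: scale by 2
]
indices = [0, 1, 0]

bgmv_out = bgmv_ref(xs, weights, indices)
sgmv_out = sgmv_ref(xs, weights, [0, 1, 2, 3], indices)
```

Here `bgmv_out` and `sgmv_out` are equal, which is why any speed difference in this scenario comes down to kernel implementation rather than the amount of work.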
I benchmarked various kernels on the A100 using the benchmark script, and it seems that the BGMV kernel outperforms the SGMV kernels for individual requests (the BGMV scenario). Is this expected?