Closed krishnaraj36 closed 2 months ago
@srkreddy1238: Can you please take a look at this PR?
Thanks @krishnaraj36, can you send the PR to https://github.com/apache/tvm?
Closing in favor of this PR: https://github.com/apache/tvm/pull/16932
Enabled a new GEMV schedule for the OpenCL target, which improves decode performance of mlc-llm LLM models in the q4f16_0 format.
Decode performance of a few LLM models on Snapdragon Gen-3 (Android):

| Model | Baseline | Latest (improved) |
| --- | --- | --- |
| Llama-2-7B | 10 tok/sec | 12.5 tok/sec |
| Qwen-7b | 8.5 tok/sec | 11 tok/sec |
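
For a quick read of the figures reported above, the relative gain can be worked out directly from the two throughput columns. The snippet below is only an illustrative sketch: the `results` dictionary and the `speedup` helper are names introduced here for the calculation and are not part of the PR.

```python
# Throughput figures copied from the numbers reported above (tokens per second).
# The dictionary layout and helper name are illustrative, not part of the PR.
results = {
    "Llama-2-7B": (10.0, 12.5),
    "Qwen-7b": (8.5, 11.0),
}


def speedup(baseline: float, improved: float) -> float:
    """Return the improved throughput as a multiple of the baseline."""
    return improved / baseline


for model, (baseline, improved) in results.items():
    gain = speedup(baseline, improved)
    print(f"{model}: {gain:.2f}x ({(gain - 1) * 100:.0f}% faster)")
```

With these inputs the decode throughput improves by roughly 25% for Llama-2-7B and roughly 29% for Qwen-7b.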