mlc-ai / relax

Apache License 2.0
[DLIGHT][GEMV] Enable gemv schedule for adreno #319

Closed krishnaraj36 closed 2 months ago

krishnaraj36 commented 2 months ago

Enables a new GEMV schedule for the OpenCL target, which improves decode performance for mlc-llm models quantized in the q4f16_0 format.
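For readers new to the term: GEMV is a matrix-vector product, and single-sequence decode runs one per linear layer because each weight matrix is multiplied by a single activation vector. The sketch below is only a plain-Python reference for that operation, under the assumption of unquantized float weights. It is not the schedule added in this PR, and the name `gemv` is hypothetical.

```python
def gemv(weight, x):
    """Reference y = W @ x for a row-major list-of-lists W and a list x.

    Hypothetical helper for illustration only; it ignores quantization
    and is unrelated to the actual DLight schedule.
    """
    if any(len(row) != len(x) for row in weight):
        raise ValueError("each row of weight must have len(x) entries")
    # One dot product per output element (one per row of W).
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 0.5, -1.0]
y = gemv(W, x)
print(y)
```

Each output element depends on one full row of the weight matrix, which is why the way rows and reductions are mapped to GPU threads matters for decode throughput.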

Decode performance of a few LLM models on Snapdragon Gen-3 Android:

| Model | Baseline | Latest (improved) |
| --- | --- | --- |
| Llama-2-7B | 10 tok/sec | 12.5 tok/sec |
| Qwen-7b | 8.5 tok/sec | 11 tok/sec |
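To put the reported numbers in relative terms, here is a small sketch that computes the decode speedup from the tok/sec figures above. The helper name `speedup_percent` is hypothetical, and the inputs are only the values quoted in this comment.

```python
# Reported decode throughput in tok/sec: (baseline, improved).
results = {
    "Llama-2-7B": (10.0, 12.5),
    "Qwen-7b": (8.5, 11.0),
}


def speedup_percent(baseline, improved):
    """Relative throughput gain in percent (hypothetical helper)."""
    return (improved / baseline - 1.0) * 100.0


for model, (baseline, improved) in results.items():
    gain = speedup_percent(baseline, improved)
    print(f"{model}: {gain:.1f}% faster decode")
```

On these figures the gain works out to roughly 25% for Llama-2-7B and roughly 29% for Qwen-7b.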

krishnaraj36 commented 2 months ago

@srkreddy1238: Can you please take a look at this PR?

tqchen commented 2 months ago

Thanks @krishnaraj36, can you send the PR to https://github.com/apache/tvm?

krishnaraj36 commented 2 months ago

> Thanks @krishnaraj36, can you send the PR to https://github.com/apache/tvm?

Closing in favor of this PR: https://github.com/apache/tvm/pull/16932