sarah-quinones / gemm

MIT License

Performance improvement: multithreaded gemv_rowmajor #32

Open bgergely0 opened 3 weeks ago

bgergely0 commented 3 weeks ago

I noticed that gemv_rowmajor and gemv_colmajor are single-threaded.

These functions are used during the token generation phase of many LLMs in candle.

I implemented a multithreaded version of gemv_rowmajor and saw a +50% speedup in token generation on the Microsoft Phi LLM, using 8 cores.
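To illustrate why row-major GEMV parallelizes naturally, here is a minimal sketch. It assumes a plain `f32` row-major matrix and uses only Rust's standard-library scoped threads. `gemv_rowmajor_mt` is a hypothetical name; this is neither the gemm crate's API nor the implementation from this issue. Each output element `y[i]` is the dot product of row `i` of `A` with `x`, so rows are independent. The output can therefore be split into contiguous chunks, one per thread, with no synchronization between threads.

```rust
use std::thread;

/// Hypothetical sketch (not the gemm crate's API): computes y = A * x, where
/// `a` is an `m x n` matrix stored row-major, splitting the rows of the
/// output across up to `n_threads` scoped threads.
fn gemv_rowmajor_mt(a: &[f32], x: &[f32], m: usize, n: usize, n_threads: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * n, "a must hold m * n elements");
    assert_eq!(x.len(), n, "x must hold n elements");

    let mut y = vec![0.0f32; m];
    let n_threads = n_threads.max(1);
    // Rows per thread, rounded up so every row is covered; at least 1,
    // because chunks_mut(0) would panic.
    let rows_per_chunk = ((m + n_threads - 1) / n_threads).max(1);

    thread::scope(|s| {
        // Each chunk y[row0..row0 + len] depends only on the matching rows of A,
        // so the chunks can be written concurrently without locks.
        for (chunk_idx, y_chunk) in y.chunks_mut(rows_per_chunk).enumerate() {
            let row0 = chunk_idx * rows_per_chunk;
            s.spawn(move || {
                for (i, yi) in y_chunk.iter_mut().enumerate() {
                    let r = row0 + i;
                    let row = &a[r * n..(r + 1) * n];
                    // Dot product of row r with x.
                    *yi = row.iter().zip(x).map(|(aij, xj)| aij * xj).sum();
                }
            });
        }
    }); // All spawned threads are joined here.

    y
}

fn main() {
    // 3 x 2 row-major matrix [[1, 2], [3, 4], [5, 6]].
    let a = [1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    let x = [1.0f32, 1.0];
    let y = gemv_rowmajor_mt(&a, &x, 3, 2, 2);
    println!("{:?}", y);
}
```

Splitting by contiguous row blocks keeps each thread's reads of `A` sequential and gives each thread a disjoint `&mut` slice of `y`. A production version would likely reuse a thread pool and vectorize the inner dot product rather than spawning threads on every call.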