Open bgergely0 opened 3 weeks ago
I noticed that gemv_rowmajor and gemv_colmajor are single-threaded.
These functions are used during the token generation phase of many LLMs in candle.
I implemented a multithreaded version of gemv_rowmajor and saw a +50 speedup in token generation on the Microsoft Phi LLM, using 8 cores.
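
To illustrate the idea, here is a minimal sketch of a row-parallel matrix-vector product using only the Rust standard library. It is not the implementation mentioned above and does not reflect candle's internals: the function name `gemv_rowmajor_parallel`, its signature, and the `n_threads` parameter are hypothetical and chosen only for illustration. It assumes a dense row-major `f32` matrix and splits the output rows into contiguous chunks, one scoped thread per chunk.

```rust
use std::thread;

/// Computes y = A * x for a row-major `rows x cols` matrix `a`.
/// Hypothetical helper for illustration; not part of candle.
fn gemv_rowmajor_parallel(
    a: &[f32],
    x: &[f32],
    y: &mut [f32],
    rows: usize,
    cols: usize,
    n_threads: usize,
) {
    assert_eq!(a.len(), rows * cols);
    assert_eq!(x.len(), cols);
    assert_eq!(y.len(), rows);
    if rows == 0 {
        return;
    }

    // Ceiling division so that every row falls into exactly one chunk.
    let n_threads = n_threads.max(1);
    let rows_per_chunk = (rows + n_threads - 1) / n_threads;

    // Scoped threads may borrow `a`, `x`, and disjoint chunks of `y`.
    thread::scope(|s| {
        for (chunk_idx, y_chunk) in y.chunks_mut(rows_per_chunk).enumerate() {
            let first_row = chunk_idx * rows_per_chunk;
            s.spawn(move || {
                for (i, out) in y_chunk.iter_mut().enumerate() {
                    let start = (first_row + i) * cols;
                    let row = &a[start..start + cols];
                    // Dot product of one matrix row with x.
                    *out = row.iter().zip(x).map(|(a_ij, x_j)| a_ij * x_j).sum();
                }
            });
        }
    });
}
```

Each row's dot product is independent, so the threads write to disjoint slices of `y` and need no locking. Spawning threads on every call has a cost, so a real implementation would more likely reuse a thread pool.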