Closed: jingzhaoou closed this issue 2 weeks ago
What CPU do you have? This plays a big role for small models
Closing this as stale, and probably outdated since a lot has been updated since then. But yes, very slow CPUs (especially virtualized ones in server instances) can be a bottleneck. Working on it. (:
According to the benchmark info on the project frontpage:
I compiled ExLlama V2 from source and ran it on an A100-SXM4-80GB GPU. I got
which seems quite slow compared with the benchmark number.
The text sent to ExLlama V2 is shared here: prompt_llm_proxy_sip_full.txt
The model is `turboderp/Llama2-7B-exl2` with revision `4.0bpw`. I wonder if the speed I got is expected, or if I somehow missed some important steps. Your help is highly appreciated.
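For what it's worth, throughput comparisons are only meaningful if both sides measure tokens per second the same way. Below is a minimal, library-agnostic sketch of such a measurement; `generate` and `dummy_generate` are hypothetical stand-ins, not ExLlama V2 API, and a real run would pass the actual generator call instead.

```python
import time

def measure_tokens_per_second(generate, prompt, max_new_tokens):
    """Time one generation call and return tokens/sec.

    `generate` is a placeholder for whatever generation function is
    being benchmarked; it must return the number of tokens it
    actually produced.
    """
    start = time.perf_counter()
    produced = generate(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    return produced / elapsed

# Dummy generator that simulates 0.1 s of work for the requested tokens.
def dummy_generate(prompt, max_new_tokens):
    time.sleep(0.1)
    return max_new_tokens

rate = measure_tokens_per_second(dummy_generate, "Hello", 50)
print(f"{rate:.0f} tokens/sec")
```

Excluding prompt ingestion (prefill) from the timed region, as real benchmarks usually do, would require starting the timer after the prompt has been processed; the sketch above times the whole call.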