modelscope / dash-infer

DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
Apache License 2.0
129 stars 14 forks

CPU specifications for Qwen2-7B models #37

Closed: LiweiPE closed this issue 6 days ago

LiweiPE commented 1 month ago

I would like to know what the minimum hardware specifications should be for running 7B models. With the configuration below I can only run the 1.5B model, at an average throughput of 8.1 tokens/s. [configuration screenshots]

laiwenzh commented 3 weeks ago

1B = 1,000,000,000 parameters. Model weights are stored as float32, i.e. 4 bytes each, so 1,000,000,000 * 4 / (1024 * 1024 * 1024) ≈ 3.72 GB per billion parameters. Thus, loading a 7B model requires at least ~26 GB of memory. Because DashInfer may reorder weights for best performance during inference, it is desirable to have more than twice that amount of memory available (>= 52 GB). This number has not been strictly tested.
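
For reference, here is a minimal Python sketch of the same back-of-the-envelope arithmetic; `estimate_weight_memory_gb` is a hypothetical helper for illustration, not part of DashInfer's API:

```python
# Rough memory estimate for loading an N-billion-parameter model in float32,
# following the reasoning above (hypothetical helper, not part of DashInfer).
def estimate_weight_memory_gb(params_billion: float, bytes_per_param: int = 4) -> float:
    """Return the approximate GiB needed just to hold the weights."""
    return params_billion * 1e9 * bytes_per_param / (1024 ** 3)

if __name__ == "__main__":
    for size in (1.5, 7.0):
        weights = estimate_weight_memory_gb(size)
        # DashInfer may reorder weights during loading, so budget roughly 2x.
        print(f"{size}B model: ~{weights:.1f} GiB weights, "
              f"plan for >= {2 * weights:.0f} GiB RAM")
```

This prints roughly 5.6 GiB of weights (>= 11 GiB planned) for the 1.5B model and 26.1 GiB (>= 52 GiB planned) for the 7B model, consistent with the estimate above.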