1B = 1,000,000,000 parameters. Model weights are stored in float32, 4 bytes each, so 1,000,000,000 × 4 / (1024 × 1024 × 1024) ≈ 3.73 GB per billion parameters. Thus, loading a 7B model requires at least ~26 GB of memory. Considering that DashInfer may reorder weights for best performance during inference, it is desirable to have more than twice that amount (>= 52 GB). But this number is not strictly tested.
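The arithmetic above can be sketched as a small helper (a rough estimate only; the 2x headroom factor mirrors the untested guess in this reply, and the function name is just for illustration):

```python
def weight_memory_gib(num_params: int, bytes_per_param: int = 4) -> float:
    """Raw size of model weights in GiB (1 GiB = 1024**3 bytes), assuming float32."""
    return num_params * bytes_per_param / 1024**3

for billions in (1.5, 7):
    raw = weight_memory_gib(int(billions * 1e9))
    # Budget ~2x the raw size for weight reordering during inference (untested rule of thumb).
    print(f"{billions}B model: ~{raw:.2f} GiB weights, ~{2 * raw:.2f} GiB recommended")
```

For a 7B model this gives roughly 26 GiB of raw weights and ~52 GiB recommended, matching the numbers above.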
I would like to know what the minimum hardware requirements are for running 7B models. With my current configuration I can only run the 1.5B model, with an average throughput of 8.1 tokens/s.