vinnitu opened this issue 1 year ago
Maybe I'm wrong, but I think memory usage should be about 7.3B parameters × 4 bytes ≈ 30 GB, yet nvidia-smi shows 72 GB after running:

python3 -m ochat.serving.openai_api_server --model berkeley-nest/Starling-LM-7B-alpha

Can we control this?
That's because vLLM pre-allocates GPU memory as KV cache. You can run python3 -m ochat.serving.openai_api_server --help to check the arguments that control the preallocation behavior.
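The relevant knob in vLLM is gpu_memory_utilization, which defaults to 0.9; on an 80 GB GPU that works out to 0.9 × 80 ≈ 72 GB, consistent with the nvidia-smi reading above (the 80 GB GPU is an assumption here, not stated in the issue). A minimal sketch, assuming the ochat server forwards vLLM's standard engine arguments (verify with --help):

# Cap KV-cache preallocation at ~50% of GPU memory instead of the default ~90%
python3 -m ochat.serving.openai_api_server \
    --model berkeley-nest/Starling-LM-7B-alpha \
    --gpu-memory-utilization 0.5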