openlm-research / open_llama

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
Apache License 2.0

How many GPU resources are needed to run this? #58

Closed · yxchng closed 1 year ago

gjmulder commented 1 year ago

For inference, if your GPU supports INT8, the 7B-parameter model will run in about 9GB of VRAM, and the 13B model in about twice that. With 4-bit quantization you can halve those VRAM requirements.
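
A rough back-of-envelope sketch of where those numbers come from: weight storage is parameter count times bits per weight, plus some headroom for activations and the KV cache. The 2 GB overhead constant below is an assumption for illustration, not a measured figure.

```python
# Rough VRAM estimate: weights at a given quantization level plus a
# fixed overhead allowance (assumed ~2 GB) for activations/KV cache.
def vram_gb(n_params_billion: float, bits_per_weight: int, overhead_gb: float = 2.0) -> float:
    weights_gb = n_params_billion * 1e9 * bits_per_weight / 8 / 1024**3
    return weights_gb + overhead_gb

for params, bits in [(7, 8), (13, 8), (7, 4)]:
    print(f"{params}B @ {bits}-bit: ~{vram_gb(params, bits):.1f} GB")
# 7B @ 8-bit:  ~8.5 GB  (consistent with the ~9 GB figure above)
# 13B @ 8-bit: ~14.1 GB
# 7B @ 4-bit:  ~5.3 GB
```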

If you have less than 9GB of VRAM, you can convert the model to llama.cpp's GGML format and run it in hybrid CPU/GPU mode: fill your GPU VRAM with as many layers as will fit, at quantizations anywhere between 2-bit and 16-bit.
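
A minimal sketch of the hybrid setup using the `llama-cpp-python` bindings (installed with GPU support). The model filename is hypothetical; you would first convert and quantize the checkpoint with llama.cpp's conversion scripts, whose names have changed across versions.

```python
# Sketch: partial GPU offload via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./open_llama_7b-q4_0.bin",  # hypothetical converted, 4-bit-quantized file
    n_gpu_layers=20,  # offload as many layers as fit in VRAM; the rest run on CPU
)

out = llm("Q: What is OpenLLaMA? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

Tuning `n_gpu_layers` up or down is the knob that trades VRAM usage against speed: each offloaded layer moves work from CPU to GPU.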

yxchng commented 1 year ago

OK, sorry, I was asking about training the model, not inference.

young-geng commented 1 year ago

The 7B model was trained on 256 TPU v4 chips (around the same compute as 256 A100 GPUs) for 20 days. The 13B model requires double that compute, and the 3B model requires half of it.
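
Expressed as chip-days, a quick calculation from the figures above (my arithmetic, not an official breakdown):

```python
# Pre-training compute in TPU v4 chip-days, derived from the numbers above.
chip_days_7b = 256 * 20           # 5,120 chip-days for the 7B model
chip_days_13b = 2 * chip_days_7b  # 10,240 chip-days (double the 7B compute)
chip_days_3b = chip_days_7b // 2  # 2,560 chip-days (half the 7B compute)
print(chip_days_7b, chip_days_13b, chip_days_3b)
```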

sanyalsunny111 commented 5 months ago

Could you clarify whether the compute details given above are for the v1 or v2 OpenLLaMA models? If they are for v2, please also provide the pre-training compute details for the v1 models.

young-geng commented 5 months ago

The compute details are the same for the v1 and v2 models.