randaller / llama-chat

Chat with Meta's LLaMA models at home made easy
GNU General Public License v3.0

Clarify requirements #12

Open · vid opened this issue 1 year ago

vid commented 1 year ago

Hi, I am ordering some RAM to work with LLaMA when I take a break in a few weeks. The README for this repo says "64 or better 128 Gb of RAM (192 or 256 would be perfect)". Is this alongside a CUDA card? I have a 3090. I can order up to 192 GB of RAM if it makes a big difference. Will it?

Thanks!

randaller commented 1 year ago

Hi @vid! The 30B model uses around 70 GB of RAM, the 7B model fits into 18 GB, and the 13B model uses 48 GB. While the models are loading they briefly need double these values (a swap file handles that well and then releases it). I'm on 128 GB and it's not quite enough to hold the 65B model, which uses about 140 GB of RAM.
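As a rough sanity check on those figures, here is a small back-of-the-envelope sketch. The fp16 assumption of 2 bytes per parameter and the ~20% overhead factor are guesses on my part, not numbers from this repo, and the 13B figure quoted above is higher in practice than this estimate, so treat it as a ballpark only:

```python
# Back-of-the-envelope RAM estimate: 2 bytes per parameter for fp16 weights,
# plus an assumed ~20% overhead; peak usage while loading is roughly double.
PARAMS = {"7B": 6.7e9, "13B": 13.0e9, "30B": 32.5e9, "65B": 65.2e9}

def estimate_ram_gb(n_params: float, bytes_per_param: int = 2, overhead: float = 1.2) -> float:
    return n_params * bytes_per_param * overhead / 2**30

for name, n in PARAMS.items():
    steady = estimate_ram_gb(n)
    print(f"{name}: ~{steady:.0f} GB steady state, ~{2 * steady:.0f} GB peak while loading")
```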

All in all, I assume 128 GB of RAM is OK. Moreover, you cannot install more than 128 GB in a typical desktop; even the i9-13900K officially supports only 128 GB max. Systems that allow more RAM immediately cost twice as much or more.

A CUDA card is not that important for this repo; it just runs LLaMA layer by layer, so maybe even a 1080 Ti could handle it.

If you have a 3090 Ti, it may be better to find other repos that run faster with such a capable card and don't require so much RAM.

vid commented 1 year ago

Thank you for that response! Some companies, like Gigabyte, now support 48 GB DDR5 modules on their LGA1700 boards. Crucial currently sells a 192 GB DDR5-7000 kit for $700, so it becomes a lot more practical. I don't mind spending the money if it makes it possible/easier/faster to explore in different directions. And of course, CUDA RAM is very expensive.

breadbrowser commented 1 year ago

> Thank you for that response! Some companies, like Gigabyte, now support 48 GB DDR5 modules on their LGA1700 boards. Crucial currently sells a 192 GB DDR5-7000 kit for $700, so it becomes a lot more practical. I don't mind spending the money if it makes it possible/easier/faster to explore in different directions. And of course, CUDA RAM is very expensive.

What about 64 GB of ECC server RAM?

vid commented 1 year ago

If you really need ECC RAM you could buy a server- or workstation-class system, maybe used. DDR5 RAM already has on-die error checking. Normally ECC isn't that important; worst case you can run a program twice and see whether you get the same result, though even that is questionable with deep learning.

tallesairan commented 1 year ago

How many gigabytes of VRAM would it take to run this model on a GPU? I'm considering buying an A100 with the company's resources :smiling_imp: :trollface:

randaller commented 1 year ago

> How many gigabytes of VRAM would it take to run this model on a GPU?

@tallesairan 1 GB on a GeForce 710 should be enough :trollface:, it will just be a bit slow. This repo feeds the layers to the GPU one by one, and the largest layer is the tokenizer, which is about 500 MB :)
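To illustrate the layer-by-layer idea (a toy sketch, not the repo's actual loading code), only one layer's weights need to sit in VRAM at any given moment:

```python
import torch
import torch.nn as nn

# Toy stand-in for the model's transformer blocks; the real repo streams the
# actual LLaMA layers from the checkpoint instead.
layers = [nn.Linear(4096, 4096) for _ in range(8)]

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1, 4096).to(device)

with torch.no_grad():
    for layer in layers:
        layer.to(device)   # only this layer's weights occupy VRAM
        x = layer(x)
        layer.to("cpu")    # move them back before loading the next layer

if device == "cuda":
    torch.cuda.empty_cache()
```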

nopium commented 1 year ago

Is it possible to trade off a lack of RAM in favor of GPU RAM? Meaning, if I have 32 GB of RAM and 24 GB of GPU memory, what model size can I run?

randaller commented 1 year ago

@nopium The HF version allows GPU offloading, but you still need a lot of RAM.
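For reference, GPU offloading with the Hugging Face transformers + accelerate stack looks roughly like the sketch below; the checkpoint path and the memory caps are illustrative assumptions, not a tested setup for this repo:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative/assumed checkpoint path; substitute whatever HF-format LLaMA
# weights you actually have. Requires the `accelerate` package.
model_id = "path/to/llama-13b-hf"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                 # halve the memory footprint
    device_map="auto",                         # let accelerate place layers
    max_memory={0: "24GiB", "cpu": "32GiB"},   # cap GPU VRAM and system RAM
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

With a split like this, layers that don't fit in the 24 GiB of VRAM stay in system RAM, so a 13B model in fp16 should be roughly within reach of 32 GB RAM + 24 GB VRAM; larger models would likely need disk offload or quantization.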