xai-org / grok-1

Grok open release
Apache License 2.0

Hardware requirements #62

Konard opened 6 months ago

Konard commented 6 months ago

What are the minimum and recommended hardware requirements to run the model and to train it?

  1. How much GPU memory (VRAM) is required?
  2. How much RAM is required?
  3. Which GPUs are recommended?
  4. Which CPUs are recommended?
  5. Can it be run on a single machine, or is a cluster required?

At https://huggingface.co/xai-org/grok-1 it is written:

Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.

What does "multi-GPU machine" mean exactly?

Also, it looks like the model weights themselves are about 296.38 GB, so more than 300 GB of storage would be required. Should it be an SSD, or will an HDD be enough? Does that also mean a minimum of 300 GB of VRAM is required?
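
For a rough sanity check, here is some back-of-the-envelope arithmetic (my own estimate, not an official number), assuming ~314e9 parameters:

```python
# Back-of-the-envelope storage/memory needed just to hold the raw weights.
# Assumption: ~314e9 parameters; bytes per parameter depend on the storage precision.
PARAMS = 314e9

for label, bytes_per_param in [("fp32", 4), ("bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{label:>6}: ~{PARAMS * bytes_per_param / 1e9:,.0f} GB of weights")

# int8 comes out at ~314 GB, the same ballpark as the ~296 GB checkpoint,
# so the released weights appear to be stored at roughly one byte per parameter.
```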

And in the README.md of this repository it is written: https://github.com/xai-org/grok-1/blob/e50578b5f50e4c10c6e7cff31af1ef2bedb3beb8/README.md?plain=1#L17

What does "machine with enough GPU memory" mean exactly?

Please specify the answer in README.md on both GitHub and Hugging Face; it will save people a lot of time. Users need this answer to decide whether it is feasible to run the model with the resources available to them.

It would also be useful to keep track of tested hardware, so users will know in advance whether their hardware can be used without additional problems.

Update 2024-03-19: Looks like we have a confirmation that 8 GPUs are required.

bot66 commented 6 months ago

Many, many GPUs.

yhyu13 commented 6 months ago

A GH200 datacenter rig, which costs millions ;)

dabeckham commented 6 months ago

If you don't know what 300 GB of VRAM is, you have a lot to learn before trying to run this model.

You need 8 of these.... https://www.amazon.com/NVIDIA-Ampere-Passive-Double-Height/dp/B09N95N3PW

Martinho0330 commented 6 months ago

Is the Jetson AGX Orin Developer Kit capable of running this monster model?

Nick-G1984 commented 6 months ago

The question is whether the hardware requirements are an issue that can be fixed. Otherwise, in my eyes, making it "open source" only means making it available to businesses or, in rare cases, to individuals with the hardware to run it. Or was it just a publicity move in relation to the OpenAI lawsuit...

hunter-xue commented 6 months ago

Looks like the file from the magnet download is soooooo big: 256 GB, and only 2.2% downloaded so far.

MuhammadShifa commented 6 months ago

@dabeckham I don't know why people are starring it; no one has tested it, it just went viral. This is not a release for us, but only for Google, Microsoft, AWS, etc. Who else can provide 300+ GB of GPU memory???

xiaosagemisery commented 6 months ago

Since an RTX 4090 only has 24 GB of VRAM...

david-jk commented 6 months ago

@MuhammadShifa It will be possible to run this on the CPU once support is added to llama.cpp and someone releases 4-bit (or lower) quantized weights. You will need around 256 GB RAM, which is a lot more reasonable for a normal user than needing this much VRAM.
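
For a rough sense of where that RAM figure comes from, here is a quick sketch (my own estimate; the overhead factors are guesses, not measurements):

```python
# Rough CPU-RAM estimate for 4-bit quantized inference of a ~314B-parameter model.
# Assumptions: ~0.5 bytes per parameter for 4-bit weights, plus headroom for the
# KV cache, activations, and runtime overhead (the 1.2-1.5x factors are guesses).
params = 314e9
weights_gb = params * 0.5 / 1e9  # ~157 GB of quantized weights

for overhead in (1.2, 1.5):
    print(f"{overhead}x overhead: ~{weights_gb * overhead:.0f} GB of RAM")

# That lands around 188-236 GB, which is why ~256 GB of system RAM is a comfortable target.
```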

davidearlyoung commented 6 months ago

This looks interesting: https://github.com/xai-org/grok-1/issues/42. Speculation there is around 96 GB of VRAM if the model can be made to work with 4-bit quantization via the ggml library. Not sure how nicely ggml plays with JAX, though.

Konard commented 6 months ago

Looks like 8 × A100 GPUs with 80 GB of VRAM each are not enough by themselves either: https://github.com/xai-org/grok-1/issues/125

Konard commented 6 months ago

Looks like the file from the magnet download is soooooo big: 256 GB, and only 2.2% downloaded so far.

(screenshot of the torrent download progress)

@hunter-xue, did you mean 296 GB?

jussker commented 6 months ago

Ran it on 8× A100 80 GB with the code in this repo (no modification, I just added a loop to get input from the terminal). It used 524 GB of VRAM during single-batch inference with nearly no context (10~100 input tokens), and the speed was only 7 tokens per second.


SavvyClique commented 6 months ago

Can it be run on cloud hardware?

davidearlyoung commented 6 months ago

This looks interesting: #42. Speculation there is around 96 GB of VRAM if the model can be made to work with 4-bit quantization via the ggml library. Not sure how nicely ggml plays with JAX, though.

Just found this in relation to my last post in this thread: https://huggingface.co/eastwind/grok-1-hf-4bit

It looks to be about 90.2 GB on disk if you add up the safetensors file shards from the eastwind Hugging Face repo. There may be additional overhead that requires a bit more memory for inference, but it is promising all the same. I hope grok-1 quantizes to 4-bit well. Fingers crossed.
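
If anyone wants to re-check that shard total, here is a small sketch using huggingface_hub (the repo id comes from the link above; summing the .safetensors sizes is just my own approach):

```python
# Sum the sizes of the .safetensors shards in a Hugging Face repo to estimate
# the on-disk footprint of the quantized weights.
from huggingface_hub import HfApi

info = HfApi().model_info("eastwind/grok-1-hf-4bit", files_metadata=True)
total_bytes = sum(
    f.size
    for f in info.siblings
    if f.rfilename.endswith(".safetensors") and f.size is not None
)
print(f"~{total_bytes / 1e9:.1f} GB across the safetensors shards")
```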

stoic-analytics commented 6 months ago

Following this calculation from https://www.substratus.ai/blog/calculating-gpu-memory-for-llm/ you would need:

  • 94.2 GB for the 4-bit model
  • 188.4 GB for the 8-bit model

I've just stumbled upon this article from VMware on running open-source models in the cloud(s): https://www.vmware.com/products/vsphere/ai-ml.html#democratize

davidearlyoung commented 6 months ago

Following this calculation from https://www.substratus.ai/blog/calculating-gpu-memory-for-llm/ you would need:

  • 94.2 GB for the 4-bit model
  • 188.4 GB for the 8-bit model

I've just stumbled upon this article from VMware on running open-source models in the cloud(s): https://www.vmware.com/products/vsphere/ai-ml.html#democratize

Rough calculations are what this formula is for, as far as I can see, which is great for theory and rough planning.

But in real life, actual use of the model will involve many small nuances from many different situations that can add up and change the picture to the point where it matters, whether for quantized use or for running the open model straight up at any common float precision.

It's a huge model, and I think most who are paying attention are curious as spectators, which I admit includes me as well. This is exciting and interesting stuff!

From what I'm seeing from others since my last post, the most reachable and performant option for low memory use could be roughly 110 to 120+ GB with quantization. That's just for disk use and for loading the quantized model into memory (see https://huggingface.co/Arki05/Grok-1-GGUF for example), likely ballooning a bit higher in memory for basic forward passes.

Might be a tight fit for Apple CPU inference with 128 GB of RAM, but that is still asking a lot.
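
For anyone who wants to play with that rough formula, here is a minimal sketch of the substratus.ai calculation (the 4-byte base and the 1.2 overhead factor are the blog post's assumptions, not measured values):

```python
# Rough GPU-memory estimate from https://www.substratus.ai/blog/calculating-gpu-memory-for-llm/
#   M = (P * 4 bytes) / (32 / Q) * 1.2
# P: parameters in billions, Q: bits used per parameter, 1.2: ~20% inference overhead.
def estimate_gpu_memory_gb(params_billion: float, bits_per_param: int, overhead: float = 1.2) -> float:
    return params_billion * 4 / (32 / bits_per_param) * overhead

for bits in (4, 8, 16):
    print(f"{bits:>2}-bit: ~{estimate_gpu_memory_gb(314, bits):.1f} GB")

# Taken literally for 314B parameters this gives ~188 / ~377 / ~754 GB, noticeably
# above the ~110-120 GB the published GGUF quants actually occupy, so treat it as
# a first-order planning number only.
```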

Shensen1 commented 4 months ago

@MuhammadShifa It will be possible to run this on the CPU once support is added to llama.cpp and someone releases 4-bit (or lower) quantized weights. You will need around 256 GB RAM, which is a lot more reasonable for a normal user than needing this much VRAM.

The maximum amount of RAM I can squeeze into my AM5 board at the moment is 192 GB. Do you think it is feasible to get it running with that?

KolinCunningham commented 2 weeks ago

So I am wondering if you can run through this in layman's terms. Give me the rundown of the computer parts for minimum running standards versus amazing running standards, and which parts are interchangeable if I need to build this on the fly. Is it really 8 GPUs at $4,000 per GPU? Is it possible to run it on a Mac that is specced out to the max? If not, it sounds like a PC is the only answer. I need to figure out how to build the basic setup for this as easily as I can.

Does "30-core GPU" mean it has the minimum amount of GPU needed to run this, or are we looking for each GPU to have a certain number of cores?

What is the PC equivalent?

Apple M3 Max chip

  • 14-core CPU with 10 performance cores and 4 efficiency cores
  • 30-core GPU
  • Hardware-accelerated ray tracing
  • 16-core Neural Engine
  • 300 GB/s memory bandwidth