nebuly-ai / optimate

A collection of libraries to optimise AI model performances
https://www.nebuly.com/
Apache License 2.0
8.37k stars 639 forks

[Chatllama]: How to train llama-7B with multiple GPU? #252

Open bnuzhanyu opened 1 year ago

bnuzhanyu commented 1 year ago

I downloaded the llama-7B model, which has MP=1. I modified the config: `actor_config: device: "cuda:1,2,5,7" model: "llama-7B"`
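Laid out as YAML, the modification described above would look roughly like this (keys and values are taken from the message as written; the exact schema of ChatLLaMA's `config.yaml` may differ, and the loader may expect a single device string rather than a comma-separated list):

```yaml
actor_config:
  device: "cuda:1,2,5,7"   # as written in the message above
  model: "llama-7B"
```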

I tried `torchrun --nproc_per_node=4 artifacts/main.py artifacts/config/config.yaml --type ACTOR` and got: `AssertionError: Loading a checkpoint for MP=1 but world size is 4.`

I also tried `python artifacts/main.py artifacts/config/config.yaml`, but it seems to use only cuda:0 and runs out of CUDA memory.
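For context, the assertion text matches the check in the original LLaMA checkpoint loader: the number of launched processes (the world size) must equal the number of checkpoint shards, which is the model-parallel (MP) degree the weights were saved with. A minimal sketch of that check, with `check_mp` as a hypothetical helper name (the real loader's details may differ):

```python
from pathlib import Path


def check_mp(ckpt_dir: str, world_size: int) -> None:
    """Verify that the launch world size matches the checkpoint's MP degree.

    The LLaMA-style loader infers MP from the number of *.pth shards in the
    checkpoint directory (llama-7B ships a single shard, so MP=1). Launching
    with torchrun --nproc_per_node=4 gives world_size=4, which trips this
    assertion. Sketch based on the error message in this issue.
    """
    checkpoints = sorted(Path(ckpt_dir).glob("*.pth"))
    mp = len(checkpoints)
    assert world_size == mp, (
        f"Loading a checkpoint for MP={mp} but world size is {world_size}"
    )
```

This is why `--nproc_per_node=4` fails against an MP=1 checkpoint: the shard count and the process count must agree, independent of how many GPUs are listed in the config.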

So, I have two questions:

  1. How can I train models on multiple GPUs?
  2. Can llama-7B be trained on multiple GPUs?
bimalm commented 1 year ago

https://github.com/facebookresearch/llama/issues/55

PierpaoloSorbellini commented 1 year ago

Hi @bnuzhanyu @bimalm, vanilla LLaMA is for inference only. We have reimplemented it to make it suitable for training. We are working on stabilising the distributed training and will keep you updated when a new stable release is available.