randaller / llama-chat

Chat with Meta's LLaMA models at home made easy
GNU General Public License v3.0
833 stars 118 forks

"model parallel group is not initialized" when loading model #17

Closed: yaoing closed this issue 1 year ago

yaoing commented 1 year ago

Hi, I ran chat_example.py after merging the weights, and got the following error while loading the model:

Loading checkpoint
Loading tokenizer
Loading model
Traceback (most recent call last):
  File "/data/yao/apps/llama/chat.py", line 118, in <module>
    fire.Fire(main)
  File "/data/yao/anaconda3/envs/chatgpt/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/data/yao/anaconda3/envs/chatgpt/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/data/yao/anaconda3/envs/chatgpt/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/data/yao/apps/llama/chat.py", line 93, in main
    generator = load(ckpt_dir, tokenizer_path, max_seq_len, max_batch_size)
  File "/data/yao/apps/llama/chat.py", line 68, in load
    model = Transformer(model_args)
  File "/data/yao/apps/llama/llama/model.py", line 205, in __init__
    self.tok_embeddings = ParallelEmbedding(
  File "/data/yao/anaconda3/envs/chatgpt/lib/python3.10/site-packages/fairscale/nn/model_parallel/layers.py", line 186, in __init__
    world_size = get_model_parallel_world_size()
  File "/data/yao/anaconda3/envs/chatgpt/lib/python3.10/site-packages/fairscale/nn/model_parallel/initialize.py", line 152, in get_model_parallel_world_size
    return torch.distributed.get_world_size(group=get_model_parallel_group())
  File "/data/yao/anaconda3/envs/chatgpt/lib/python3.10/site-packages/fairscale/nn/model_parallel/initialize.py", line 128, in get_model_parallel_group
    assert _MODEL_PARALLEL_GROUP is not None, "model parallel group is not initialized"
AssertionError: model parallel group is not initialized

Requesting help!
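For context on where the assertion comes from: fairscale's `ParallelEmbedding` looks up a module-level model-parallel process group that stays `None` until `initialize_model_parallel()` has been called (Meta's original example does this after `torch.distributed.init_process_group()`, before building the `Transformer`). A minimal stdlib sketch of that guard pattern, with hypothetical stand-in functions modeled on the `initialize.py` frames in the traceback:

```python
# Hypothetical stand-ins illustrating fairscale's guard pattern; not the
# real fairscale API. The real fix in Meta's repo is to initialize the
# process group before constructing the model.

_MODEL_PARALLEL_GROUP = None  # set only by the initializer below


def initialize_model_parallel(group):
    """Stand-in for the initializer: stores the process group globally."""
    global _MODEL_PARALLEL_GROUP
    _MODEL_PARALLEL_GROUP = group


def get_model_parallel_group():
    """Mirrors the failing frame: asserts the group was initialized."""
    assert _MODEL_PARALLEL_GROUP is not None, \
        "model parallel group is not initialized"
    return _MODEL_PARALLEL_GROUP


# Building the model before initialization hits the assertion:
try:
    get_model_parallel_group()
except AssertionError as e:
    print(e)  # model parallel group is not initialized

# After initialization the lookup succeeds:
initialize_model_parallel(object())
get_model_parallel_group()  # no longer raises
```

In this repo, though, the `llama` folder does not use fairscale at all, so hitting this assertion means the wrong `model.py` is being imported.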

randaller commented 1 year ago

Hi @yaoing, this repo does not use fairscale; you are probably running code from another repo. Please make sure you followed all the steps in the readme, and note that this repo's [llama] folder is different from Meta's original repo and does not contain any ParallelEmbedding.

yaoing commented 1 year ago

It's solved, thanks!