juncongmoo / pyllama

LLaMA: Open and Efficient Foundation Language Models
GNU General Public License v3.0

Run 'inference.py' and 'model parallel group is not initialized' #86

Open ildartregulov opened 1 year ago

ildartregulov commented 1 year ago
~/GPT/pyllama_data/pyllama$ python inference.py --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH
Traceback (most recent call last):
  File "/home/ildar/GPT/pyllama_data/pyllama/inference.py", line 82, in <module>
    run(
  File "/home/ildar/GPT/pyllama_data/pyllama/inference.py", line 50, in run
    generator = load(
  File "/home/ildar/GPT/pyllama_data/pyllama/inference.py", line 33, in load
    model = Transformer(model_args)
  File "/home/ildar/GPT/pyllama_data/pyllama/llama/model_parallel.py", line 217, in __init__
    self.tok_embeddings = ParallelEmbedding(
  File "/home/ildar/.local/lib/python3.10/site-packages/fairscale/nn/model_parallel/layers.py", line 186, in __init__
    world_size = get_model_parallel_world_size()
  File "/home/ildar/.local/lib/python3.10/site-packages/fairscale/nn/model_parallel/initialize.py", line 152, in get_model_parallel_world_size
    return torch.distributed.get_world_size(group=get_model_parallel_group())
  File "/home/ildar/.local/lib/python3.10/site-packages/fairscale/nn/model_parallel/initialize.py", line 128, in get_model_parallel_group
    assert _MODEL_PARALLEL_GROUP is not None, "model parallel group is not initialized"
AssertionError: model parallel group is not initialized

I'm using 2 NVIDIA 1080 Ti GPUs and trying to run the 7B model.

wangshuaiwu commented 1 year ago

Me too. I was able to run it before, but when I tried it again today, this problem appeared.

JunLiangZ commented 1 year ago

Have you solved it? I have the same problem.

wangshuaiwu commented 1 year ago

My solution was to compare it with the official code and make the corresponding change. Here is the official link: https://github.com/facebookresearch/llama

JunLiangZ commented 1 year ago

OK, thanks for your answer. How well does the LLaMA model work for you? The answers from the llama-7B model I ran are really strange; is it always like that?


wangshuaiwu commented 1 year ago

The answers from the 7B model really did come out strange. You could try running a larger model.

JunLiangZ commented 1 year ago

Have you tried models larger than 7B? How do they perform?


tbaggu commented 6 months ago

Check the environment variable PYLLAMA_META_MP. If it's not set, inference should work without model parallelism.
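A minimal sketch of the branch being described here (the variable name is taken from the comment above; the exact logic inside pyllama may differ):

```python
import os

# Hypothetical illustration: if PYLLAMA_META_MP is unset, fall back to
# the plain single-process model instead of the fairscale parallel one.
use_model_parallel = os.environ.get("PYLLAMA_META_MP") is not None

if use_model_parallel:
    print("would build the fairscale model-parallel Transformer")
else:
    print("would build the plain single-process Transformer")
```

So unsetting the variable (`unset PYLLAMA_META_MP`) before running `inference.py` should select the non-parallel code path, avoiding the group-initialization requirement entirely.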