microsoft / torchscale

Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI

`get_moe_group` returns None when building `MOELayer(Base)` with a single GPU #60

Closed · Ruiyuan-Zhang closed this 1 year ago

Ruiyuan-Zhang commented 1 year ago

Hi,

I want to replace the Transformer Encoder with an X-MoE Encoder. Below is my configuration:

    from torchscale.architecture.config import EncoderConfig
    from torchscale.architecture.encoder import Encoder

    config = EncoderConfig(
        encoder_embed_dim=500,
        encoder_layers=4,
        use_xmoe=True,
        moe_freq=1,
        moe_top1_expert=True,
        moe_expert_count=10,
    )
    module = Encoder(config)

I ran into the following error:

It is because `dist.is_initialized()` is not True, so `get_moe_group` returns None (see the attached screenshot of the traceback).
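A quick way to see the precondition that fails, in a plain single-process script:

    import torch.distributed as dist

    # Without an explicit init_process_group call there is no default
    # process group, so this prints False and the MoE group lookup
    # cannot proceed.
    print(dist.is_initialized())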

Thanks for your help~

shumingma commented 1 year ago

Please try using fairseq to set up the distributed environment (even with 1 GPU) for the MoE stuff: https://github.com/shumingma/fairseq/blob/moe/fairseq/distributed/utils.py#L246
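For anyone hitting this, a minimal sketch of initializing `torch.distributed` for a single process before constructing the encoder; the address/port values are placeholders, and fairseq's `distributed_init` linked above handles this as part of its own setup:

    import os
    import torch
    import torch.distributed as dist

    # Placeholder rendezvous settings for a single-process run.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")

    if not dist.is_initialized():
        dist.init_process_group(
            backend="nccl" if torch.cuda.is_available() else "gloo",
            rank=0,
            world_size=1,
        )

    # With the process group up, dist.is_initialized() is True and the
    # Encoder(config) call from the snippet above can build its MoE groups.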

Ruiyuan-Zhang commented 1 year ago

> Please try using fairseq to set up the distributed environment (even with 1 GPU) for the MoE stuff: https://github.com/shumingma/fairseq/blob/moe/fairseq/distributed/utils.py#L246

I am using PyTorch Lightning for my distributed training. Do they conflict?

shumingma commented 1 year ago

> Please try using fairseq to set up the distributed environment (even with 1 GPU) for the MoE stuff: https://github.com/shumingma/fairseq/blob/moe/fairseq/distributed/utils.py#L246
>
> I am using PyTorch Lightning for my distributed training. Do they conflict?

TBH, I'm not familiar with PyTorch Lightning. However, besides the modeling, MoE needs additional effort on the training backend, so there may be some conflicts here.
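If you want to keep PyTorch Lightning, one untested sketch is to defer building the MoE encoder to the `setup()` hook, by which point Lightning's DDP strategy (even on a single device) should already have initialized the process group; the config here is the one from the first comment, and whether this plays well with torchscale's MoE groups is an assumption:

    import pytorch_lightning as pl
    import torch.distributed as dist
    from torchscale.architecture.config import EncoderConfig
    from torchscale.architecture.encoder import Encoder

    class MoEEncoderModule(pl.LightningModule):
        def __init__(self, config: EncoderConfig):
            super().__init__()
            self.config = config
            self.encoder = None  # built lazily, once the process group exists

        def setup(self, stage=None):
            # Under strategy="ddp", Lightning initializes torch.distributed
            # before this hook runs, so the MoE group lookup can succeed.
            if not dist.is_initialized():
                raise RuntimeError("expected the DDP process group to be initialized")
            if self.encoder is None:
                self.encoder = Encoder(self.config)

    # trainer = pl.Trainer(accelerator="gpu", devices=1, strategy="ddp")
    # trainer.fit(MoEEncoderModule(config), train_dataloaders=...)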

Ruiyuan-Zhang commented 1 year ago

okk, thanks for your help~