can tutel be used with Megatron Deepspeed?
Open · wangyuxin87 opened 1 year ago
Do you mean Megatron and DeepSpeed separately, or all of them working together?
@ghostplant Can Tutel work with Megatron or DeepSpeed, each used on its own?
Yes, Tutel is just an MoE layer implementation that can be plugged into any distributed framework. The way for another framework to use the Tutel MoE layer is to pass its distributed process group properly, e.g.:
my_processing_group = deepspeed.new_group(..)
moe_layer = tutel_moe.moe_layer(
    ..,
    group=my_processing_group,
)
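For concreteness, here is a minimal runnable sketch of that pattern. The group= argument comes from the snippet above; the gate_type / experts / model_dim argument names and the relu activation are assumed from Tutel's README-style examples, and torch.distributed.new_group simply stands in for whatever process group your host framework (Megatron or DeepSpeed) would hand you, so treat this as a sketch rather than exact integration code:
# Sketch: pass an externally created process group into a Tutel MoE layer.
# Launch with torchrun so torch.distributed can initialize from environment variables.
import os
import torch
import torch.distributed as dist
from tutel import moe as tutel_moe

dist.init_process_group(backend='nccl' if torch.cuda.is_available() else 'gloo')
local_rank = int(os.environ.get('LOCAL_RANK', 0))
device = torch.device('cuda', local_rank) if torch.cuda.is_available() else torch.device('cpu')
if device.type == 'cuda':
    torch.cuda.set_device(device)

# Stand-in for the group your framework creates (e.g. via DeepSpeed/Megatron group utilities).
my_processing_group = dist.new_group(ranks=list(range(dist.get_world_size())))

moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 2,
             'hidden_size_per_expert': 4096,
             'activation_fn': lambda x: torch.nn.functional.relu(x)},
    group=my_processing_group,   # Tutel runs its all-to-all inside this group
).to(device)

x = torch.randn(4, 16, 1024, device=device)   # (batch, tokens, model_dim)
y = moe(x)
print(y.shape)
The only thing that changes between host frameworks is where my_processing_group comes from; the moe_layer call itself stays the same.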
If no other framework is available, Tutel itself also provides a one-line initialization that generates the groups you need; it works for both distributed GPU (i.e. nccl) and distributed CPU (i.e. gloo):
from tutel import system
parallel_env = system.init_data_model_parallel(backend='nccl' if args.device == 'cuda' else 'gloo')
# pick one of the generated groups, depending on your parallelism layout:
my_processing_group = parallel_env.data_group   # or parallel_env.model_group, or parallel_env.global_group
...
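Putting the pieces together, a self-contained sketch of this framework-free path could look like the following. The group attributes (data_group, model_group, global_group) come from the snippet above; parallel_env.local_device and the moe_layer argument names are assumptions taken from Tutel's helloworld/README examples and may differ across versions:
# Sketch: let Tutel create the process groups itself, then hand one to the MoE layer.
import torch
from tutel import system
from tutel import moe as tutel_moe

parallel_env = system.init_data_model_parallel(
    backend='nccl' if torch.cuda.is_available() else 'gloo')

# Choose the group that matches how you shard experts; the global group is the simplest default.
my_processing_group = parallel_env.global_group   # or parallel_env.data_group / parallel_env.model_group
device = parallel_env.local_device                # assumed attribute, as used in Tutel's helloworld example

moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 2,
             'hidden_size_per_expert': 4096,
             'activation_fn': lambda x: torch.nn.functional.relu(x)},
    group=my_processing_group,
).to(device)

x = torch.randn(4, 16, 1024, device=device)   # (batch, tokens, model_dim)
print(moe(x).shape)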
Thanks for your prompt response!