microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0
35.38k stars 4.11k forks

[REQUEST] Some questions about deepspeed sequence parallel #6708

Open yingtongxiong opened 4 days ago

yingtongxiong commented 4 days ago

Hello, I want to run sequence parallelism with the pure DeepSpeed repo. However, I found that the developer has to create the sequence parallel process group themselves, is that right? I would like to know whether there is any way to use sequence parallelism or MoE (which also requires an expert_data_process_group and so on) with pure DeepSpeed.

samadejacobs commented 3 days ago

@yingtongxiong, the recommended way to use DeepSpeed sequence parallelism (DeepSpeed-Ulysses) is to call it from a client framework/script. Please take a look at these two examples: Megatron-DeepSpeed, Hugging Face Transformers.
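To make the "client script creates the process groups" point concrete, here is a minimal, framework-free sketch of the rank bookkeeping such a script would do before building sequence-parallel groups. The contiguous-ranks layout and the helper names are assumptions for illustration, not DeepSpeed's mandated scheme; with `torch.distributed`, each tuple below would be passed to `torch.distributed.new_group(ranks=...)` and the resulting group handed to DeepSpeed-Ulysses.

```python
# Sketch: partition world ranks into sequence-parallel (SP) groups.
# Assumes contiguous ranks per SP group; adapt the layout to your topology.

def build_sp_groups(world_size: int, sp_size: int):
    """Return the list of rank tuples, one per sequence-parallel group."""
    assert world_size % sp_size == 0, "world size must be divisible by sp size"
    return [tuple(range(start, start + sp_size))
            for start in range(0, world_size, sp_size)]

def sp_group_for_rank(rank: int, world_size: int, sp_size: int):
    """Return the group of ranks that shard sequences together with `rank`."""
    for group in build_sp_groups(world_size, sp_size):
        if rank in group:
            return group
    raise ValueError(f"rank {rank} outside world of size {world_size}")
```

For example, with 8 ranks and an SP degree of 4, this yields two groups, `(0, 1, 2, 3)` and `(4, 5, 6, 7)`; a real client script would create one `torch.distributed` group per tuple and pass the local one into the Ulysses attention layer.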

yingtongxiong commented 1 day ago

Okay, thank you very much.

yingtongxiong commented 1 day ago

https://github.com/microsoft/DeepSpeedExamples/blob/uly-hf/post_training/sequence_parallelism/test_ulysses.py#L113 I see that here the mesh_param is commented out, so I think that if I want to use SP, this parameter should be passed in, is that right? @samadejacobs

yingtongxiong commented 1 day ago

Also, when I use the SP all2all overlap, I found a small bug. https://github.com/microsoft/DeepSpeed/blob/a1b0c35a1def4bfc20fc3eeb89d6f5831fbc4ae8/deepspeed/sequence/layer.py#L242 Even when the stream is not None, the assertion still fails, so I think the check should be `ctx.stream is not None`.
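As a reconstruction of the bug class being reported (an assumed sketch, not the actual DeepSpeed code): in an autograd-style `Function`, values stashed on `ctx` in `forward()` must be read back from `ctx` in `backward()`. Referencing a bare `stream` name there checks the wrong thing, because that name is not defined in the backward's scope; the class and method names below are hypothetical.

```python
# Framework-free mock of torch.autograd's context-passing pattern.

class Ctx:
    """Stand-in for the autograd context object."""
    pass

class AsyncAll2AllSketch:
    @staticmethod
    def forward(ctx, x, stream):
        ctx.stream = stream          # stash the stream for backward
        return x

    @staticmethod
    def backward_buggy(ctx, grad):
        # Buggy form: `stream` is not defined in backward's scope,
        # so evaluating this assertion raises NameError.
        assert stream is not None
        return grad

    @staticmethod
    def backward_fixed(ctx, grad):
        # Fixed form, matching the suggestion above: read from ctx.
        assert ctx.stream is not None
        return grad
```

The fixed variant passes whenever a non-None stream was saved in `forward`, while the buggy variant fails regardless of what was saved.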