Open yingtongxiong opened 4 days ago
@yingtongxiong, the recommended way to use DeepSpeed sequence parallelism (DeepSpeed-Ulysses) is to call it from a client framework or script. Please take a look at these two examples: Megatron-DeepSpeed, Hugging Face Transformers
Okay, thank you very much.
https://github.com/microsoft/DeepSpeedExamples/blob/uly-hf/post_training/sequence_parallelism/test_ulysses.py#L113 I see that mesh_param is commented out here, so I think that if I want to use SP, this parameter has to be passed in. Is that right? @samadejacobs
Also, when I use SP all2all overlap, I found a small bug: https://github.com/microsoft/DeepSpeed/blob/a1b0c35a1def4bfc20fc3eeb89d6f5831fbc4ae8/deepspeed/sequence/layer.py#L242 When the stream is not None, the assert still fails, so I think the condition should be "ctx.stream is not None".
Hello, I want to run sequence parallelism with the pure DeepSpeed repo. However, it seems that the developer has to create the sequence parallel process group themselves. Is that right? Are there any solutions for using sequence parallelism or MoE (which also requires expert_data_process_group and so on) with pure DeepSpeed?
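To make the question concrete, here is a minimal sketch of the rank layout that manual group creation would involve. The function name build_parallel_groups and the layout (consecutive ranks share a sequence parallel group) are assumptions for illustration only, not DeepSpeed's API or necessarily the layout its examples use.

```python
from typing import List, Tuple


def build_parallel_groups(world_size: int, sp_size: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: compute rank lists for sequence-parallel (SP)
    and data-parallel (DP) groups. Assumes consecutive ranks form one SP group."""
    if sp_size <= 0 or world_size % sp_size != 0:
        raise ValueError("world_size must be a positive multiple of sp_size")

    # Consecutive ranks share one sequence-parallel group.
    sp_groups = [list(range(start, start + sp_size))
                 for start in range(0, world_size, sp_size)]

    # Ranks at the same offset inside each SP group form one data-parallel group.
    dp_groups = [list(range(offset, world_size, sp_size))
                 for offset in range(sp_size)]

    # In a real job, every rank would pass each list to
    # torch.distributed.new_group(ranks=...) in the same order and keep
    # the group that contains its own rank.
    return sp_groups, dp_groups


if __name__ == "__main__":
    sp, dp = build_parallel_groups(world_size=8, sp_size=2)
    print("SP groups:", sp)
    print("DP groups:", dp)
```

With 8 ranks and sp_size=2, this gives four SP groups of two ranks each and two DP groups of four ranks each. How these groups are then handed to DeepSpeed (for example through a model-parallel-unit object or mesh_param) depends on the client framework and is not shown here.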