microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Enable Sequence Parallelism #429

Closed: polisettyvarma closed this 2 months ago

polisettyvarma commented 3 months ago

@samadejacobs @tjruwase can you please review this so we can proceed further?

polisettyvarma commented 2 months ago

@samadejacobs @tjruwase please review this.

polisettyvarma commented 2 months ago

@tjruwase @loadams can someone review this?

polisettyvarma commented 2 months ago

@tjruwase Thanks for the review, please check my replies to your comments.

polisettyvarma commented 2 months ago

@tjruwase I missed your reply, sorry for the late response. Please check my comment.

polisettyvarma commented 2 months ago

@tjruwase please review now

polisettyvarma commented 2 months ago

@tjruwase it's approved but not merged yet, any reason?

polisettyvarma commented 2 months ago

@tjruwase Thanks for merging. I have a query regarding HPU-specific changes, such as creating custom bash run scripts for HPU under the examples_deepspeed/hpu folder. Is that okay?

tjruwase commented 2 months ago

@polisettyvarma, yes that seems reasonable.

ys950902 commented 1 month ago

Hi @polisettyvarma, this PR causes an init error for the RMSNorm init in the torch implementation, like below:

```
[rank0]: self.input_layernorm = RMSNorm(config.hidden_size, config.layernorm_epsilon,
[rank0]: TypeError: RMSNorm.__init__() got an unexpected keyword argument 'sequence_parallel'
```
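For context, the error means the sequence-parallel code path now forwards a `sequence_parallel` keyword when constructing the norm layer, while the plain torch `RMSNorm` implementation does not accept it. Below is a minimal sketch of one way to make the constructor compatible; this is only an illustration of the mismatch, not necessarily what the fix PR does, and the class shown here is hypothetical rather than the repo's actual implementation.

```python
import torch


class RMSNorm(torch.nn.Module):
    """Minimal RMSNorm sketch (hypothetical; the repo's version differs).

    Accepting the `sequence_parallel` keyword avoids the TypeError above
    when the caller forwards it from the model config.
    """

    def __init__(self, dim, eps=1e-5, sequence_parallel=False):
        super().__init__()
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(dim))
        # Megatron-style layers tag parameters that are used under sequence
        # parallelism so their gradients can be all-reduced across the
        # tensor-parallel group; shown here purely as an illustration.
        setattr(self.weight, "sequence_parallel", sequence_parallel)

    def forward(self, x):
        norm = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return norm * self.weight


# The failing call from the traceback then constructs cleanly:
layer = RMSNorm(4096, 1e-5, sequence_parallel=True)
```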

I have raised a PR to fix this: https://github.com/microsoft/Megatron-DeepSpeed/pull/448. Is that okay with you?