polisettyvarma closed this 2 months ago
@samadejacobs @tjruwase please review this.
@tjruwase @loadams can someone review this?
@tjruwase Thanks for the review, please check my replies to your comments.
@tjruwase I missed your reply, sorry for the late response. Please check my comment.
@tjruwase please review now
@tjruwase it's approved but not merged yet, any reason?
@tjruwase Thanks for merging. I have a query regarding HPU-specific changes, such as creating custom bash run scripts for HPU under the examples_deepspeed/hpu folder. Is that okay?
@polisettyvarma, yes that seems reasonable.
Hi @polisettyvarma, this PR causes an init error for RMSNorm in the torch implementation, as shown below:
[rank0]: self.input_layernorm = RMSNorm(config.hidden_size, config.layernorm_epsilon,
[rank0]: TypeError: RMSNorm.__init__() got an unexpected keyword argument 'sequence_parallel'
I have raised a PR to fix it: https://github.com/microsoft/Megatron-DeepSpeed/pull/448. Is that okay with you?
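For readers following the error above, here is a minimal sketch of how this kind of TypeError arises and one way it could be avoided. It is plain Python rather than torch, and it is not the change made in the linked PR. The class names `RMSNormOld` and `RMSNormPatched`, the helper `build_norm`, and the default values are all hypothetical, chosen only for illustration.

```python
import math


class RMSNormOld:
    """Hypothetical stand-in for a norm whose constructor lacks `sequence_parallel`."""

    def __init__(self, dim, eps=1e-6):
        self.dim = dim
        self.eps = eps


class RMSNormPatched:
    """Hypothetical variant that accepts `sequence_parallel` (assumed default: False)."""

    def __init__(self, dim, eps=1e-6, sequence_parallel=False):
        self.dim = dim
        self.eps = eps
        # Only stored here; a real implementation would act on this flag.
        self.sequence_parallel = sequence_parallel
        self.weight = [1.0] * dim

    def __call__(self, x):
        # RMS normalization: x / sqrt(mean(x^2) + eps), scaled by weight.
        rms = math.sqrt(sum(v * v for v in x) / len(x) + self.eps)
        return [w * v / rms for w, v in zip(self.weight, x)]


def build_norm(norm_cls, hidden_size, eps):
    # Mirrors the call shape in the traceback: the caller always passes the keyword.
    return norm_cls(hidden_size, eps, sequence_parallel=True)


def try_build(norm_cls):
    """Return (instance, None) on success or (None, error message) on TypeError."""
    try:
        return build_norm(norm_cls, 4, 1e-5), None
    except TypeError as exc:
        return None, str(exc)
```

Calling `try_build(RMSNormOld)` returns an error message that mentions `sequence_parallel`, while `try_build(RMSNormPatched)` returns a usable instance. The point of the sketch is only that the caller and the constructor signature have to agree on the keyword.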
@samadejacobs @tjruwase can you please review this so we can proceed further?