microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

Loading fp16 model checkpoints with MoE layers #1876

Open joeljang opened 2 years ago

joeljang commented 2 years ago

I replaced one of the layers of a GPT-2 model with an MoE layer and trained it with deepspeed_stage_2. However, when trying to run convert_to_fp32.py, I run into a variety of errors. Does the library currently support converting MoE layers trained with DeepSpeed to fp32?
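
For context, my setup is roughly the following (a minimal sketch, not the exact training code; the block index, expert count, and top-k value are illustrative placeholders):

```python
import torch
import deepspeed
from deepspeed.moe.layer import MoE
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
hidden = model.config.n_embd  # 768 for the base gpt2 checkpoint


class MoEMLP(torch.nn.Module):
    """Drop-in replacement for a GPT-2 MLP block backed by a DeepSpeed MoE layer.

    MoE.forward returns (output, aux_loss, expert_counts) while the original
    MLP returns a single tensor, so we unpack here and stash the aux loss.
    """

    def __init__(self, expert, num_experts=4):
        super().__init__()
        self.moe = MoE(hidden_size=hidden, expert=expert,
                       num_experts=num_experts, k=1)
        self.aux_loss = None

    def forward(self, hidden_states):
        output, self.aux_loss, _ = self.moe(hidden_states)
        return output


# Replace the MLP of one transformer block (block 6, chosen arbitrarily)
# with an MoE layer whose experts are copies of that MLP.
model.transformer.h[6].mlp = MoEMLP(expert=model.transformer.h[6].mlp)

# Training then goes through deepspeed.initialize with a ZeRO stage 2
# config, e.g. {"zero_optimization": {"stage": 2}, "fp16": {"enabled": true}}.
```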

tjruwase commented 2 years ago

@joeljang, thanks for reporting this issue. No, the checkpoint conversion script does not currently support MoE layers; we have not previously tested that usage.

joeljang commented 2 years ago

Do you have future plans to provide this feature?

tjruwase commented 2 years ago

@joeljang, yes, we plan to add support for converting checkpoints with MoE layers using convert_to_fp32.py. In the meantime, can you please share the stack trace of the errors you encountered? Thanks!
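
For comparison, the path that works today for plain (non-MoE) ZeRO checkpoints is roughly the following (a minimal sketch; it uses the zero_to_fp32 helpers that ship with DeepSpeed, and the checkpoint path is a placeholder):

```python
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Directory containing the ZeRO shards (the folder with the global_step*/
# subdirectory of *_model_states.pt and *_optim_states.pt files).
ckpt_dir = "path/to/checkpoint"

# Aggregate the partitioned fp32 optimizer states into a single state dict
# and save it as a regular PyTorch checkpoint.
state_dict = get_fp32_state_dict_from_zero_checkpoint(ckpt_dir)
torch.save(state_dict, "pytorch_model_fp32.bin")
```

With MoE, expert parameters are stored in additional per-expert shard files, which is presumably where this aggregation step currently breaks.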

joeljang commented 2 years ago

[Screenshot attached: stack trace of the conversion errors (Screen Shot 2022-04-03 at 1 32 19 AM)]