Open · eric-haibin-lin opened this issue 4 years ago
Hi,

Is there an implementation of a fully half-precision optimizer? I see from the GPT-3 paper:

It looks like the DeepSpeed optimizer is still using mixed precision. Has anyone looked into this before? This would allow training these big models with much less memory ... Thanks!
@eric-haibin-lin Yes, you are correct: DeepSpeed's fp16 and ZeRO optimizers use mixed precision. We currently don't have a fully half-precision optimizer.
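To make the distinction concrete, here is a minimal sketch in plain PyTorch of what each approach stores and updates per parameter. This is only an illustration of the standard Adam formulas, not DeepSpeed's actual `FP16_Optimizer` code:

```python
import torch

def mixed_precision_adam_step(p16, g16, master32, m32, v32, t,
                              lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Mixed precision: FP32 master weights and FP32 Adam states.

    Per parameter: 2 (fp16 weight) + 2 (fp16 grad) + 4 (fp32 master)
    + 4 (fp32 momentum) + 4 (fp32 variance) = 16 bytes.
    """
    g32 = g16.float()                              # upcast the gradient
    m32.mul_(b1).add_(g32, alpha=1 - b1)           # first moment
    v32.mul_(b2).addcmul_(g32, g32, value=1 - b2)  # second moment
    m_hat = m32 / (1 - b1 ** t)                    # bias correction
    v_hat = v32 / (1 - b2 ** t)
    master32.addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=-lr)
    p16.copy_(master32.half())                     # downcast for the fwd/bwd pass

def fully_fp16_adam_step(p16, g16, m16, v16, t,
                         lr=1e-3, b1=0.9, b2=0.999, eps=1e-4):
    """Fully half precision: weights, gradients, and Adam states all FP16.

    Per parameter: 2 + 2 + 2 + 2 = 8 bytes, i.e. half the memory, but
    updates smaller than ~6e-8 underflow to zero and the second moment
    can overflow FP16's max of 65504 (hence the larger eps here).
    """
    m16.mul_(b1).add_(g16, alpha=1 - b1)
    v16.mul_(b2).addcmul_(g16, g16, value=1 - b2)
    m_hat = m16 / (1 - b1 ** t)
    v_hat = v16 / (1 - b2 ** t)
    p16.addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=-lr)
```

The underflow and overflow caveats in the second variant are the usual reason mixed-precision schemes keep the FP32 master copy at all.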
Thank you for the reply! Is there a plan to add or evaluate a fully fp16 optimizer?
We don't currently have a plan for this. Yours is the first expression of interest in it, so we will note it. Thanks!
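For anyone weighing the memory argument above, a back-of-envelope sketch following the usual per-parameter accounting for Adam (the 175B figure is GPT-3 scale and used only for illustration):

```python
PARAMS = 175e9  # GPT-3 scale, illustration only

# bytes of state per parameter with Adam
mixed     = 2 + 2 + 4 + 4 + 4  # fp16 weight + fp16 grad + fp32 master/m/v = 16
fully_f16 = 2 + 2 + 2 + 2      # everything kept in fp16                   = 8

for name, b in [("mixed precision", mixed), ("fully fp16", fully_f16)]:
    print(f"{name}: {b} B/param -> {PARAMS * b / 1e12:.2f} TB at 175B params")
```

This prints 2.80 TB for mixed precision versus 1.40 TB fully in fp16, so a fully half-precision optimizer would roughly halve the weight-plus-optimizer footprint, which is the saving being asked about here.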