microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/

Fully half precision optimizer #447

Open eric-haibin-lin opened 4 years ago

eric-haibin-lin commented 4 years ago

Hi,

Is there an implementation of fully half-precision optimizer? I see from the GPT-3 paper:

... implemented an early version of the codebase, and developed the memory optimizations for fully half-precision training.

It looks like the DeepSpeed optimizer still uses mixed precision. Has anyone looked into this before? A fully half-precision optimizer would allow training these big models with much less memory ... Thanks!
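For a rough sense of the savings, here is an illustrative per-parameter memory accounting; the byte counts are my own back-of-the-envelope numbers for Adam-style state, not figures measured from DeepSpeed:

```python
# Illustrative per-parameter memory (bytes) for Adam-style training.
# Mixed precision: fp16 params + fp16 grads, plus fp32 master weights,
# fp32 momentum, and fp32 variance kept by the optimizer.
mixed_bytes = 2 + 2 + 4 + 4 + 4      # 16 bytes per parameter
# Fully fp16: params, grads, momentum, and variance all stay in fp16.
full_fp16_bytes = 2 + 2 + 2 + 2      # 8 bytes per parameter

num_params = 1.5e9                   # e.g. a ~1.5B-parameter model (placeholder)
to_gib = lambda b: b / 2**30
print(f"mixed precision: {to_gib(mixed_bytes * num_params):.1f} GiB")
print(f"fully fp16:      {to_gib(full_fp16_bytes * num_params):.1f} GiB")
```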

tjruwase commented 4 years ago

@eric-haibin-lin Yes, you are correct, DeepSpeed's fp16 and ZeRO optimizers use mixed precision. We currently don't have a fully half-precision optimizer.
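To make the distinction concrete, here is a minimal sketch of the two update styles using plain PyTorch Adam; it is not DeepSpeed's implementation, and the model size, learning rate, and eps value are placeholders:

```python
import torch

fp16_model = torch.nn.Linear(1024, 1024).cuda().half()

# Mixed precision (what fp16/ZeRO-style optimizers do): keep fp32 master
# weights and run the optimizer update and its state in fp32.
master_params = [p.detach().clone().float() for p in fp16_model.parameters()]
mixed_opt = torch.optim.Adam(master_params, lr=1e-3)

def mixed_precision_step(loss):
    fp16_model.zero_grad()
    loss.backward()
    for mp, p in zip(master_params, fp16_model.parameters()):
        mp.grad = p.grad.float()          # upcast fp16 grads to fp32
    mixed_opt.step()                      # fp32 update, fp32 Adam state
    for mp, p in zip(master_params, fp16_model.parameters()):
        p.data.copy_(mp.data)             # cast updated weights back to fp16

# Fully fp16 (the feature requested here): parameters, gradients, and Adam
# state all stay in fp16, roughly halving optimizer memory, at the cost of
# extra care with underflow/overflow (hence the larger eps).
fp16_opt = torch.optim.Adam(fp16_model.parameters(), lr=1e-3, eps=1e-4)

def fully_fp16_step(loss):
    fp16_model.zero_grad()
    loss.backward()
    fp16_opt.step()   # Adam state tensors are created in the params' dtype (fp16)
```

Whether the fully fp16 variant stays numerically stable (loss scaling, eps choice, etc.) is presumably the open question this issue is asking about.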

eric-haibin-lin commented 4 years ago

Thank you for the reply! Is there a plan to add/evaluate a fully fp16 optimizer?

tjruwase commented 4 years ago

We don't currently have a plan for this. Yours is the first expression of interest in it, so we will note it. Thanks!