[AnyPrecision optimizer] consider FP32 defaults, possibly automated via BF16 support check

pytorch / torchdistx

Torch Distributed Experimental

BSD 3-Clause "New" or "Revised" License


Enhancement (credit to @rohan-varma): "This can be done in a follow-up PR, but let's maybe consider not defaulting things to torch.bfloat16 eventually. This is because it might be good to make this optimizer usable out of the box with the defaults on all HW architectures, but only A100 supports bfloat16 well at the moment.

But the downside here would be that the default optimizer won't be too interesting; it'd just be AdamW."

One possible way to accomplish this is a simple check for native BF16 support: if it is missing, revert any BF16 defaults to FP32 (and turn off Kahan summation as well, since it would add no benefit). The dilemma is whether to warn the user about this change. On the plus side, they know they are not getting the BF16 benefits; on the minus side, they may already be aware and won't enjoy the same one-line warning repeated across 128 GPUs.
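A minimal sketch of the proposed check, under several assumptions. The helper names (`bf16_natively_supported`, `resolve_precision_defaults`, `BF16_DEFAULTS`, `FP32_DEFAULTS`) are hypothetical. The dictionary keys are assumed to mirror AnyPrecisionAdamW keyword arguments, and the default values are illustrative rather than the optimizer's actual defaults. "Native BF16" is approximated as CUDA compute capability 8.0 or higher (A100 is 8.0), which is itself an assumption. Dtypes are returned as strings so the sketch runs without a GPU; one would map them with `getattr(torch, name)`. The warning is emitted only from rank 0, which addresses the "128 GPUs" concern.

```python
import warnings
from typing import Optional

# Hypothetical defaults for illustration only. The keys are assumed to mirror
# AnyPrecisionAdamW keyword arguments; the real names and values may differ.
BF16_DEFAULTS = {
    "momentum_dtype": "bfloat16",
    "variance_dtype": "bfloat16",
    "compensation_buffer_dtype": "bfloat16",
    "use_kahan_summation": True,
}
FP32_DEFAULTS = {
    "momentum_dtype": "float32",
    "variance_dtype": "float32",
    "compensation_buffer_dtype": "float32",
    # With FP32 states, Kahan compensation is turned off as proposed.
    "use_kahan_summation": False,
}


def bf16_natively_supported() -> bool:
    """Best-effort native BF16 check (assumption: compute capability >= 8.0)."""
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major >= 8


def _is_rank_zero() -> bool:
    """True unless torch.distributed is initialized and this is not rank 0."""
    try:
        import torch.distributed as dist
    except ImportError:
        return True
    if dist.is_available() and dist.is_initialized():
        return dist.get_rank() == 0
    return True


def resolve_precision_defaults(bf16_supported: Optional[bool] = None,
                               warn: bool = True) -> dict:
    """Return BF16 defaults if supported, else FP32 defaults without Kahan."""
    if bf16_supported is None:
        bf16_supported = bf16_natively_supported()
    if bf16_supported:
        return dict(BF16_DEFAULTS)  # copy so callers can modify safely
    # Warn from rank 0 only, so a 128-GPU job prints one line instead of 128.
    if warn and _is_rank_zero():
        warnings.warn(
            "Native BF16 support not detected; falling back to FP32 optimizer "
            "states with Kahan summation disabled.",
            stacklevel=2,
        )
    return dict(FP32_DEFAULTS)
```

Python's default warning filter deduplicates per process, not per job, so the rank-0 gate is what keeps the message to a single line in a multi-GPU run. The `warn` flag gives users who already know about the fallback a way to silence it. Explicit user arguments should still take precedence over whatever the resolver picks.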


[AnyPrecision optimizer] consider FP32 defaults, possibly automated via BF16 support check #59