Hi, I was reading the GaLore paper and noticed that the "ground truth" baseline seems to be pure BF16 training with nearest rounding. It is generally accepted that pure BF16 training with nearest rounding does not converge to the same point as FP32 or BF16/FP32 mixed precision training -- does GaLore only match pure BF16 or does it match FP32 training as well?
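For context, here is a minimal sketch of the effect I have in mind. It is not taken from the GaLore code. The helper `to_bf16` and the toy `train` loop are hypothetical and written in pure Python, with Python floats (float64) standing in for an FP32 master weight. It only shows that small updates can be rounded away when the weight itself is stored in BF16 with round-to-nearest.

```python
import struct

def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    Sketch only: NaN, inf, and overflow are not handled.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add the rounding bias, then drop the low 16 bits (BF16 keeps the top 16).
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def train(steps: int, update: float):
    """Apply the same constant update under two storage schemes (toy example)."""
    w_bf16 = 1.0    # pure BF16: weight stored and updated in BF16
    w_master = 1.0  # mixed precision: higher-precision master weight
    for _ in range(steps):
        w_bf16 = to_bf16(w_bf16 + to_bf16(update))
        w_master = w_master + update
    # The mixed-precision scheme only casts to BF16 for the working copy.
    return w_bf16, to_bf16(w_master)

pure, mixed = train(steps=1000, update=1e-3)
print(pure, mixed)
```

In this toy setup the pure BF16 weight never leaves 1.0, because each 1e-3 update is smaller than half the BF16 spacing just above 1.0 (2^-7), while the master-weight version reaches 2.0. Real training is of course more complicated than a constant update, so this illustrates the mechanism and says nothing about GaLore's actual behavior.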