Open stas00 opened 8 months ago
Yes, I am experiencing the same issue.
Hi @HeyangQin, I believe we don't support bf16 training, which causes the `RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half`
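(For context, that error is PyTorch's generic check against mixed-dtype matmuls; a minimal standalone reproduction, independent of DeepSpeed:)

```python
# Reproduces the same class of error (exact wording varies by device/PyTorch version).
import torch

a = torch.randn(4, 8)          # float32 ("float")
b = torch.randn(8, 2).half()   # float16 ("c10::Half")

try:
    torch.mm(a, b)             # mixed-dtype matmul is rejected
except RuntimeError as e:
    print(e)                   # e.g. "expected mat1 and mat2 to have the same dtype ..."
```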
Hi @GuanhuaWang, thanks for the explanation. Regarding bf16 training not being supported with ZeRO++: is that because the quantization kernel behind zero_quantized_weights only supports fp16/fp32, or is there a further accuracy issue with the bf16 datatype?
Same issue here.
Same here. Llama doesn't like continuing training in fp16, so it would be great to have native bf16 support. Also, should we expect to be able to run FP8 training with ZeRO++, @GuanhuaWang?
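(Sketch of the fp16 fallback implied above, for anyone who needs ZeRO++ quantized weights before bf16 support lands; the values below are placeholders, and it assumes the quantized-weights path accepts fp16, as the comment above suggests.)

```python
# fp16 fallback sketch (placeholder values, not an official recommendation).
fp16_fallback = {
    "fp16": {"enabled": True, "loss_scale": 0},  # dynamic loss scaling, instead of "bf16": {"enabled": True}
    "zero_optimization": {
        "stage": 3,
        "zero_quantized_weights": True,
    },
}
```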
Describe the bug
Adding
"zero_quantized_weights": true,
leads to a crash:

config:
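(The actual config isn't included here; the sketch below only illustrates the failing combination described in this thread, bf16 enabled together with ZeRO++ weight quantization, with placeholder values throughout.)

```python
# Illustrative only -- placeholder values, not the reporter's actual config.
import json

ds_config = {
    "train_micro_batch_size_per_gpu": 1,       # placeholder
    "bf16": {"enabled": True},                 # bf16 training ...
    "zero_optimization": {
        "stage": 3,
        "zero_quantized_weights": True,        # ... plus ZeRO++ weight quantization
        "zero_hpz_partition_size": 8,          # placeholder ZeRO++ setting
        "zero_quantized_gradients": True,      # placeholder ZeRO++ setting
    },
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```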
ds_report output
Screenshots
If applicable, add screenshots to help explain your problem.
System info (please complete the following information):
@HeyangQin