Closed by delock 2 days ago
Could you consider adding some unit tests, perhaps to test_dist.py, to test support for the different data types?
Let me see if I can add some tests.
Hi @adk9, TestDistInferenceAllReduce has been modified to test fp32, bf16 and fp16. Can you help start the workflow? Thanks!
Hi @adk9, the FP32 allreduce failure was caused by the modified UT, which now tests world_size = 1, 2, or 4; there was an unnecessary assertion for world_size == 1. That assertion has now been removed, since the code handles this case well. Can you help restart the workflow? Thanks!
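For readers following along, here is a rough sketch of the kind of check being discussed: an all-reduce sum verified for each data type and each world_size, including the world_size == 1 case. It is not the actual TestDistInferenceAllReduce code and does not call DeepSpeed or PyTorch. The all-reduce is simulated in-process, the three dtypes are emulated with the standard library, and every name in it (`round_bf16`, `simulated_all_reduce_sum`, `check_all`, and so on) is hypothetical.

```python
import struct

# Hypothetical stand-ins for torch.float32 / torch.bfloat16 / torch.float16:
# each function rounds a Python float to the target precision.
def round_fp32(x):
    return struct.unpack("<f", struct.pack("<f", x))[0]

def round_fp16(x):
    # struct format "e" is IEEE 754 half precision
    return struct.unpack("<e", struct.pack("<e", x))[0]

def round_bf16(x):
    # bf16 keeps the top 16 bits of an fp32 value; round to nearest even
    # (sketch only: finite inputs of moderate magnitude are assumed)
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

ROUNDERS = {"fp32": round_fp32, "bf16": round_bf16, "fp16": round_fp16}

def simulated_all_reduce_sum(rank_buffers, dtype):
    """Sum the per-rank buffers elementwise in the given precision and
    return one copy of the result per rank (assumption: SUM reduction)."""
    rnd = ROUNDERS[dtype]
    reduced = [0.0] * len(rank_buffers[0])
    for buf in rank_buffers:
        reduced = [rnd(acc + rnd(v)) for acc, v in zip(reduced, buf)]
    return [list(reduced) for _ in rank_buffers]

def check_all(dtypes=("fp32", "bf16", "fp16"), world_sizes=(1, 2, 4), numel=8):
    """Rank r contributes a buffer filled with r + 1, so every element of
    the reduced buffer should equal world_size * (world_size + 1) / 2."""
    results = {}
    for dtype in dtypes:
        for ws in world_sizes:
            buffers = [[float(rank + 1)] * numel for rank in range(ws)]
            out = simulated_all_reduce_sum(buffers, dtype)
            expected = ws * (ws + 1) / 2
            results[(dtype, ws)] = all(v == expected for buf in out for v in buf)
    return results

if __name__ == "__main__":
    for key, ok in check_all().items():
        print(key, "ok" if ok else "MISMATCH")
```

The fill values are small integers, so the expected sums (1, 3 and 10) are exactly representable in all three formats and the comparison can be exact. A real test on arbitrary data would likely need a per-dtype tolerance instead.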
This PR adds FP16 support to DeepSpeed SHM inference_all_reduce. Previously only FP32 and BF16 were supported. This aligns with PyTorch CPU support for the FP16 datatype.