microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[CPU] add fp16 support to shm inference_all_reduce #5669

Closed — delock closed this pull request 2 days ago

delock commented 1 week ago

This PR adds FP16 support to the DeepSpeed SHM inference_all_reduce. Previously only FP32 and BF16 were supported. This aligns with PyTorch's CPU support for the FP16 data type.
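The precision concern behind an FP16 all-reduce path can be sketched without DeepSpeed itself. The snippet below is a hypothetical illustration (none of these names come from the DeepSpeed code): it emulates a sum all-reduce over several ranks, once accumulating in FP32 and once in FP16, to show the reduced-precision error that dtype-specific tests have to tolerate.

```python
import numpy as np

def emulated_all_reduce(per_rank_buffers, dtype):
    # Hypothetical helper: each rank contributes one buffer; the reduction
    # accumulates in `dtype`, loosely mimicking an in-place SHM sum over ranks.
    acc = per_rank_buffers[0].astype(dtype)
    for buf in per_rank_buffers[1:]:
        acc = (acc + buf.astype(dtype)).astype(dtype)
    return acc

rng = np.random.default_rng(0)
world_size = 4
buffers = [rng.standard_normal(1024).astype(np.float32) for _ in range(world_size)]

ref = emulated_all_reduce(buffers, np.float32)   # FP32 reference reduction
fp16 = emulated_all_reduce(buffers, np.float16)  # same reduction in FP16

# FP16 accumulation loses precision, so a unit test would compare the two
# results with a loose tolerance rather than exact equality.
print(np.max(np.abs(ref - fp16.astype(np.float32))))
```

This is why dtype-parameterized tests typically use a per-dtype tolerance: BF16 and FP16 results will differ from the FP32 reference by an amount bounded by their reduced mantissa width.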

adk9 commented 1 week ago

Could you consider adding some unit tests to perhaps test_dist.py to test support for the different data types?

delock commented 1 week ago

> Could you consider adding some unit tests to perhaps test_dist.py to test support for the different data types?

Let me see if I can add some tests.

delock commented 1 week ago

Hi @adk9, TestDistInferenceAllReduce is modified to test fp32, bf16 and fp16. Can you help start the workflow? Thanks!

delock commented 1 week ago

Hi @adk9, the FP32 allreduce failure happened because the modified unit test now runs with world_size = 1, 2, or 4, and there was an unnecessary assertion rejecting world_size == 1. That assertion has now been removed, since the code handles this case correctly. Can you help restart the workflow? Thanks!
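The reason the world_size == 1 assertion was unnecessary can be shown with a small sketch (hypothetical code, not the DeepSpeed implementation): a sum all-reduce over a single rank is simply an identity operation, so the general reduction loop already handles it without any special case.

```python
import numpy as np

def all_reduce_sum(per_rank_buffers):
    # Hypothetical sum all-reduce: accumulate every rank's buffer.
    # With a single rank the loop body never runs, so the result is
    # just a copy of that rank's input -- no guard assertion needed.
    acc = per_rank_buffers[0].copy()
    for buf in per_rank_buffers[1:]:
        acc += buf
    return acc

x = np.arange(8, dtype=np.float32)
out = all_reduce_sum([x])        # world_size == 1
print(np.array_equal(out, x))    # single rank: output equals input
```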