microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

https://www.deepspeed.ai/

Apache License 2.0

[REQUEST] I do not understand the meaning of 'reduction' in the ZeRO++ paper. #5440

Closed laladong closed 3 weeks ago

laladong commented 4 months ago

In Section 3.3.1, "All-to-all based implementation", the paper says:
"Once a GPU receives gradients from its predecessor, we dequantize it to recover full precision and conduct a local reduction." What specific operation does "reduction" refer to here?

loadams commented 4 months ago

@laladong - this is an all-reduce. What more, specifically, were you looking for?
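For readers landing here later, a minimal single-process sketch may help. As I read the quoted sentence, "local reduction" means the element-wise sum that each GPU performs over the gradient chunks it has received, after dequantizing them. The sketch below simulates that flow in plain Python. It is not DeepSpeed code: the function names (`quantize`, `dequantize`, `all_to_all_reduce_scatter`), the symmetric 8-bit quantizer, and the use of a sum rather than an average are all assumptions made for illustration.

```python
def quantize(chunk, bits=8):
    """Symmetric per-chunk quantization (hypothetical scheme, illustration only)."""
    qmax = 2 ** (bits - 1) - 1
    # Fall back to a scale of 1.0 when the chunk is all zeros.
    scale = max(abs(x) for x in chunk) / qmax or 1.0
    return [round(x / scale) for x in chunk], scale


def dequantize(q, scale):
    """Recover approximate full-precision values from integers and a scale."""
    return [v * scale for v in q]


def all_to_all_reduce_scatter(grads, bits=8):
    """Simulate quantize -> all-to-all -> dequantize -> local reduction.

    grads[r] is the full gradient held by simulated rank r. The return
    value shards[r] is the reduced slice that rank r ends up owning.
    """
    n = len(grads)
    size = len(grads[0]) // n

    # All-to-all: rank src sends its quantized chunk number dst to rank dst.
    inbox = [[] for _ in range(n)]
    for src in range(n):
        for dst in range(n):
            chunk = grads[src][dst * size:(dst + 1) * size]
            inbox[dst].append(quantize(chunk, bits))

    # Local reduction: each rank dequantizes what it received and
    # sums the chunks element-wise (assumed here to be a plain sum).
    shards = []
    for dst in range(n):
        acc = [0.0] * size
        for q, scale in inbox[dst]:
            for i, v in enumerate(dequantize(q, scale)):
                acc[i] += v
        shards.append(acc)
    return shards


# Two simulated ranks, each holding a 4-element gradient.
grads = [[1.0, 2.0, 3.0, 4.0],
         [0.5, -1.0, 2.0, 0.0]]
shards = all_to_all_reduce_scatter(grads)
```

Each entry of `shards` approximates the exact element-wise sum of the corresponding slice across ranks, up to quantization error. Summing in full precision after dequantization, rather than on the quantized integers, is what the quoted sentence describes.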

loadams commented 3 weeks ago

Closing as stale for now. Please re-open if you have more questions.