[cifar ds training]: Set cuda device during initialization of distributed backend.

microsoft / DeepSpeedExamples

Example models using DeepSpeed

Apache License 2.0


[cifar ds training]: Set cuda device during initialization of distributed backend. #931

Closed jagadish-amd closed 3 weeks ago

jagadish-amd commented 1 month ago

This commit is needed so that, during initialization, each process does not end up with GPU 0's stream as its compute stream when torch.cuda.current_stream() is called on every GPU. With the change, the RunningAvgSamplesPerSec performance metric improves on a multi-GPU node (tested on AMD GPUs with the ROCm stack). Without this commit, GPU 0 takes on more load than the other GPUs as the number of GPUs increases.
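For readers following along, here is a minimal sketch of the idea described above. It is not the diff from this PR. It assumes the launcher exports a `LOCAL_RANK` environment variable (torchrun and the DeepSpeed launcher do), and the helper names `resolve_local_rank` and `init_distributed_on_local_device` are hypothetical, introduced only for illustration. The sketch binds the device before initializing the backend; the exact placement in the actual commit may differ.

```python
import os


def resolve_local_rank(env=None, default=0):
    """Return this process's node-local rank, read from LOCAL_RANK.

    Hypothetical helper: falls back to `default` when the variable is unset,
    for example in a single-process run.
    """
    env = os.environ if env is None else env
    return int(env.get("LOCAL_RANK", default))


def init_distributed_on_local_device():
    """Bind the process to its own GPU, then initialize the distributed backend.

    Hypothetical helper, not the code from this PR.
    """
    # Imported lazily so resolve_local_rank stays usable without these packages.
    import torch
    import deepspeed

    local_rank = resolve_local_rank()
    if torch.cuda.is_available():
        # After this call, torch.cuda.current_stream() with no argument refers
        # to this process's GPU instead of the default device 0.
        # ROCm builds of PyTorch expose the same torch.cuda API.
        torch.cuda.set_device(local_rank)
    deepspeed.init_distributed()
    return local_rank
```

In a training script, `init_distributed_on_local_device()` would be called once per process, before any model or stream is created.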

jagadish-amd commented 1 month ago

ping @jeffdaily

jagadish-amd commented 1 month ago

@tjruwase can you please review / merge ?

tjruwase commented 3 weeks ago

> @tjruwase can you please review / merge ?

@jagadish-amd, apologies for the delay. Done.