intel / torch-ccl

oneCCL Bindings for Pytorch*
BSD 3-Clause "New" or "Revised" License

Communication and compute on separate Streams do not overlap #64

Open garrett361 opened 6 months ago

garrett361 commented 6 months ago

Cross-posting this issue from the IPEX (Intel Extension for PyTorch) repository, in case the torch-ccl team is not aware of it.

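Key issues on the Intel Max 1550: kernel launches are blocking, and communication and compute on separate streams do not overlap.

The PyTorch profiler traces highlight the issues (copied from the other thread):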

A100 Trace

[Figure: nvidia_a100_trace]

Non-blocking kernel launch and comms/compute overlap.

Intel Max 1550 Trace

[Figure: intel_1550_trace]

Blocking kernel launch and no comms/compute overlap.
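
To make the difference between the two traces concrete, here is a toy timeline model in plain Python. It is only a sketch and not the benchmark from the other thread: the `Op` class, the `wall_time` function, and the durations are all hypothetical, launch overhead is taken as zero, and data dependencies between compute and communication are ignored. It estimates the total wall time when the host launches ops on two streams, with and without blocking launches.

```python
from dataclasses import dataclass


@dataclass
class Op:
    stream: str      # hypothetical stream label: "compute" or "comm"
    duration: float  # assumed device time in milliseconds


def wall_time(ops, blocking_launch):
    """Return the time at which the last op finishes.

    The host launches ops in list order. Each stream runs its own ops in
    order. With blocking_launch=True the host waits for every op to finish
    before launching the next one, which serializes the two streams.
    """
    host_time = 0.0      # earliest time the host can issue the next launch
    stream_free = {}     # per-stream time at which the stream becomes idle
    end = 0.0
    for op in ops:
        start = max(host_time, stream_free.get(op.stream, 0.0))
        finish = start + op.duration
        stream_free[op.stream] = finish
        if blocking_launch:
            host_time = finish  # host stalls until the kernel completes
        end = max(end, finish)
    return end


# Made-up workload: alternating compute and communication kernels.
ops = [Op("compute", 4.0), Op("comm", 3.0), Op("compute", 4.0), Op("comm", 3.0)]

overlapped = wall_time(ops, blocking_launch=False)  # non-blocking launches, as in the A100 trace
serialized = wall_time(ops, blocking_launch=True)   # blocking launches, as in the Max 1550 trace
print(overlapped, serialized)
```

In this model the non-blocking case finishes when the busier stream finishes, while the blocking case takes the sum of all kernel durations, which is the pattern the Max 1550 trace shows.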

See the other thread for more details.