I tried to measure the time spent waiting on the `reqs` returned by `batch_isend_irecv()`. Interestingly, this time seems to be independent of sequence length and negligible in total. It could be that on a single node the actual waits happen somewhere else, or that the p2p transfer is so fast that compute is the bottleneck. This probably needs further investigation.
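For reference, a minimal sketch of how such a wait-time measurement could look. This is not the code from the branch; `timed_wait` and its `sync` parameter are hypothetical names introduced here for illustration. The optional `sync` hook is there because, as far as I understand the `torch.distributed` docs, `wait()` on a CUDA request only blocks until the op is enqueued on a stream and not until the transfer has finished on the host side, which might be one reason the measured time looks negligible.

```python
import time
from typing import Callable, Iterable, Optional


def timed_wait(reqs: Iterable, sync: Optional[Callable[[], None]] = None) -> float:
    """Return the host-side seconds spent in wait() for all requests.

    reqs: objects exposing .wait(), e.g. the list returned by
          torch.distributed.batch_isend_irecv(...).
    sync: optional callable (assumption: torch.cuda.synchronize) that is
          invoked before and after the waits so that queued GPU work
          is included in the measurement instead of being skipped.
    """
    if sync is not None:
        sync()  # drain earlier work so it is not attributed to the waits
    start = time.perf_counter()
    for req in reqs:
        req.wait()
    if sync is not None:
        sync()  # make sure the transfers have really completed
    return time.perf_counter() - start
```

Usage would be something like `elapsed = timed_wait(reqs, sync=torch.cuda.synchronize)` right after the `batch_isend_irecv()` call. Comparing the result with and without `sync` could show whether the wait is actually happening somewhere else.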
Measured with the `measure_wait_times` branch on 2x A5000: