Open cicirori opened 10 months ago
I'm wondering why the ncclInt8 datatype is used in the C++ implementation of nccl_all_to_all_scatter_async. Is it for speed reasons, or simply to avoid supporting multiple datatypes through templates?
Thanks!
According to bandwidth profiling, there is no speed difference between ncclInt8 x N and ncclInt32 x N / 4, so you can choose either.
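To make the equivalence concrete, here is a minimal sketch of the count arithmetic behind the two options. The helper names count_as_int8 and count_as_int32 are hypothetical and are not part of NCCL or of this repository. The sketch assumes the all-to-all only moves bytes and does no reduction, so the NCCL datatype matters only for how the element count maps to a byte size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (illustration only): element count to pass when a
// buffer of `n` elements of type T is sent as raw bytes with ncclInt8.
template <typename T>
constexpr std::size_t count_as_int8(std::size_t n) {
    return n * sizeof(T);
}

// Hypothetical helper (illustration only): element count to pass when the
// same bytes are sent as ncclInt32. This only works when the byte size is a
// multiple of 4, which is one reason a byte-typed transfer is the simpler choice.
inline std::size_t count_as_int32(std::size_t num_bytes) {
    if (num_bytes % sizeof(std::int32_t) != 0) {
        throw std::invalid_argument("byte size is not a multiple of 4");
    }
    return num_bytes / sizeof(std::int32_t);
}

// A 2-byte element type: 1024 elements occupy 2048 bytes,
// i.e. 2048 ncclInt8 elements.
static_assert(count_as_int8<std::int16_t>(1024) == 2048,
              "1024 two-byte elements are 2048 bytes");
```

With ncclInt8 the count is always n * sizeof(T), so a single non-templated code path can cover every element type. With ncclInt32 the count is N / 4, and the caller has to handle byte sizes that are not divisible by 4.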