parasailteam / coconet

MIT License

Doubts about experimental performance data of single operator overlap #8

Closed lixiaolx closed 1 year ago

lixiaolx commented 1 year ago

image

I would like to ask whether the 1.36X speedup in the paper was measured against a baseline run with the same NCCL channel setting.

Or was it obtained by taking the best result across channel settings for cuBLAS + AllReduce and comparing it with the best result across channel settings for the overlapped version?

abhijangda commented 1 year ago

Both the CoCoNet and NCCL times were obtained using the values of NCCL_BUFFSIZE and CHANNELS that perform best. The CUTLASS tile size was the same in both cases, although a different tile size might have made CoCoNet perform better.
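
To make the difference between the two comparison methods concrete, here is a minimal sketch in Python. It is not taken from the CoCoNet benchmark scripts: the function names, the `(buffsize, channels)` keys, and all timing values are hypothetical placeholders, not measured results.

```python
def best_time(timings):
    """Return the (config, time) pair with the lowest time in a sweep."""
    config = min(timings, key=timings.get)
    return config, timings[config]


def speedup_best_vs_best(baseline, overlapped):
    """Speedup when each side uses its own best-performing configuration."""
    _, base_t = best_time(baseline)
    _, over_t = best_time(overlapped)
    return base_t / over_t


def speedup_same_config(baseline, overlapped, config):
    """Speedup when both sides are forced to use the same configuration."""
    return baseline[config] / overlapped[config]


# Placeholder timings in milliseconds, keyed by (buffer size in bytes, channels).
# These numbers are made up for illustration only.
baseline = {(4194304, 2): 10.0, (4194304, 16): 8.0, (8388608, 16): 9.0}
overlapped = {(4194304, 2): 5.0, (4194304, 16): 6.0, (8388608, 16): 4.0}

print("best vs best:", speedup_best_vs_best(baseline, overlapped))
print("same config :", speedup_same_config(baseline, overlapped, (4194304, 16)))
```

With a best-versus-best comparison, the two sides may end up using different buffer size and channel settings, so the resulting ratio can differ from the ratio at any single shared setting.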