NCCL Fast Socket is based on TCP/IP communication and uses a number of techniques to achieve better and more consistent performance, especially with 100 Gbps networking on Google Cloud.
Does this apply to self-hosted clusters with 100 Gbps NICs?
What are the key techniques Fast Socket uses to improve NCCL collective communication performance, as mentioned in the README?