Open rgioiosa78 opened 9 months ago
This might also occur when standing up new RMA regions.
While likely not the complete solution we want for version 0.3, I changed the serial handshaking that was going on to instead use a libPMI-based exchange.
This should help a bit during the init process and during creation of new RMA regions.
If you are using this with Lamellar, the impact may be minimal for versions <=0.5: after changing Rofi, I found another portion of the init process that was also inefficient. My recommendation is to use Lamellar >=0.6.
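
For readers unfamiliar with the pattern, here is a rough sketch of what a key-value-store style exchange looks like in general: every rank publishes its own entry, all ranks synchronize once, and then each rank reads the entries it needs. This is a toy in-memory model, not the Rofi change itself and not libPMI's API; `ToyKVS`, `exchange_addresses`, and the key and address strings are all hypothetical names made up for illustration.

```python
# Toy model of a key-value-store exchange (put, fence, get).
# Not libPMI: every class, method, and key name here is hypothetical.


class ToyKVS:
    """In-memory stand-in for a job-wide key-value store."""

    def __init__(self):
        self._staged = {}   # puts not yet visible to other ranks
        self._visible = {}  # entries published by the last fence

    def put(self, key, value):
        self._staged[key] = value

    def fence(self):
        # Stand-in for a collective commit plus barrier:
        # publish every staged put at once.
        self._visible.update(self._staged)
        self._staged.clear()

    def get(self, key):
        return self._visible[key]


def exchange_addresses(num_ranks):
    """Each rank publishes one address, then reads everyone's."""
    kvs = ToyKVS()
    for rank in range(num_ranks):
        kvs.put(f"addr-{rank}", f"nic-address-of-rank-{rank}")
    kvs.fence()  # one synchronization point instead of per-pair handshakes
    # Every rank builds its table from reads against the store.
    return [
        [kvs.get(f"addr-{peer}") for peer in range(num_ranks)]
        for _ in range(num_ranks)
    ]


tables = exchange_addresses(4)
print(tables[2][0])  # prints "nic-address-of-rank-0"
```

The point of the sketch is only that the synchronization happens once for the whole job rather than once per pair of processes.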
When initializing the RDMA data structure and process/NIC table at scale (256+ nodes), the time is about 80 s. This process is mostly serialized at node 0 and should be reworked with better collectives.
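
To give a feel for why a collective could help here, below is a minimal sketch that only counts communication steps. It compares a root that handshakes with every other node one at a time against a recursive-doubling allgather, which is one well-known collective pattern and not necessarily the one that should be used. The function names are hypothetical, the simulation assumes a power-of-two node count, and it models neither network cost nor the 80 s figure above.

```python
def serialized_steps(num_nodes):
    """Node 0 handshakes with every other node, one at a time."""
    return num_nodes - 1


def recursive_doubling_allgather(num_nodes):
    """Simulate a recursive-doubling allgather.

    Assumes num_nodes is a power of two. Returns (rounds, known), where
    known[n] is the set of node ids whose entry node n holds at the end.
    """
    assert num_nodes > 0 and num_nodes & (num_nodes - 1) == 0
    known = [{n} for n in range(num_nodes)]  # each node starts with its own entry
    rounds = 0
    distance = 1
    while distance < num_nodes:
        # In each round, node n swaps everything it knows with node n XOR distance.
        known = [known[n] | known[n ^ distance] for n in range(num_nodes)]
        distance *= 2
        rounds += 1
    return rounds, known


rounds, known = recursive_doubling_allgather(256)
print(serialized_steps(256), rounds)  # prints "255 8"
```

In this model, 256 nodes need 255 sequential handshakes through node 0 but only 8 rounds of pairwise exchange, after which every node holds the full table.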