pnnl / rofi

Initialization overhead at scale #4

Open rgioiosa78 opened 9 months ago

rgioiosa78 commented 9 months ago

When initializing the RDMA data structures and the process/NIC table at scale (256+ nodes), initialization takes about 80 s. The process is mostly serialized at node 0 and should be reworked to use better collectives.
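To illustrate why serialization at node 0 hurts at this scale, here is a minimal Python simulation, not Rofi's actual code. It compares the number of sequential steps on the critical path when a root collects and redistributes the table one peer at a time against a recursive-doubling allgather, one example of a "better collective". The function names and the step-counting model are assumptions made for illustration only.

```python
def serialized_exchange(entries):
    """Root (rank 0) receives one entry per peer, then sends the full table to each peer."""
    n = len(entries)
    table = list(entries[:1])
    steps = 0
    for rank in range(1, n):
        table.append(entries[rank])  # root handles one peer at a time
        steps += 1
    tables = [list(table) for _ in range(n)]
    steps += max(n - 1, 0)  # one sequential send of the full table per peer
    return tables, steps


def recursive_doubling_allgather(entries):
    """Each round, every rank swaps everything it knows with the rank `distance` away."""
    n = len(entries)
    if n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two rank count")
    known = [{rank: entries[rank]} for rank in range(n)]
    rounds = 0
    distance = 1
    while distance < n:
        snapshot = [dict(k) for k in known]  # all ranks exchange simultaneously
        for rank in range(n):
            known[rank].update(snapshot[rank ^ distance])
        distance *= 2
        rounds += 1
    tables = [[k[r] for r in range(n)] for k in known]
    return tables, rounds


if __name__ == "__main__":
    entries = [f"nic-{rank}" for rank in range(256)]
    _, serial_steps = serialized_exchange(entries)
    _, rounds = recursive_doubling_allgather(entries)
    print(serial_steps, rounds)
```

Under this model, the serialized path grows linearly with the node count while the collective grows logarithmically. Real costs also depend on message sizes and the network, which the sketch ignores.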

rgioiosa78 commented 9 months ago

This might also occur when standing up new RMA regions.

rdfriese commented 9 months ago

While likely not the complete solution we want for version 0.3, I replaced the serial handshaking with a libPMI-based exchange.

This should help a bit during the init process and during creation of new RMA regions.

If using Rofi with Lamellar, the impact may be minimal for Lamellar versions <=0.5, as I found another portion of the init process that was also inefficient after changing Rofi. My recommendation is to use lamellar >=0.6.
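For readers unfamiliar with the pattern, a PMI-style exchange typically has each process publish its own data to a shared key-value store, synchronize, and then read every peer's entry, instead of handshaking with peers one by one. The sketch below simulates that put / fence / get flow with an in-memory toy store. It does not call libPMI, and `ToyKVS`, `exchange_addresses`, and the `rofi-addr-{rank}` key format are hypothetical names; the calls Rofi actually makes are not described in this thread.

```python
class ToyKVS:
    """In-memory stand-in for a PMI-style key-value store (hypothetical, not libPMI)."""

    def __init__(self):
        self._staged = {}   # entries written but not yet published
        self._visible = {}  # entries readable by every rank

    def put(self, key, value):
        self._staged[key] = value

    def fence(self):
        # Models a commit followed by a barrier: afterwards all staged puts are visible.
        self._visible.update(self._staged)
        self._staged.clear()

    def get(self, key):
        return self._visible[key]


def exchange_addresses(local_addresses):
    """Simulate every rank publishing its address and then reading all peers' addresses."""
    nranks = len(local_addresses)
    kvs = ToyKVS()
    # Phase 1: each rank publishes only its own entry (no pairwise handshakes).
    for rank, addr in enumerate(local_addresses):
        kvs.put(f"rofi-addr-{rank}", addr)
    # Phase 2: a single synchronization point for all ranks.
    kvs.fence()
    # Phase 3: each rank builds its own copy of the full table from the store.
    return [
        [kvs.get(f"rofi-addr-{peer}") for peer in range(nranks)]
        for _ in range(nranks)
    ]


if __name__ == "__main__":
    for table in exchange_addresses(["addr-a", "addr-b", "addr-c"]):
        print(table)
```

The design point is that no rank waits on a chain of per-peer handshakes; the only global synchronization is the fence.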