open-mpi / hwloc

Hardware locality (hwloc)
https://www.open-mpi.org/projects/hwloc

hwloc_get_proc_cpubind not associated on Windows Server 2022 with two processor groups #697

Open VincentDarrigrand opened 4 hours ago

VincentDarrigrand commented 4 hours ago

Hello,

I am facing an issue with a call to hwloc_get_proc_cpubind on a Windows server that has two processor groups.

The function pointer topology->binding_hooks.get_proc_cpubind is a null pointer because it does not get set during the call to hwloc_set_windows_hooks.

Indeed we can see that if nr_processor_groups is not 1, the hook is not set: https://github.com/open-mpi/hwloc/blob/0474e06f6cc7c9020517fee223b1a73c1e6b2af4/hwloc/topology-windows.c#L1330C3-L1339C4

if (nr_processor_groups == 1) {
    hooks->set_proc_cpubind = hwloc_win_set_proc_cpubind;
    hooks->get_proc_cpubind = hwloc_win_get_proc_cpubind;
    hooks->set_thisproc_cpubind = hwloc_win_set_thisproc_cpubind;
    hooks->get_thisproc_cpubind = hwloc_win_get_thisproc_cpubind;
    hooks->set_proc_membind = hwloc_win_set_proc_membind;
    hooks->get_proc_membind = hwloc_win_get_proc_membind;
    hooks->set_thisproc_membind = hwloc_win_set_thisproc_membind;
    hooks->get_thisproc_membind = hwloc_win_get_thisproc_membind;
}
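
To make the consequence of that condition concrete, here is a minimal, self-contained sketch. It is a simplified model and not hwloc's actual code: the struct and all `model_*` names are hypothetical. It assumes that the dispatch function guards against an unset hook and reports ENOSYS rather than calling through a null pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a binding hook table (not hwloc's real struct). */
typedef int (*model_get_proc_cpubind_fn)(int pid, unsigned long *mask);

struct model_hooks {
    model_get_proc_cpubind_fn get_proc_cpubind; /* NULL when the backend leaves it unset */
};

/* Hypothetical backend: pretends the process may run on CPUs 0-3. */
static int model_win_get_proc_cpubind(int pid, unsigned long *mask)
{
    (void)pid;
    *mask = 0xFUL;
    return 0;
}

/* Mirrors the quoted condition: the hook is registered only
 * when there is exactly one processor group. */
void model_set_hooks(struct model_hooks *hooks, unsigned nr_processor_groups)
{
    assert(hooks != NULL);
    hooks->get_proc_cpubind = NULL;
    if (nr_processor_groups == 1)
        hooks->get_proc_cpubind = model_win_get_proc_cpubind;
}

/* Guarded dispatch (an assumption of this sketch): fail with ENOSYS
 * instead of dereferencing a NULL function pointer. */
int model_get_proc_cpubind(const struct model_hooks *hooks, int pid, unsigned long *mask)
{
    assert(hooks != NULL && mask != NULL);
    if (hooks->get_proc_cpubind == NULL) {
        errno = ENOSYS;
        return -1;
    }
    return hooks->get_proc_cpubind(pid, mask);
}
```

With `model_set_hooks(&h, 2)`, the lookup fails with -1 and errno set to ENOSYS, which is the model's equivalent of the hook never being associated on a machine with two processor groups.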

This came as a surprise, since the rest of this function seems to be equipped for such cases (the release notes for version 2.7.0 mention that support for such machines has been implemented).

Is this fixable? Or is it due to some hardware/software limitations?

Thanks in advance, Best regards.

Vincent Darrigrand

bgoglin commented 4 hours ago

Hello,

Process binding on Windows is limited by the Windows API itself. That API was designed for machines with a single processor group and was never extended (while the thread binding API was). Basically, you only have access to the first processor group. There are some other APIs (job objects and CPU sets, IIRC), but each of them has its own limitations, especially when used inside an intermediate library like hwloc, which doesn't know whether another piece of software used one of these APIs earlier. I asked Microsoft about this multiple times, but I no longer expect a good solution. IIRC, the hwloc 2.7.0 improvements on Windows were on the topology discovery side (being able to query objects that are larger than a single processor group).