python / cpython

os.cpu_count() recommends non-portable len(os.sched_getaffinity(0)) to get usable CPUs #105972

Open cquike opened 1 year ago

cquike commented 1 year ago

Documentation

The documentation of os.cpu_count() mentions that to get the number of usable CPUs one can use len(os.sched_getaffinity(0)). However, that function is not available on all platforms, for instance on macOS. For Unix platforms a better approach would be to use os.sysconf('SC_NPROCESSORS_ONLN'), which is also not fully standard; however, it has at least been proposed for the next POSIX standard (https://www.austingroupbugs.net/view.php?id=339) and is currently available on many Unix platforms.
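For illustration, the sysconf-based approach could look roughly like the sketch below. The helper name online_cpu_count is hypothetical, and returning None when the value is unavailable is an assumption of this sketch rather than anything the documentation prescribes.

```python
import os


def online_cpu_count():
    """Return the number of online processors reported by sysconf.

    Hypothetical helper: returns None when os.sysconf or the
    SC_NPROCESSORS_ONLN name is not available on this platform.
    """
    # os.sysconf does not exist on every platform (for example Windows).
    if not hasattr(os, "sysconf"):
        return None
    # os.sysconf_names lists the names this build of Python knows about.
    if "SC_NPROCESSORS_ONLN" not in os.sysconf_names:
        return None
    try:
        count = os.sysconf("SC_NPROCESSORS_ONLN")
    except (ValueError, OSError):
        return None
    # sysconf reports -1 for values it cannot determine.
    return count if count > 0 else None


print(online_cpu_count())
```

Note that this counts online processors on the system; it does not take the affinity of the calling process into account.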

eryksun commented 1 year ago

The current implementation of os.cpu_count() uses sysconf(_SC_NPROCESSORS_ONLN) if it's available, with two exceptions: on Windows it prefers the WinAPI function GetActiveProcessorCount() if it's available, and on HP-UX it uses mpctl().

#ifdef MS_WINDOWS
#ifdef MS_WINDOWS_DESKTOP
    ncpu = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
#endif
#elif defined(__hpux)
    ncpu = mpctl(MPC_GETNUMSPUS, NULL, NULL);
#elif defined(HAVE_SYSCONF) && defined(_SC_NPROCESSORS_ONLN)
    ncpu = sysconf(_SC_NPROCESSORS_ONLN);
#elif defined(__VXWORKS__)
    ncpu = _Py_popcount32(vxCpuEnabledGet());
#elif defined(__DragonFly__) || \
      defined(__OpenBSD__)   || \
      defined(__FreeBSD__)   || \
      defined(__NetBSD__)    || \
      defined(__APPLE__)
    int mib[2];
    size_t len = sizeof(ncpu);
    mib[0] = CTL_HW;
    mib[1] = HW_NCPU;
    if (sysctl(mib, 2, &ncpu, &len, NULL, 0) != 0)
        ncpu = 0;
#endif

The comment in the documentation about the number of usable CPUs is with regard to the CPU affinity of the current process, which can limit the process to a subset of the available processors. Thus len(os.sched_getaffinity(0)) determines the number of active/online processors that the current process can actually use.
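As a minimal sketch of how calling code might cope with the missing function, the hypothetical helper below uses os.sched_getaffinity() where it exists and otherwise falls back to os.cpu_count(). The fallback (and the final fallback to 1 when os.cpu_count() returns None) is an assumption of this sketch; on platforms without affinity support the result is the system-wide count, not a per-process one.

```python
import os


def usable_cpu_count():
    """Best-effort number of CPUs the current process can use.

    Hypothetical helper, not part of the os module.
    """
    # Available on some Unix platforms (for example Linux), but not on macOS.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # os.cpu_count() may return None if the count cannot be determined.
    return os.cpu_count() or 1


print("system CPUs:", os.cpu_count())
print("usable CPUs:", usable_cpu_count())
```

On Linux the two numbers differ when the process has been restricted, for instance when it is started with taskset.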


Python's standard library has no support for Windows in regard to processor affinity, but the API functions can be called using ctypes.
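A rough sketch of such a ctypes call is shown below, using GetProcessAffinityMask() for the current process. The helper names are hypothetical, and returning None on non-Windows platforms is an assumption made so that the snippet can be imported anywhere. The result only describes the process affinity mask, so it is subject to the processor group limitations described in this comment.

```python
import sys


def count_set_bits(mask):
    """Count the 1 bits in an affinity mask; each set bit is one logical processor."""
    return bin(mask).count("1")


def windows_process_affinity_count():
    """Return the number of processors in the current process's affinity mask.

    Hypothetical helper: returns None when not running on Windows.
    """
    if sys.platform != "win32":
        return None

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    # The masks are DWORD_PTR values, i.e. pointer-sized unsigned integers.
    kernel32.GetProcessAffinityMask.argtypes = [
        wintypes.HANDLE,
        ctypes.POINTER(ctypes.c_size_t),
        ctypes.POINTER(ctypes.c_size_t),
    ]
    kernel32.GetProcessAffinityMask.restype = wintypes.BOOL

    process_mask = ctypes.c_size_t()
    system_mask = ctypes.c_size_t()
    ok = kernel32.GetProcessAffinityMask(
        kernel32.GetCurrentProcess(),
        ctypes.byref(process_mask),
        ctypes.byref(system_mask),
    )
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return count_set_bits(process_mask.value)


print(windows_process_affinity_count())
```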

Prior to Windows 11, each process is assigned to a processor group, each of which has up to 64 logical processors. Systems that have 64 or fewer logical processors only have a single processor group. For a process to use all available processors on a system that has multiple processor groups, threads have to be manually assigned to a processor group. In this case, the processor affinity can no longer be managed for the entire process via GetProcessAffinityMask() and SetProcessAffinityMask(). Instead use GetThreadGroupAffinity(), SetThreadGroupAffinity(), and SetThreadAffinityMask().

Starting with Windows 11, each process has a primary processor group, which is preferred, but by default the system is free to run a thread on any processor in any processor group in order to better manage performance and power usage. In this case, the above-mentioned API functions operate on the primary processor group of a process or thread. Affinity across all processor groups in Windows 11 is configured using "CPU set" functions such as GetProcessDefaultCpuSetMasks(), SetProcessDefaultCpuSetMasks(), GetThreadSelectedCpuSetMasks(), and SetThreadSelectedCpuSetMasks().

cquike commented 1 year ago

Thanks for the explanation. I would suggest a small change in the documentation. It currently reads:

"This number is not equivalent to the number of CPUs the current process can use. The number of usable CPUs can be obtained with len(os.sched_getaffinity(0))"

My suggestion is as follows:

"This number is not equivalent to the number of CPUs the current process can use. The number of CPUs usable by this process can be obtained with len(os.sched_getaffinity(0)) on platforms where os.sched_getaffinity() is available."

dimaqq commented 3 weeks ago

Yes please :)

Let's also add platform availability to os.sched_getaffinity(...)