jaypipes / ghw

Go HardWare discovery/inspection library
Apache License 2.0

Retrieval of core_id and physical_package_id for CPUs that are offline. #361

Closed kishen-v closed 4 months ago

kishen-v commented 5 months ago

On CPUs that have been taken offline with echo 0 > /sys/devices/system/cpu/cpuX/online, the topology directory does not contain the core_id and physical_package_id files.

Because the file is missing, the message WARNING: failed to read int from file: open /sys/devices/system/cpu/cpuX/topology/physical_package_id: no such file or directory is printed. The same warning is printed for the missing core_id file.

On a large compute node with many offline CPUs, this floods the output with warnings mixed in with the other information.

One such instance, observed on a system:

WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu10/topology/physical_package_id: no such file or directory
WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu10/topology/core_id: no such file or directory
WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu11/topology/physical_package_id: no such file or directory
WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu11/topology/core_id: no such file or directory
WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu12/topology/physical_package_id: no such file or directory
WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu12/topology/core_id: no such file or directory
WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu13/topology/physical_package_id: no such file or directory
WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu13/topology/core_id: no such file or directory
WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu14/topology/physical_package_id: no such file or directory
WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu14/topology/core_id: no such file or directory
WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu15/topology/physical_package_id: no such file or directory
WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu15/topology/core_id: no such file or directory
WARNING: failed to read int from file: open /sys/devices/system/cpu/cpu16/topology/physical_package_id: no such file or directory

... and so on, from cpu8 to cpu64.

This can be avoided by checking that the online file in the cpuX directory contains the value 1 before accessing the physical_package_id and core_id files.
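A minimal sketch of that check, to illustrate the idea. This is not ghw's actual code: the helper names parseOnline and cpuOnline are hypothetical, and treating a missing online file as "online" is an assumption (some CPUs, such as cpu0 on many x86 systems, cannot be hot-unplugged and have no online file).

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// parseOnline reports whether the contents of a cpuX/online file
// mark the CPU as online ("1", usually followed by a newline).
func parseOnline(content string) bool {
	return strings.TrimSpace(content) == "1"
}

// cpuOnline reports whether the CPU whose sysfs directory is cpuDir is online.
// Assumption: if the online file does not exist, the CPU cannot be taken
// offline, so it is treated as online.
func cpuOnline(cpuDir string) (bool, error) {
	data, err := os.ReadFile(filepath.Join(cpuDir, "online"))
	if errors.Is(err, fs.ErrNotExist) {
		return true, nil
	}
	if err != nil {
		return false, err
	}
	return parseOnline(string(data)), nil
}

func main() {
	// Matches cpu0, cpu1, ... but not cpufreq or cpuidle.
	dirs, err := filepath.Glob("/sys/devices/system/cpu/cpu[0-9]*")
	if err != nil {
		fmt.Fprintln(os.Stderr, "glob failed:", err)
		os.Exit(1)
	}
	for _, dir := range dirs {
		online, err := cpuOnline(dir)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: cannot read online state: %v\n", filepath.Base(dir), err)
			continue
		}
		if !online {
			// Skip the topology files entirely, so no warning is produced.
			fmt.Printf("%s: offline, skipping topology\n", filepath.Base(dir))
			continue
		}
		coreID, err := os.ReadFile(filepath.Join(dir, "topology", "core_id"))
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", filepath.Base(dir), err)
			continue
		}
		fmt.Printf("%s: core_id=%s\n", filepath.Base(dir), strings.TrimSpace(string(coreID)))
	}
}
```

With a check like this, an offline CPU is skipped before any read of topology/core_id or topology/physical_package_id is attempted, so the warnings above would only appear for CPUs that are online and still missing the files.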