Closed: runqch closed 3 years ago
The way it works is: hwloc reads NUMA node size from sysfs:
$ grep MemTotal /sys/devices/system/node/node*/meminfo
Node 0 MemTotal: 15753556 kB
It then converts this value to bytes by multiplying by 1024, and accumulates the memory of all NUMA nodes into the "total memory" of the machine, in bytes. When lstopo prints it, it divides by 1024^3 to get GB. So the first thing to do here is to compare the sysfs NUMA node sizes with 359GB and 356GB.
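The steps described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not hwloc's actual C code; the names `node_bytes` and `total_gb` are made up for this example, and the sample input is the `MemTotal` line quoted above.

```python
import re

# Matches lines such as "Node 0 MemTotal: 15753556 kB", with or without
# the "path:" prefix that grep adds when several files are searched.
MEMTOTAL_RE = re.compile(r"Node\s+(\d+)\s+MemTotal:\s+(\d+)\s+kB")


def node_bytes(meminfo_text):
    """Return {node_id: size_in_bytes} parsed from MemTotal lines."""
    sizes = {}
    for match in MEMTOTAL_RE.finditer(meminfo_text):
        node_id = int(match.group(1))
        kilobytes = int(match.group(2))
        sizes[node_id] = kilobytes * 1024  # "kB" is treated as 1024 bytes
    return sizes


def total_gb(sizes):
    """Sum all node sizes and scale the total by 1024^3."""
    return sum(sizes.values()) / 1024**3


sample = "Node 0 MemTotal: 15753556 kB\n"
sizes = node_bytes(sample)
print(sizes[0])                   # node 0 size in bytes
print(round(total_gb(sizes), 2))  # machine total after dividing by 1024^3
```

Feeding it the real output of the grep command from a machine gives the per-node and total figures to compare against what lstopo prints.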
The output of hwloc-ls is almost consistent with the result of MemTotal.
#grep MemTotal /sys/devices/system/node/node*/meminfo
/sys/devices/system/node/node0/meminfo:Node 0 MemTotal: 375262204 kB
/sys/devices/system/node/node1/meminfo:Node 1 MemTotal: 375638744 kB
         hwloc-ls   node*/meminfo
Node 0:  359G       357G
Node 1:  356G       357G
The gap is about 50G (=768-715) between physical memory and the output.
It seems that the question is actually why /proc/meminfo says MemTotal=790747444 instead of 375262204+375638744 = 750900948. I don't see any difference on my machines. I know all these values can slightly change after a reboot, but not by 50GB.
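To make the comparison concrete, here is a small sketch of that arithmetic using only the numbers quoted in this thread. It assumes "kB" means 1024 bytes in both files; the variable names are just for illustration.

```python
# Values in kB, copied from the outputs quoted in this thread.
proc_meminfo_kb = 790747444          # MemTotal from /proc/meminfo
node_kb = [375262204, 375638744]     # MemTotal from node0 and node1 meminfo

node_sum_kb = sum(node_kb)
gap_kb = proc_meminfo_kb - node_sum_kb

# Assumption: 1 kB = 1024 bytes, so kB -> units of 1024^3 bytes
# is a division by 1024^2.
gap_gib = gap_kb / 1024**2

print(node_sum_kb)        # 750900948
print(gap_kb)             # 39846496
print(round(gap_gib, 1))  # 38.0
```

With these inputs the difference between /proc/meminfo and the sum of the NUMA nodes is about 38G in 1024-based units, which is of the same order as the gap measured against the nominal 768GB.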
The other small differences (357 vs 359G) are likely related to dividing by 1024 instead of 1000, or to kB meaning KiB (1024 bytes) or kB (1000 bytes) depending on where we're looking.
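As a rough illustration of how much the unit conventions alone can move the number, the sketch below scales the node 0 value from this thread under each combination of conventions. It is not meant to reproduce what any particular tool does; the helper `scaled` is hypothetical.

```python
def scaled(kb_value, bytes_per_kb, divisor):
    """Convert a kB count to 'G' under a given pair of unit conventions."""
    return kb_value * bytes_per_kb / divisor


node0_kb = 375262204  # Node 0 MemTotal from the sysfs output in this thread

results = {}
for bytes_per_kb in (1024, 1000):
    for label, divisor in (("1024^3", 1024**3), ("1000^3", 1000**3)):
        value = scaled(node0_kb, bytes_per_kb, divisor)
        results[(bytes_per_kb, label)] = value
        print(f"kB={bytes_per_kb} bytes, divide by {label}: {value:.1f}G")
```

The same sysfs value lands anywhere between roughly 349G and 384G depending on the convention, so differences of a few G between tools are not surprising.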
This issue occurs on Red Hat 8.3 machines; on Red Hat 7.*, there is no such problem. Do you have any clue what would lead to such a big gap? Has hwloc 1.11.8 ever been verified on Red Hat 8.3 machines? @bgoglin
From what I see in some discussion about the Linux kernel, some pages are "reserved" by the kernel for some "init" data. Those are removed from NUMA node available memory (because that's where memory accounting really occurs) but not from the total machine memory (because this amount isn't very useful in the kernel code from what I understand). Maybe things changed in the kernel between RHEL7 and 8 but I don't think it matters much anyway. The entire machine memory isn't available anyway since the kernel allocates its own things. These MemTotal fields are only a very vague indicator of what applications may allocate if they are alone on the machine.
hwloc 1.11.8 does nothing special about this. This code has been the same trivial code explained above from 1.0 (10 years ago) up to latest 2.5, it's not going to report anything different now unless the kernel changes.
I am closing this issue since there's no bug in hwloc, only a strange kernel behavior, but we can continue discussing if you wish. https://toroid.org/linux-physical-memory seems to discuss related things. It looks like information about memory disappearing like this is in the early kernel boot log. The entire /proc/meminfo and the entire /sys/devices/system/node/node*/memory may help too.
Thanks for all the help on this issue. I really appreciate it.
What version of hwloc are you using?
1.11.8
Which operating system and hardware are you running on?
Red Hat Enterprise Linux release 8.3 (Ootpa)
Details of the problem
Total memory reported by hwloc_topology_* API calls is lower than the actual total memory on Red Hat 8.3
All nodes have 2x24-core Skylake CPUs with 768GB of total memory; see /proc/meminfo and top output below:
But from the hwloc_topology_* API (or from hwloc-ls), we only get 716G, which is lower than the actual memory:
Additional information