intel / ipmctl


Region created on socket #0 reportedly on numa_node #1 #200

Closed tanabarr closed 1 year ago

tanabarr commented 1 year ago

After creating PMem regions with ipmctl, the numa_node that ndctl reports for a region doesn't match the SocketID that ipmctl reports for the same region. Matching on ISetID, the ipmctl region with RegionID 0x0001 and SocketID 0x0000 corresponds to ndctl dev region0, which reports numa_node 1.

$ sudo ipmctl show -o nvmxml -region
<?xml version="1.0"?>
 <RegionList>
  <Region>
   <SocketID>0x0000</SocketID>
   <PersistentMemoryType>AppDirect</PersistentMemoryType>
   <Capacity>1008.000 GiB</Capacity>
   <FreeCapacity>1008.000 GiB</FreeCapacity>
   <HealthState>Healthy</HealthState>
   <DimmID>0x0001, 0x0011, 0x0101, 0x0111, 0x0201, 0x0211, 0x0301, 0x0311</DimmID>
   <RegionID>0x0001</RegionID>
   <ISetID>0x04a32120b4fe1110</ISetID>
  </Region>
  <Region>
   <SocketID>0x0001</SocketID>
   <PersistentMemoryType>AppDirect</PersistentMemoryType>
   <Capacity>1008.000 GiB</Capacity>
   <FreeCapacity>1008.000 GiB</FreeCapacity>
   <HealthState>Healthy</HealthState>
   <DimmID>0x1001, 0x1011, 0x1101, 0x1111, 0x1201, 0x1211, 0x1301, 0x1311</DimmID>
   <RegionID>0x0002</RegionID>
   <ISetID>0x3a7b2120bb081110</ISetID>
  </Region>
 </RegionList>
$ sudo ndctl list -Rv
[
  {
    "dev":"region1",
    "size":1082331758592,
    "align":16777216,
    "available_size":1082331758592,
    "max_available_extent":1082331758592,
    "type":"pmem",
    "numa_node":0,
    "target_node":3,
    "iset_id":4213998300795769104,
    "persistence_domain":"memory_controller"
  },
  {
    "dev":"region0",
    "size":1082331758592,
    "align":16777216,
    "available_size":1082331758592,
    "max_available_extent":1082331758592,
    "type":"pmem",
    "numa_node":1,
    "target_node":2,
    "iset_id":334147221714768144,
    "persistence_domain":"memory_controller"
  }
]

The numa_node of the created block device doesn't match the socket ID reported by ipmctl:

[tanabarr@wolf-226 daos]$ cat /sys/class/block/pmem1/device/numa_node
0
[tanabarr@wolf-226 daos]$ sudo ipmctl show -region
 SocketID | ISetID             | PersistentMemoryType | Capacity     | FreeCapacity | HealthState
==================================================================================================
 0x0000   | 0x04a32120b4fe1110 | AppDirect            | 1008.000 GiB | 1008.000 GiB | Pending
 0x0001   | 0x3a7b2120bb081110 | AppDirect            | 1008.000 GiB | 0.000 GiB    | Healthy
[tanabarr@wolf-226 daos]$ ls -lah /dev/pmem*
brw-rw---- 1 root disk 259, 16 Mar  9 15:43 /dev/pmem1
brw-rw---- 1 root disk 259, 17 Mar  9 15:43 /dev/pmem1.1
brw-rw---- 1 root disk 259, 18 Mar  9 15:43 /dev/pmem1.2
brw-rw---- 1 root disk 259, 19 Mar  9 15:44 /dev/pmem1.3

More detail:

[tanabarr@wolf-226 daos]$ ls -lah /dev/pmem*
brw-rw---- 1 root disk 259, 20 Mar  9 16:03 /dev/pmem0
brw-rw---- 1 root disk 259, 21 Mar  9 16:04 /dev/pmem0.1
brw-rw---- 1 root disk 259, 22 Mar  9 16:04 /dev/pmem0.2
brw-rw---- 1 root disk 259, 23 Mar  9 16:05 /dev/pmem0.3
brw-rw---- 1 root disk 259, 16 Mar  9 15:43 /dev/pmem1
brw-rw---- 1 root disk 259, 17 Mar  9 15:43 /dev/pmem1.1
brw-rw---- 1 root disk 259, 18 Mar  9 15:43 /dev/pmem1.2
brw-rw---- 1 root disk 259, 19 Mar  9 15:44 /dev/pmem1.3
[tanabarr@wolf-226 daos]$ cat /sys/class/block/pmem1*/device/numa_node
0
0
0
0
[tanabarr@wolf-226 daos]$ cat /sys/class/block/pmem0*/device/numa_node
1
1
1
1
[tanabarr@wolf-226 daos]$ sudo ipmctl show -region
 SocketID | ISetID             | PersistentMemoryType | Capacity     | FreeCapacity | HealthState
==================================================================================================
 0x0000   | 0x04a32120b4fe1110 | AppDirect            | 1008.000 GiB | 0.000 GiB    | Healthy
 0x0001   | 0x3a7b2120bb081110 | AppDirect            | 1008.000 GiB | 0.000 GiB    | Healthy

I would expect the SocketID to equal the NUMA node ID of the region, with each region uniquely identified by its ISetID.
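The sysfs reads above can be scripted so the check is repeatable across hosts. This is a minimal sketch, assuming the same `/sys/class/block/<dev>/device/numa_node` layout shown in the transcript; the function name `block_numa_nodes` and its parameters are hypothetical and not part of ipmctl or ndctl.

```python
from pathlib import Path


def block_numa_nodes(pattern="pmem*", sys_block=Path("/sys/class/block")):
    """Return {block device name: numa_node} for devices matching pattern.

    Reads <sys_block>/<dev>/device/numa_node, the same file that was
    inspected with cat in the transcript above.
    """
    nodes = {}
    for dev in sorted(sys_block.glob(pattern)):
        node_file = dev / "device" / "numa_node"
        if node_file.is_file():
            nodes[dev.name] = int(node_file.read_text().strip())
    return nodes


if __name__ == "__main__":
    # Prints nothing on a host without pmem block devices.
    for name, node in block_numa_nodes().items():
        print(f"{name}: numa_node {node}")
```

The `sys_block` parameter exists only so the function can be pointed at a copy of the sysfs tree, for example one collected from another machine.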

[tanabarr@wolf-226 daos]$ sudo ndctl list -R
[
  {
    "dev":"region1",
    "size":1082331758592,
    "align":16777216,
    "available_size":0,
    "max_available_extent":0,
    "type":"pmem",
    "iset_id":4213998300795769104,
    "persistence_domain":"memory_controller"
  },
  {
    "dev":"region0",
    "size":1082331758592,
    "align":16777216,
    "available_size":0,
    "max_available_extent":0,
    "type":"pmem",
    "iset_id":334147221714768144,
    "persistence_domain":"memory_controller"
  }
]

region0 iset_id (334147221714768144 == 0x4A32120B4FE1110) matches the ipmctl region on socket 0.
region1 iset_id (4213998300795769104 == 0x3A7B2120BB081110) matches the ipmctl region on socket 1.
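The ISetID matching above can be reproduced mechanically. The sketch below joins the two tools' outputs on ISetID and flags regions where SocketID and numa_node differ. It uses trimmed copies of the `ipmctl show -o nvmxml -region` and `ndctl list -Rv` outputs quoted earlier in this report (only the fields needed for the comparison are kept); the helper names `socket_by_iset` and `compare` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed from the ipmctl XML output quoted in this report.
IPMCTL_XML = """<RegionList>
 <Region>
  <SocketID>0x0000</SocketID>
  <ISetID>0x04a32120b4fe1110</ISetID>
 </Region>
 <Region>
  <SocketID>0x0001</SocketID>
  <ISetID>0x3a7b2120bb081110</ISetID>
 </Region>
</RegionList>"""

# Trimmed from the ndctl JSON output quoted in this report.
NDCTL_JSON = """[
  {"dev": "region1", "numa_node": 0, "iset_id": 4213998300795769104},
  {"dev": "region0", "numa_node": 1, "iset_id": 334147221714768144}
]"""


def socket_by_iset(xml_text):
    """Map ISetID (int) to SocketID (int) from ipmctl XML output."""
    root = ET.fromstring(xml_text)
    return {
        int(region.findtext("ISetID"), 16): int(region.findtext("SocketID"), 16)
        for region in root.iter("Region")
    }


def compare(xml_text, json_text):
    """Return (dev, iset_id in hex, ipmctl socket, ndctl numa_node) per region."""
    sockets = socket_by_iset(xml_text)
    rows = []
    for region in json.loads(json_text):
        iset = region["iset_id"]
        rows.append(
            (region["dev"], hex(iset), sockets.get(iset), region.get("numa_node"))
        )
    return rows


for dev, iset, socket, node in compare(IPMCTL_XML, NDCTL_JSON):
    status = "ok" if socket == node else "MISMATCH"
    print(f"{dev} iset_id={iset} socket={socket} numa_node={node} {status}")
```

ipmctl prints ISetID in hexadecimal while ndctl prints iset_id in decimal, so both are converted to integers before the join. With the data from this report, both regions are flagged as a mismatch.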

This doesn't add up, because namespaces on region0 (socket 0 according to ipmctl) are reported on numa_node 1:

[tanabarr@wolf-226 daos]$ sudo ndctl list -Rv -r 0
{
  "regions":[
    {
      "dev":"region0",
      "size":1082331758592,
      "align":16777216,
      "available_size":0,
      "max_available_extent":0,
      "type":"pmem",
      "numa_node":1,
      "target_node":2,
      "iset_id":334147221714768144,
      "persistence_domain":"memory_controller",
      "namespaces":[
        {
          "dev":"namespace0.2",
          "mode":"fsdax",
          "map":"dev",
          "size":266352984064,
          "uuid":"8676b101-3035-4e07-9ccc-a1a4dcab915a",
          "raw_uuid":"987ac22c-20e8-41eb-90d3-1a9d9e4bd0a5",
          "sector_size":512,
          "align":2097152,
          "blockdev":"pmem0.2",
          "numa_node":1,
          "target_node":2
        },
        ...

[tanabarr@wolf-226 daos]$ hwloc-ls
Machine (251GB total)
  Package L#0
    NUMANode L#0 (P#0 125GB)
...
    Block(NVDIMM) "pmem1.2"
    Block(NVDIMM) "pmem1"
    Block(NVDIMM) "pmem1.3"
    Block(NVDIMM) "pmem1.1"
  Package L#1
    NUMANode L#1 (P#1 126GB)
...
    Block(NVDIMM) "pmem0.1"
    Block(NVDIMM) "pmem0.2"
    Block(NVDIMM) "pmem0.3"
    Block(NVDIMM) "pmem0"

I am confused as to why the numa_node doesn't match the socket ID. Can someone help me understand, please?

OS: Rocky Linux 8.6

Kernel:

$ uname -a
Linux 4.18.0-372.32.1.el8_6.x86_64 #1 SMP Thu Oct 27 15:18:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Optane + IceLake platform:

cpu family : 6
model : 106
model name : Intel(R) Xeon(R) Gold 5320 CPU @ 2.20GHz
stepping : 6
microcode : 0xd000389

StevenPontsler commented 1 year ago

It looks like discussion is happening on the ndctl site at https://github.com/pmem/ndctl/issues/235

tanabarr commented 1 year ago

Identified as a platform bug; see https://github.com/pmem/ndctl/issues/235