open-mpi / hwloc

Hardware locality (hwloc)
https://www.open-mpi.org/projects/hwloc

Detect new attributes for hard disks, DIMMs, HCAs, etc #116

Closed bgoglin closed 9 years ago

bgoglin commented 9 years ago

Intel would like hwloc to report many new attributes. Below is a summary of the discussion and comments gathered from http://www.open-mpi.org/community/lists/hwloc-devel/2014/09/4226.php

It's OK if some of them are only available as root.

  1. Memory
    1. Total memory
    2. Total DIMMs
    3. Individual DIMMs:
      1. Serial numbers
      2. Vendor Name
      3. Model
      4. Memory Frequency
    4. Notes:
      • For Memory, I would suggest having a single object for each node and listing the DIMM details as attributes of that object.
      • Most of the DIMM related data is provided by the SMBIOS tables. 'dmidecode' provides a lot of this information.
      • I'm not sure you can map a specific DIMM to a specific address within a NUMA region. However, we can at least add the DIMMs to the root-object attributes. In addition, you can certainly map a DIMM to a specific DIMM socket, and I believe that means you can map it to a given NUMA region even if you can't say where it is within that region. Have to verify that.
      • (Intel Haswell) lshw only reports DIMM info when run as root, which I suspect means it accesses DMI information via /dev/mem.
      • Memory info is available from lshw, though it is GPL code:
   *-bank:0
          description: DIMM Synchronous 1333 MHz (0.8 ns)
          product: M393B1K70DH0-YH9
          vendor: 0x80CE
          physical id: 0
          serial: 0x85B5FED3
          slot: DIMM_A1
          size: 8GiB
          width: 64 bits
          clock: 1333MHz (0.8ns)
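
The lshw listing above is plain indented "key: value" text, so turning one bank into a set of per-DIMM attributes is mostly string splitting. A minimal Python sketch, assuming the text has already been captured (for example by running lshw as root); parse_lshw_bank and SAMPLE_BANK are hypothetical names used for illustration and are not part of hwloc or lshw:

```python
# Sample text copied from the lshw output quoted in this issue.
SAMPLE_BANK = """\
*-bank:0
     description: DIMM Synchronous 1333 MHz (0.8 ns)
     product: M393B1K70DH0-YH9
     vendor: 0x80CE
     physical id: 0
     serial: 0x85B5FED3
     slot: DIMM_A1
     size: 8GiB
     width: 64 bits
     clock: 1333MHz (0.8ns)
"""


def parse_lshw_bank(text):
    """Parse one lshw '*-bank:N' block into a dict of its fields.

    Hypothetical helper: it only understands the simple layout shown
    above (a '*-name' header followed by 'key: value' lines).
    """
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*-"):
            # Header line such as '*-bank:0'; keep it as the entry id.
            info["id"] = line[2:]
            continue
        # Split on the first ': ' only, so values may contain colons.
        key, sep, value = line.partition(": ")
        if sep:
            info[key] = value
    return info


dimm = parse_lshw_bank(SAMPLE_BANK)
print(dimm["slot"], dimm["vendor"], dimm["product"], dimm["serial"])
```

Keeping the values as raw strings avoids guessing at units; a consumer that needs the frequency or size as a number would have to parse "1333MHz" or "8GiB" separately.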
  1. Network Adapters (Ethernet)
    1. Model
    2. Speed (both supported and currently negotiated link speed => ethtool)
    3. Serial Number (if applicable)
    4. MAC address
  2. Network Adapters (InfiniBand)
    1. Model
    2. Speed
    3. Serial Number (if applicable)
    4. MAC address
  3. Host Bus Adapters
    1. Manufacturer
    2. Serial Number
    3. MAC address
    4. Notes:
      • HBAs are more like non-InfiniBand network interfaces (the InfiniBand ones are usually called HCAs) and include fiber optic, eSATA, etc. The work should be similar to the previous sections on network interfaces.
  4. Coprocessors
    1. Manufacturer
    2. Serial Number
  5. Other PCI Devices
    1. Device ID
    2. Device Serial number (if applicable)
    3. Notes
      • The serial number isn't standardized anywhere in the PCI config space, so this item is likely impossible.
  6. Hard Drives
    1. Model, Form factor, etc.
    2. Vendor
    3. Serial Number
    4. Size
    5. Notes
      • For hard drives, we can have similar objects for each SATA0.., etc node, whose lanes are usually connected via the PCH to a single socket. Each hard drive can have its own object, and all the attributes of the hard drive can be stored within that object.
      • udev gathers this information:
# ll /sys/block/sda/bdi
lrwxrwxrwx. 1 root root 0 Sep 23 09:33 /sys/block/sda/bdi -> ../../../../../../../../virtual/bdi/8:0
# grep SERIAL '/run/udev/data/b8:0'
E:ID_SERIAL=SAMSUNG_MZ7TD256HAFV-000L9_S17LNSADC13325
E:ID_SERIAL_SHORT=S17LNSADC13325
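
The udev lookup above can be sketched in Python: build the data-file path from the device's major:minor numbers, then collect the "E:KEY=VALUE" lines. This is only an illustration under stated assumptions. The path and line format are taken from the commands quoted above, the udev database is not a stable public interface, and udev_data_path and parse_udev_properties are hypothetical helper names that are not part of hwloc or udev:

```python
# Sample lines copied from the grep output quoted in this issue.
SAMPLE_UDEV = """\
E:ID_SERIAL=SAMSUNG_MZ7TD256HAFV-000L9_S17LNSADC13325
E:ID_SERIAL_SHORT=S17LNSADC13325
"""


def udev_data_path(major, minor):
    """Return the udev data file path for a block device.

    Assumption: the '/run/udev/data/b<major>:<minor>' layout shown in
    the commands above.
    """
    return f"/run/udev/data/b{major}:{minor}"


def parse_udev_properties(text):
    """Collect 'E:KEY=VALUE' lines into a dict; other lines are skipped."""
    props = {}
    for line in text.splitlines():
        if not line.startswith("E:"):
            continue
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = line[2:].partition("=")
        if sep:
            props[key] = value
    return props


props = parse_udev_properties(SAMPLE_UDEV)
print(udev_data_path(8, 0), props.get("ID_SERIAL_SHORT"))
```

On a live system the text would come from reading the file returned by udev_data_path, which may require appropriate permissions.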
jsquyres commented 9 years ago

Adding @rhc54

vpedabal commented 9 years ago

All the requested items have now been added to master and the v1.11 branch.