pmem / ndctl

A "device memory" enabling project encompassing tools and libraries for CXL, NVDIMMs, DAX, memory tiering and other platform memory device topics.

[check-labels] should ignore 'disabled' nmems #65

Open sscargal opened 6 years ago

sscargal commented 6 years ago

Config

OS: Fedora 28
Kernel: 4.17.9-200.fc28.x86_64
ndctl version: 62
nmems: 3 x enabled, 1 x disabled

Issue

The check-labels command doesn't work when given a disabled nmem.

This particular system has three active/enabled NVDIMMs and one 'non-functional' (disabled) NVDIMM, nmem1. Whenever check-labels is asked to scan nmem1, it reports "successfully verified 0 nmem", which doesn't tell the user why the operation failed. It also fails when given one disabled nmem together with one or more active/enabled nmems, which is not what a user would expect.

# ndctl list -iD
[
  {
    "dev":"nmem1",
    "id":"8089-a1-1811-00000074",
    "handle":257,
    "phys_id":0,
    "state":"disabled",  <<<<<<<<<<<<<<
    "flag_failed_map":true
  },
  {
    "dev":"nmem3",
    "id":"8089-a1-1811-0000005d",
    "handle":4353,
    "phys_id":68
  },
  {
    "dev":"nmem0",
    "id":"8089-a1-1811-00000058",
    "handle":1,
    "phys_id":32
  },
  {
    "dev":"nmem2",
    "id":"8089-a1-1811-00000068",
    "handle":4097,
    "phys_id":56
  }
]

If we try to check the labels on nmem1, we get:

# ndctl check-labels nmem1
successfully verified 0 nmem

While the message is not entirely helpful, the response is expected. The message could be improved to indicate that the nmem is disabled. This would be very helpful in debugging situations.

Given that nmem1 is disabled, I would expect check-labels to ignore it and skip to the next one(s), but it doesn't. If we give check-labels a list of enabled and disabled nmems, it always fails silently:

# ndctl check-labels nmem1
successfully verified 0 nmem

# ndctl check-labels nmem0 nmem1
successfully verified 0 nmem

# ndctl check-labels nmem0 nmem1 nmem2
successfully verified 0 nmem

# ndctl check-labels nmem0 nmem1 nmem2 nmem3
successfully verified 0 nmem
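
Until check-labels skips disabled nmems on its own, one possible workaround is to filter them out before calling it. The following is only a sketch, not part of ndctl: the helper names `split_by_state` and `check_enabled_labels` are hypothetical, and it assumes that `ndctl list -iD` prints a JSON array like the one above, in which only a disabled nmem carries a "state" field.

```python
import json
import subprocess


def split_by_state(listing_json):
    """Split the JSON printed by `ndctl list -iD` into (usable, disabled) names.

    Assumption: the output is a JSON array of DIMM objects, and a disabled
    DIMM is marked with "state": "disabled", as in the listing above.
    """
    usable, disabled = [], []
    if not listing_json.strip():
        # Nothing was listed, so there is nothing to check.
        return usable, disabled
    for dimm in json.loads(listing_json):
        if dimm.get("state") == "disabled":
            disabled.append(dimm["dev"])
        else:
            usable.append(dimm["dev"])
    return usable, disabled


def check_enabled_labels():
    """Run `ndctl check-labels` only on the nmems that are not disabled."""
    listing = subprocess.run(
        ["ndctl", "list", "-iD"], check=True, capture_output=True, text=True
    ).stdout
    usable, disabled = split_by_state(listing)
    for dev in disabled:
        # Tell the user why a device is left out instead of failing silently.
        print(f"{dev} is disabled. skipping.")
    if usable:
        subprocess.run(["ndctl", "check-labels", *usable], check=True)
```

Calling `check_enabled_labels()` on the system above would pass only nmem3, nmem0 and nmem2 to check-labels and print a skip notice for nmem1. The filtering is kept in a separate function so it can be exercised without ndctl installed.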

Expected Output

I would expect the following response (or something similar):

# ndctl check-labels nmem1
nmem1 is disabled.  skipping checks.

# ndctl check-labels nmem0 nmem1
nmem1 is disabled.  skipping.
successfully verified 510 labels

hramrach commented 2 years ago

Is this still a problem with v72 or later?