Closed TurtleTony closed 1 month ago
nvme-cli uses sysfs to discover NVMe devices. You should still be able to use nvme-cli to operate on the device handles, e.g. nvme id-ctrl /dev/nvme0n1. nvme list and friends will not work, though, unless you also map the corresponding sysfs subtrees into your container.
The corresponding sysfs subtrees are:
Hi @igaw, thanks for the super fast reply! I've followed your advice (nvme-fabrics doesn't exist on my device):
volumes:
...
- /sys/class/nvme:/sys/class/nvme
- /sys/class/nvme-generic:/sys/class/nvme-generic
- /sys/class/nvme-subsystem:/sys/class/nvme-subsystem
- /sys/bus/pci/slots:/sys/bus/pci/slots
The following command shows the subsys being detected:
root@CM3588:~# docker exec netdata nvme list-subsys
nvme-subsys3 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF0F45
\
+- nvme3 pcie 0003:31:00.0 live
nvme-subsys2 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF39F8
\
+- nvme2 pcie 0001:11:00.0 live
nvme-subsys1 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC267
\
+- nvme1 pcie 0002:21:00.0 live
nvme-subsys0 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC262
\
+- nvme0 pcie 0000:01:00.0 live
Alas, nvme list still doesn't work:
root@CM3588:~# docker exec netdata nvme list
Node Generic SN Model Namespace Usage Format FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
nvme-fabrics only exists on your host if you have the NVMe fabrics modules loaded, e.g. nvme-tcp. You can safely ignore it if these sysfs dirs are missing.
nvme list-subsys does almost the same as nvme list. It first iterates over sysfs to gather all information and then prints it out.
Which version of nvme-cli are you using? If it's not the latest, could you retry with the latest version?
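The sysfs walk described above can be sketched roughly as follows. This is only an illustration of the idea, not libnvme's actual implementation: the function names scan_nvme_sysfs and list_matching are hypothetical, and the layout (controllers linked under /sys/class/nvme-subsystem/<subsys>/, namespaces under /sys/class/nvme/<ctrl>/) is an assumption based on the paths that show up in the debug output of this thread.

```python
import os
import re


def list_matching(directory, pattern):
    """Return sorted entries of `directory` whose names fully match `pattern`."""
    if not os.path.isdir(directory):
        return []
    return sorted(e for e in os.listdir(directory) if re.fullmatch(pattern, e))


def scan_nvme_sysfs(sysfs_root="/sys"):
    """Build {subsystem: {controller: [namespace, ...]}} from sysfs only.

    No command is sent to the device; everything comes from directory names.
    The directory layout used here is an assumption, not a libnvme contract.
    """
    subsys_dir = os.path.join(sysfs_root, "class", "nvme-subsystem")
    ctrl_dir = os.path.join(sysfs_root, "class", "nvme")
    topology = {}
    for subsys in list_matching(subsys_dir, r"nvme-subsys\d+"):
        controllers = list_matching(os.path.join(subsys_dir, subsys), r"nvme\d+")
        topology[subsys] = {
            ctrl: list_matching(os.path.join(ctrl_dir, ctrl), r"nvme\d+(c\d+)?n\d+")
            for ctrl in controllers
        }
    return topology


if __name__ == "__main__":
    for subsys, controllers in scan_nvme_sysfs().items():
        for ctrl, namespaces in controllers.items():
            print(subsys, ctrl, namespaces)
```

Run inside the container, an empty result would suggest the sysfs subtrees are not mapped; controllers without namespaces would point at the namespace scan rather than the sysfs mapping.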
Ah I understand, thanks for the list-subsys explanation :-)
Currently running on:
root@netdata:/tmp# nvme --version
nvme version 2.3 (git 2.3)
libnvme version 1.3 (git 1.3)
root@netdata:/tmp# nvme list
Node Generic SN Model Namespace Usage Format FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
I used meson to build the most recent version, same result:
root@netdata:/tmp# ./nvme-cli-2.10.2/.build/nvme --version
nvme version 2.10.2 (git 2.10.2)
libnvme version 1.10 (git 1.10)
root@netdata:/tmp# ./nvme-cli-2.10.2/.build/nvme list
Node Generic SN Model Namespace Usage Format FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
root@netdata:/tmp#
Hi @TurtleTony, does nvme list --verbose show any NVMe devices inside the Docker container?
Good idea. With the latest version you can even enable the debug output by adding another -v:
nvme list -vv
nvme list-subsys -vv
In theory, both should give the same debug output.
That's interesting! Indeed when using verbose, the devices show up. What does that mean?
root@netdata:/# nvme --version
nvme version 2.3 (git 2.3)
libnvme version 1.3 (git 1.3)
root@netdata:/# nvme list
Node Generic SN Model Namespace Usage Format FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
root@netdata:/# nvme list -vv
Subsystem Subsystem-NQN Controllers
---------------- ------------------------------------------------------------------------------------------------ ----------------
nvme-subsys3 nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF0F45 nvme3
nvme-subsys2 nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF39F8 nvme2
nvme-subsys1 nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC267 nvme1
nvme-subsys0 nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC262 nvme0
Device SN MN FR TxPort Address Subsystem Namespaces
-------- -------------------- ---------------------------------------- -------- ------ -------------- ------------ ----------------
nvme3 23465W802861 WD Red SN700 1000GB 111150WD pcie 0003:31:00.0 nvme-subsys3
nvme2 23465W803996 WD Red SN700 1000GB 111150WD pcie 0001:11:00.0 nvme-subsys2
nvme1 23465W801929 WD Red SN700 1000GB 111150WD pcie 0002:21:00.0 nvme-subsys1
nvme0 23465W801930 WD Red SN700 1000GB 111150WD pcie 0000:01:00.0 nvme-subsys0
Device Generic NSID Usage Format Controllers
------------ ------------ -------- -------------------------- ---------------- ----------------
root@netdata:/# nvme list-subsys -vv
scan controller nvme0
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme0
scan controller nvme0 namespace nvme0n1
failed to scan namespace nvme0n1
scan controller nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
scan controller nvme1 namespace nvme1n1
failed to scan namespace nvme1n1
scan controller nvme2
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme2
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme2
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys2/nvme2
scan controller nvme2 namespace nvme2n1
failed to scan namespace nvme2n1
scan controller nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys2/nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys3/nvme3
scan controller nvme3 namespace nvme3n1
failed to scan namespace nvme3n1
scan subsystem nvme-subsys0
scan subsystem nvme-subsys1
scan subsystem nvme-subsys2
scan subsystem nvme-subsys3
nvme-subsys3 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF0F45
\
+- nvme3 pcie 0003:31:00.0 live
nvme-subsys2 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF39F8
\
+- nvme2 pcie 0001:11:00.0 live
nvme-subsys1 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC267
\
+- nvme1 pcie 0002:21:00.0 live
nvme-subsys0 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC262
\
+- nvme0 pcie 0000:01:00.0 live
With version 2.10.2 the output becomes even more verbose:
root@netdata:/tmp# ./nvme-cli-2.10.2/.build/nvme list -vv
scan controller nvme0
warning: using auto generated hostid and hostnqn
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme0
scan controller nvme0 namespace nvme0n1
opcode : 06
flags : 00
rsvd1 : 0000
nsid : 00000001
cdw2 : 00000000
cdw3 : 00000000
data_len : 00001000
metadata_len : 00000000
addr : 5560e3f000
metadata : 0
cdw10 : 00000000
cdw11 : 00000000
cdw12 : 00000000
cdw13 : 00000000
cdw14 : 00000000
cdw15 : 00000000
timeout_ms : 00000000
result : 00000000
err : -1
latency : 5 us
failed to scan namespace nvme0n1
scan controller nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
scan controller nvme1 namespace nvme1n1
opcode : 06
flags : 00
rsvd1 : 0000
nsid : 00000001
cdw2 : 00000000
cdw3 : 00000000
data_len : 00001000
metadata_len : 00000000
addr : 5560e3f000
metadata : 0
cdw10 : 00000000
cdw11 : 00000000
cdw12 : 00000000
cdw13 : 00000000
cdw14 : 00000000
cdw15 : 00000000
timeout_ms : 00000000
result : 00000000
err : -1
latency : 1 us
failed to scan namespace nvme1n1
scan controller nvme2
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme2
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme2
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys2/nvme2
scan controller nvme2 namespace nvme2n1
opcode : 06
flags : 00
rsvd1 : 0000
nsid : 00000001
cdw2 : 00000000
cdw3 : 00000000
data_len : 00001000
metadata_len : 00000000
addr : 5560e40000
metadata : 0
cdw10 : 00000000
cdw11 : 00000000
cdw12 : 00000000
cdw13 : 00000000
cdw14 : 00000000
cdw15 : 00000000
timeout_ms : 00000000
result : 00000000
err : -1
latency : 2 us
failed to scan namespace nvme2n1
scan controller nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys2/nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys3/nvme3
scan controller nvme3 namespace nvme3n1
opcode : 06
flags : 00
rsvd1 : 0000
nsid : 00000001
cdw2 : 00000000
cdw3 : 00000000
data_len : 00001000
metadata_len : 00000000
addr : 5560e40000
metadata : 0
cdw10 : 00000000
cdw11 : 00000000
cdw12 : 00000000
cdw13 : 00000000
cdw14 : 00000000
cdw15 : 00000000
timeout_ms : 00000000
result : 00000000
err : -1
latency : 1 us
failed to scan namespace nvme3n1
scan subsystem nvme-subsys0
scan subsystem nvme-subsys1
scan subsystem nvme-subsys2
scan subsystem nvme-subsys3
Subsystem Subsystem-NQN Controllers
---------------- ------------------------------------------------------------------------------------------------ ----------------
nvme-subsys0 nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC262 nvme0
nvme-subsys1 nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC267 nvme1
nvme-subsys2 nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF39F8 nvme2
nvme-subsys3 nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF0F45 nvme3
Device Cntlid SN MN FR TxPort Address Slot Subsystem Namespaces
---------------- ------ -------------------- ---------------------------------------- -------- ------ -------------- ------ ------------ ----------------
nvme0 8215 23465W801930 WD Red SN700 1000GB 111150WD pcie 0000:01:00.0 nvme-subsys0
nvme1 8215 23465W801929 WD Red SN700 1000GB 111150WD pcie 0002:21:00.0 nvme-subsys1
nvme2 8215 23465W803996 WD Red SN700 1000GB 111150WD pcie 0001:11:00.0 nvme-subsys2
nvme3 8215 23465W802861 WD Red SN700 1000GB 111150WD pcie 0003:31:00.0 nvme-subsys3
Device Generic NSID Usage Format Controllers
----------------- ----------------- ---------- -------------------------- ---------------- ----------------
The scanning of namespaces fails (failed to scan namespace nvme0n1), which is why nvme list doesn't show anything. This command only lists the namespaces, while the other commands list subsystems etc.
I also see that the newest version is trying to issue commands, which fail. This indicates you have an older kernel which doesn't expose all the sysfs entries libnvme needs to operate without issuing any commands. It also explains why you don't see any namespaces: the commands do not work. This is likely a permission problem, and nvme-cli/libnvme is not able to talk to the real hardware via the /dev/nvme device node.
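To make the failing command in the 2.10.2 debug output easier to read, here is a small sketch that decodes one of the dumped blocks. It assumes the dump describes an admin passthrough command and that all fields except err are printed in hexadecimal; parse_dump and describe are hypothetical helper names, not part of nvme-cli. The opcode and CNS meanings (admin opcode 06h is Identify, CNS 00h in the low byte of cdw10 selects Identify Namespace) come from the NVMe specification.

```python
# Only the values needed for this thread; both tables are deliberately incomplete.
ADMIN_OPCODES = {0x06: "Identify"}
IDENTIFY_CNS = {0x00: "Identify Namespace", 0x01: "Identify Controller"}


def parse_dump(text):
    """Turn 'key : value' lines from the debug output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe(fields):
    """Interpret the dumped fields (assumed hex, except the decimal 'err')."""
    opcode = int(fields["opcode"], 16)
    cns = int(fields["cdw10"], 16) & 0xFF  # CNS is bits 7:0 of CDW10
    return {
        "command": ADMIN_OPCODES.get(opcode, "unknown"),
        "cns": IDENTIFY_CNS.get(cns, "other") if opcode == 0x06 else None,
        "nsid": int(fields["nsid"], 16),
        "data_len": int(fields["data_len"], 16),
        "failed": int(fields["err"]) != 0,
    }


if __name__ == "__main__":
    sample = """opcode : 06
nsid : 00000001
data_len : 00001000
cdw10 : 00000000
err : -1"""
    print(describe(parse_dump(sample)))
```

Under those assumptions, the dumped command is an Identify Namespace for NSID 1 with a 4096-byte buffer, and the non-zero err shows that it did not complete successfully.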
Your explanation makes sense to me; it does indeed seem to be a permission error. I don't think it's a kernel issue, because I'm running an up-to-date Debian and a recent Docker version, and the image used for the container is also quite recent. I found that when running it in Docker's privileged mode, the nvme list command finally works!
root@netdata:/# nvme list
Node Generic SN Model Namespace Usage Format FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme3n1 /dev/ng3n1 23465W802861 WD Red SN700 1000GB 1 1.00 TB / 1.00 TB 512 B + 0 B 111150WD
/dev/nvme2n1 /dev/ng2n1 23465W803996 WD Red SN700 1000GB 1 1.00 TB / 1.00 TB 512 B + 0 B 111150WD
/dev/nvme1n1 /dev/ng1n1 23465W801929 WD Red SN700 1000GB 1 1.00 TB / 1.00 TB 512 B + 0 B 111150WD
/dev/nvme0n1 /dev/ng0n1 23465W801930 WD Red SN700 1000GB 1 1.00 TB / 1.00 TB 512 B + 0 B 111150WD
root@netdata:/#
So it appears that passing the devices through via the devices option alone doesn't work and privileged mode has to be enabled. Thank you for your thorough assistance! I will have to think about whether I want to run this container in this mode, but either way this issue is solved. Have a nice day!
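For anyone comparing the non-privileged and privileged setups, a quick check like the sketch below can show whether the device nodes are at least present and openable inside the container. The helper name probe_node and the glob patterns are assumptions for illustration. Note that a successful open() does not prove that passthrough commands are permitted, so an "ok" here only rules out the simpler failure modes.

```python
import errno
import glob
import os
import stat


def probe_node(path):
    """Classify a path as 'missing', 'not-a-device', 'ok', or an errno name."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    if not (stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode)):
        return "not-a-device"
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # e.g. EACCES or EPERM when the container may not open the node
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "ok"


if __name__ == "__main__":
    # Controller nodes, namespace block devices and generic char devices.
    for node in sorted(glob.glob("/dev/nvme*") + glob.glob("/dev/ng*")):
        print(node, probe_node(node))
```

If every node reports "ok" without privileged mode while nvme list still comes back empty, the restriction is more likely on the command path than on the device nodes themselves.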
Hi there, I'm encountering a bug with nvme-cli when using it inside a Docker container. It works fine on the host machine:
However, when running it in a Docker container I get no output:
The Docker container was created with the following options:
and the devices seem to be available:
I am grateful for any assistance with this issue :-)