linux-nvme / nvme-cli

NVMe management command line interface.
https://nvmexpress.org
GNU General Public License v2.0

NVME devices not detected inside docker-container #2533

Closed TurtleTony closed 4 weeks ago

TurtleTony commented 1 month ago

Hi there, I'm encountering a bug with nvme-cli when using it inside a Docker container. It works fine on the host machine:

root@CM3588:~# nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     23465W801930         WD Red SN700 1000GB                      1           1.00  TB /   1.00  TB    512   B +  0 B   111150WD
/dev/nvme1n1     23465W801929         WD Red SN700 1000GB                      1           1.00  TB /   1.00  TB    512   B +  0 B   111150WD
/dev/nvme2n1     23465W803996         WD Red SN700 1000GB                      1           1.00  TB /   1.00  TB    512   B +  0 B   111150WD
/dev/nvme3n1     23465W802861         WD Red SN700 1000GB                      1           1.00  TB /   1.00  TB    512   B +  0 B   111150WD

However, when running it in a Docker container, I get only the empty table header:

root@CM3588:~# docker exec netdata nvme list
Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
root@CM3588:~#

The docker container was created with the following options:

devices:
      - "/dev/nvme0n1:/dev/nvme0n1"
      - "/dev/nvme1n1:/dev/nvme1n1"
      - "/dev/nvme2n1:/dev/nvme2n1"
      - "/dev/nvme3n1:/dev/nvme3n1"

and the devices seem to be available:

root@CM3588:~# docker exec netdata ls -l /dev/nvme*
brw-rw---- 1 root disk 259, 1 Oct 10 16:41 /dev/nvme0n1
brw-rw---- 1 root disk 259, 2 Oct 10 16:41 /dev/nvme1n1
brw-rw---- 1 root disk 259, 0 Oct 10 16:41 /dev/nvme2n1
brw-rw---- 1 root disk 259, 3 Oct 10 16:41 /dev/nvme3n1

I am grateful for any assistance with this issue :-)

igaw commented 1 month ago

nvme-cli uses sysfs to discover NVMe devices. You should still be able to use nvme-cli to operate on the device handles, e.g. nvme id-ctrl /dev/nvme0n1. nvme list and friends will not work, though, unless you also map the corresponding sysfs subtrees into your container.

igaw commented 1 month ago

the corresponding sysfs subtrees are:

https://github.com/linux-nvme/libnvme/blob/8cdd746b324bd84a0666e7a265aa253dbda9d932/scripts/collect-sysfs.sh#L6-L11

TurtleTony commented 1 month ago

Hi @igaw thanks for the super fast reply! I've followed your advice (nvme-fabrics doesn't exist on my device):

volumes:
      ...
      - /sys/class/nvme:/sys/class/nvme
      - /sys/class/nvme-generic:/sys/class/nvme-generic
      - /sys/class/nvme-subsystem:/sys/class/nvme-subsystem
      - /sys/bus/pci/slots:/sys/bus/pci/slots
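
A quick way to confirm that these mounts are actually visible from inside the container is to test for each directory. This is a minimal sketch and not part of nvme-cli: the function name check_nvme_sysfs and its optional root argument are made up for illustration, and the directory list is simply the four volumes above.

```shell
#!/bin/sh
# Report which of the sysfs directories mapped above are visible.
# $1 is a hypothetical override for the sysfs root (defaults to /sys),
# so the check can also be pointed at a test tree.
check_nvme_sysfs() {
    root="${1:-/sys}"
    for d in class/nvme class/nvme-generic class/nvme-subsystem bus/pci/slots; do
        if [ -d "$root/$d" ]; then
            echo "present: $d"
        else
            echo "missing: $d"
        fi
    done
}
```

Calling check_nvme_sysfs with no argument in a shell inside the container (for example one started with docker exec -it netdata sh) prints one line per directory.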

The following command shows the subsys being detected:

root@CM3588:~# docker exec netdata nvme list-subsys
nvme-subsys3 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF0F45
\
 +- nvme3 pcie 0003:31:00.0 live
nvme-subsys2 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF39F8
\
 +- nvme2 pcie 0001:11:00.0 live
nvme-subsys1 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC267
\
 +- nvme1 pcie 0002:21:00.0 live
nvme-subsys0 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC262
\
 +- nvme0 pcie 0000:01:00.0 live

Alas, nvme list still doesn't work:

root@CM3588:~# docker exec netdata nvme list
Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------

igaw commented 4 weeks ago

nvme-fabrics only exists on your host if you have the nvme fabric modules loaded, e.g. nvme-tcp. You can safely ignore it if these sysfs dirs are missing.

nvme list-subsys is doing almost the same as nvme list. It first iterates over sysfs to gather all information and then prints it out.
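
To illustrate what iterating over sysfs looks like, here is a rough sketch that walks the controller class directory and prints each controller with the namespaces found beneath it. It is not nvme-cli's or libnvme's actual code. The function name list_nvme_ctrls and its optional directory argument are hypothetical, and it assumes the usual layout in which each controller directory exposes model and serial attributes and contains its namespaces as subdirectories.

```shell
#!/bin/sh
# Walk a sysfs-style NVMe class directory and print controllers and namespaces.
# $1 is a hypothetical override for the class directory
# (defaults to /sys/class/nvme).
list_nvme_ctrls() {
    root="${1:-/sys/class/nvme}"
    for c in "$root"/nvme*; do
        # Skip the unexpanded glob when no controllers are present.
        [ -d "$c" ] || continue
        model=$(cat "$c/model" 2>/dev/null)
        serial=$(cat "$c/serial" 2>/dev/null)
        echo "$(basename "$c") | $serial | $model"
        # Namespaces appear as subdirectories such as nvme0n1.
        for n in "$c"/nvme*n*; do
            if [ -d "$n" ]; then
                echo "  namespace: $(basename "$n")"
            fi
        done
    done
}
```

If a controller is printed without any namespace lines, the namespace directories are not visible in the mapped sysfs tree, which would have to be resolved by other means before nvme list can show them.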

Which version of nvme-cli are you using? If not the latest could you retry with the latest version?

TurtleTony commented 4 weeks ago

Ah I understand, thanks for the list-subsys explanation :-)

Currently running on:

root@netdata:/tmp# nvme --version
nvme version 2.3 (git 2.3)
libnvme version 1.3 (git 1.3)
root@netdata:/tmp# nvme list
Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------

I used meson to build the most recent version, same result:

root@netdata:/tmp# ./nvme-cli-2.10.2/.build/nvme --version
nvme version 2.10.2 (git 2.10.2)
libnvme version 1.10 (git 1.10)
root@netdata:/tmp# ./nvme-cli-2.10.2/.build/nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
root@netdata:/tmp#

benchuanggli commented 4 weeks ago

Hi @TurtleTony, does nvme list --verbose show any NVMe devices inside the Docker container?

igaw commented 4 weeks ago

Good idea. With the latest version you can even enable the debug output by adding another -v:

nvme list -vv
nvme list-subsys -vv

In theory, both should give the same debug output.

TurtleTony commented 4 weeks ago

That's interesting! Indeed when using verbose, the devices show up. What does that mean?

root@netdata:/# nvme --version
nvme version 2.3 (git 2.3)
libnvme version 1.3 (git 1.3)
root@netdata:/# nvme list
Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
root@netdata:/# nvme list -vv
Subsystem        Subsystem-NQN                                                                                    Controllers
---------------- ------------------------------------------------------------------------------------------------ ----------------
nvme-subsys3     nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF0F45                                     nvme3
nvme-subsys2     nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF39F8                                     nvme2
nvme-subsys1     nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC267                                     nvme1
nvme-subsys0     nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC262                                     nvme0

Device   SN                   MN                                       FR       TxPort Address        Subsystem    Namespaces
-------- -------------------- ---------------------------------------- -------- ------ -------------- ------------ ----------------
nvme3    23465W802861         WD Red SN700 1000GB                      111150WD pcie   0003:31:00.0   nvme-subsys3
nvme2    23465W803996         WD Red SN700 1000GB                      111150WD pcie   0001:11:00.0   nvme-subsys2
nvme1    23465W801929         WD Red SN700 1000GB                      111150WD pcie   0002:21:00.0   nvme-subsys1
nvme0    23465W801930         WD Red SN700 1000GB                      111150WD pcie   0000:01:00.0   nvme-subsys0

Device       Generic      NSID     Usage                      Format           Controllers
------------ ------------ -------- -------------------------- ---------------- ----------------
root@netdata:/# nvme list-subsys -vv
scan controller nvme0
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme0
scan controller nvme0 namespace nvme0n1
failed to scan namespace nvme0n1
scan controller nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
scan controller nvme1 namespace nvme1n1
failed to scan namespace nvme1n1
scan controller nvme2
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme2
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme2
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys2/nvme2
scan controller nvme2 namespace nvme2n1
failed to scan namespace nvme2n1
scan controller nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys2/nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys3/nvme3
scan controller nvme3 namespace nvme3n1
failed to scan namespace nvme3n1
scan subsystem nvme-subsys0
scan subsystem nvme-subsys1
scan subsystem nvme-subsys2
scan subsystem nvme-subsys3
nvme-subsys3 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF0F45
\
 +- nvme3 pcie 0003:31:00.0 live
nvme-subsys2 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF39F8
\
 +- nvme2 pcie 0001:11:00.0 live
nvme-subsys1 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC267
\
 +- nvme1 pcie 0002:21:00.0 live
nvme-subsys0 - NQN=nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC262
\
 +- nvme0 pcie 0000:01:00.0 live

With version 2.10.2 the output becomes even more verbose:

root@netdata:/tmp# ./nvme-cli-2.10.2/.build/nvme list -vv
scan controller nvme0
warning: using auto generated hostid and hostnqn
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme0
scan controller nvme0 namespace nvme0n1
opcode       : 06
flags        : 00
rsvd1        : 0000
nsid         : 00000001
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00001000
metadata_len : 00000000
addr         : 5560e3f000
metadata     : 0
cdw10        : 00000000
cdw11        : 00000000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000
result       : 00000000
err          : -1
latency      : 5 us
failed to scan namespace nvme0n1
scan controller nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
scan controller nvme1 namespace nvme1n1
opcode       : 06
flags        : 00
rsvd1        : 0000
nsid         : 00000001
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00001000
metadata_len : 00000000
addr         : 5560e3f000
metadata     : 0
cdw10        : 00000000
cdw11        : 00000000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000
result       : 00000000
err          : -1
latency      : 1 us
failed to scan namespace nvme1n1
scan controller nvme2
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme2
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme2
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys2/nvme2
scan controller nvme2 namespace nvme2n1
opcode       : 06
flags        : 00
rsvd1        : 0000
nsid         : 00000001
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00001000
metadata_len : 00000000
addr         : 5560e40000
metadata     : 0
cdw10        : 00000000
cdw11        : 00000000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000
result       : 00000000
err          : -1
latency      : 2 us
failed to scan namespace nvme2n1
scan controller nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys2/nvme3
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys3/nvme3
scan controller nvme3 namespace nvme3n1
opcode       : 06
flags        : 00
rsvd1        : 0000
nsid         : 00000001
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00001000
metadata_len : 00000000
addr         : 5560e40000
metadata     : 0
cdw10        : 00000000
cdw11        : 00000000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000
result       : 00000000
err          : -1
latency      : 1 us
failed to scan namespace nvme3n1
scan subsystem nvme-subsys0
scan subsystem nvme-subsys1
scan subsystem nvme-subsys2
scan subsystem nvme-subsys3
Subsystem        Subsystem-NQN                                                                                    Controllers
---------------- ------------------------------------------------------------------------------------------------ ----------------
nvme-subsys0     nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC262                                     nvme0
nvme-subsys1     nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCFC267                                     nvme1
nvme-subsys2     nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF39F8                                     nvme2
nvme-subsys3     nqn.2018-01.com.wdc:nguid:E8238FA6BF53-0001-001B448B4CCF0F45                                     nvme3

Device           Cntlid SN                   MN                                       FR       TxPort Address        Slot   Subsystem    Namespaces
---------------- ------ -------------------- ---------------------------------------- -------- ------ -------------- ------ ------------ ----------------
nvme0    8215   23465W801930         WD Red SN700 1000GB                      111150WD pcie   0000:01:00.0          nvme-subsys0
nvme1    8215   23465W801929         WD Red SN700 1000GB                      111150WD pcie   0002:21:00.0          nvme-subsys1
nvme2    8215   23465W803996         WD Red SN700 1000GB                      111150WD pcie   0001:11:00.0          nvme-subsys2
nvme3    8215   23465W802861         WD Red SN700 1000GB                      111150WD pcie   0003:31:00.0          nvme-subsys3

Device            Generic           NSID       Usage                      Format           Controllers
----------------- ----------------- ---------- -------------------------- ---------------- ----------------

igaw commented 4 weeks ago

The namespace scan fails (failed to scan namespace nvme0n1), which is why nvme list doesn't show anything. That command lists only the namespaces, whereas the other commands list subsystems, controllers, etc.

I also see that the newest version is trying to issue commands, and they fail. This indicates you have an older kernel which doesn't expose all the sysfs entries that libnvme needs to operate without issuing any commands. It also explains why you don't see any namespaces: the commands do not work. This is likely a permission problem, with nvme-cli/libnvme unable to talk to the real hardware via the /dev/nvme device node.
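
One way to narrow down a permission problem like this from inside the container is to check whether the process holds CAP_SYS_ADMIN, which Docker's default capability set does not include and which privileged mode grants. Whether the failing Identify command (opcode 06 in the dump above) is actually gated by that capability on this kernel is an assumption that the thread does not confirm. The function name has_cap_sys_admin and its optional argument are hypothetical.

```shell
#!/bin/sh
# Succeed if the effective capability mask has CAP_SYS_ADMIN (bit 21) set.
# $1 is a hypothetical override: a hex mask as shown in the CapEff line of
# /proc/self/status. Without it, the mask of the current process is read.
has_cap_sys_admin() {
    capeff="${1:-$(awk '/^CapEff:/ {print $2}' /proc/self/status)}"
    [ $(( (0x$capeff >> 21) & 1 )) -eq 1 ]
}

# Example use: report the result for the current shell.
if has_cap_sys_admin; then
    echo "CAP_SYS_ADMIN: yes"
else
    echo "CAP_SYS_ADMIN: no"
fi
```

If this reports no in the container while nvme list only works in privileged mode, a missing capability is a plausible cause, although device cgroup rules or missing device nodes could equally be responsible.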

TurtleTony commented 4 weeks ago

Your explanation makes sense to me; it does indeed seem to be a permission error. I don't think it's a kernel issue, because I'm running an up-to-date Debian and a recent Docker version, and the image used for the container is also quite recent. I found that when running the container in Docker's privileged mode, the nvme list command finally works!

root@netdata:/# nvme list
Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme3n1          /dev/ng3n1            23465W802861         WD Red SN700 1000GB                      1           1.00  TB /   1.00  TB    512   B +  0 B   111150WD
/dev/nvme2n1          /dev/ng2n1            23465W803996         WD Red SN700 1000GB                      1           1.00  TB /   1.00  TB    512   B +  0 B   111150WD
/dev/nvme1n1          /dev/ng1n1            23465W801929         WD Red SN700 1000GB                      1           1.00  TB /   1.00  TB    512   B +  0 B   111150WD
/dev/nvme0n1          /dev/ng0n1            23465W801930         WD Red SN700 1000GB                      1           1.00  TB /   1.00  TB    512   B +  0 B   111150WD
root@netdata:/#

So it appears that passing the devices through with the devices option is not sufficient and privileged mode has to be enabled. Thank you for your thorough assistance! I will have to think about whether I want to run this container in this mode, but either way this issue is solved. Have a nice day!