linux-nvme / nvme-cli

NVMe management command line interface.
https://nvmexpress.org
GNU General Public License v2.0

nvme fabrics discovery and connect failing with nvme-cli 2.11 and libnvme 1.11 #2555

Closed sukhi61166 closed 1 week ago

sukhi61166 commented 2 weeks ago

We are not able to connect or discover over NVMe fabrics using the latest nvme-cli version. I tried a few combinations, and it seems the issue might be libnvme 1.11. Both connect and connect-all fail. I don't see any errors in dmesg.

  1. nvme version 2.10.2 (git 2.10.2) & libnvme version 1.10 (git 1.10): Connect works
  2. nvme version 2.10.2 (git 2.10.2) & libnvme version 1.11 (git 1.11): Fails to connect
  3. nvme version 2.11 (git 2.11-1-g64b2a25) & libnvme version 1.10 (git 1.10): Installation fails because nvme-cli 2.11 requires libnvme 1.11.

Error seen

$ .build/nvme --version
nvme version 2.11 (git 2.11-1-g64b2a25)
libnvme version 1.11 (git 1.11)
$ sudo ./nvme connect -vvv -t rdma -s 4420 -a 10.10.10.20 -i 1 -n nqn.2015-09.com:nvme.1
warning: using auto generated hostid and hostnqn
could not add new controller: Operation not supported
$ .build/nvme connect-all -t rdma -s 4420 -i 1 -a 10.10.10.7 -vvv
warning: using auto generated hostid and hostnqn
failed to add controller, error Operation not supported
$ .build/nvme discover -t rdma -s 4420 -a 10.10.9.72 -vvv
warning: using auto generated hostid and hostnqn
failed to add controller, error Operation not supported

Drivers loaded

$ lsmod | grep nvme
nvme_rdma              45056  0
nvme_fabrics           32768  1 nvme_rdma
nvme                   53248  0
nvme_core             196608  3 nvme,nvme_rdma,nvme_fabrics
nvme_auth              24576  1 nvme_core
rdma_cm               143360  2 nvme_rdma,rdma_ucm
ib_core               466944  8 rdma_cm,nvme_rdma,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm

ikegami-t commented 2 weeks ago

This seems related to the fix in https://github.com/linux-nvme/libnvme/commit/9967817; that change appears to have caused this issue.

igaw commented 2 weeks ago

Do you compile your library with OpenSSL support enabled?

If we do not build with OpenSSL, this is obviously wrong:

https://github.com/linux-nvme/libnvme/blob/89ea2b72ce23d376f10c0f79e6f4333ef2fdfe06/src/nvme/linux.c#L1680C1-L1685C2

It should return 0. Can you either build with OpenSSL enabled or change the return value to 0 there?

ksingh-ospo commented 2 weeks ago

@igaw,

./subprojects/libnvme/internal/config.h:#define CONFIG_OPENSSL
./subprojects/libnvme/internal/config.h:#define CONFIG_OPENSSL_3

It seems that both of the above config options were enabled by default; we did not enable them explicitly. We prefer to build nvme-cli with the default options, including OpenSSL, as it may be needed for TLS.

./meson-logs/meson-log.txt:292:Run-time dependency openssl found: YES 3.0.13
./meson-logs/meson-log.txt:315:Header "openssl/opensslv.h" has symbol "LIBRESSL_VERSION_NUMBER" with dependency openssl: NO
./meson-logs/meson-log.txt:329:Has header "openssl/core_names.h" with dependency openssl: YES

igaw commented 2 weeks ago

CONFIG_OPENSSL just says that OpenSSL is available, and CONFIG_OPENSSL_3 says it is version 3. The meson logs look okay. The OpenSSL version detection is not straightforward, because the OpenSSL folks thought they knew better.

Anyway, I've mixed up OpenSSL with libkeyutils. Can you check whether you have this library installed, or attach the complete output of the meson configure step? Also, did you try changing the return value in __nvme_import_keys_from_config?
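
One way to check this is to look for a keyutils define in the generated config header, the same way CONFIG_OPENSSL was checked earlier in the thread. This is only a minimal sketch: the symbol name CONFIG_KEYUTILS is an assumption (it is not confirmed anywhere in this thread), `has_define` is a hypothetical helper, and the header path is the one from the grep output above, relative to the build directory.

```shell
#!/bin/sh
# has_define FILE SYMBOL: succeed if FILE contains a line "#define SYMBOL".
# Hypothetical helper for illustration; not part of nvme-cli or libnvme.
has_define() {
    grep -Eq "^#define[[:space:]]+$2([[:space:]]|\$)" "$1"
}

# Path taken from the grep output earlier in this thread.
cfg=./subprojects/libnvme/internal/config.h

# CONFIG_KEYUTILS is an assumed symbol name; adjust it to whatever the build uses.
if [ -f "$cfg" ]; then
    for sym in CONFIG_OPENSSL CONFIG_KEYUTILS; do
        if has_define "$cfg" "$sym"; then
            echo "$sym: defined"
        else
            echo "$sym: not defined"
        fi
    done
else
    echo "$cfg not found; run this from the nvme-cli build directory"
fi
```

If the keyutils symbol turns out not to be defined while the OpenSSL one is, that would be consistent with the mix-up described above, but the complete meson configure output remains the authoritative source.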

igaw commented 1 week ago

This should be addressed by #908. If not, please reopen the bug report and attach additional information.