Closed johnmeneghini closed 2 years ago
thanks for the report. i'm taking a look now.
the two versions' implementations are quite different. it might take a moment for me to untangle them (i don't do fabrics too often).
I ran into the same problem. As far as I understand, the problem is in libnvme's nvmf_connect_disc_entry().
c = nvme_create_ctrl(e->subnqn, transport, traddr, NULL, NULL, trsvcid);
https://github.com/linux-nvme/libnvme/blob/6b951c53cc4c978b0617e392c776aa16400f7d63/src/nvme/fabrics.c#L651
struct nvme_ctrl *nvme_create_ctrl(const char *subsysnqn, const char *transport,
const char *traddr, const char *host_traddr,
const char *host_iface, const char *trsvcid)
I'm testing 'connect-all' but it still fails. cfg->host_traddr and cfg->host_iface were still NULL. My workaround is:
--- a/fabrics.c
+++ b/fabrics.c
@@ -396,6 +396,8 @@ static int discover_from_conf_file(nvme_host_t h, const char *desc,
errno = 0;
ret = nvmf_add_ctrl(h, c, &cfg, false);
if (!ret) {
+ cfg.host_traddr = host_traddr;
+ cfg.host_iface = host_iface;
__discover(c, &cfg, raw, connect,
persistent, flags);
if (!persistent)
But that still fails:
dolin:~/nvme-cli/.build/:[255]# ./nvme connect-all
Failed to read /etc/nvme/config.json, json_object_from_file: error opening file /etc/nvme/config.json: No such file or directory
connect ctrl, 'nqn=nqn.2014-08.org.nvmexpress.discovery,transport=fc,traddr=nn-0x201700a09890f5bf:pn-0x201900a09890f5bf,host_traddr=nn-0x200000109b579ef3:pn-0x100000109b579ef3,hostnqn=nqn.2014-08.org.nvmexpress:uuid:1a9e23dd-466e-45ca-9f43-a29aaf47cb21,hostid=1a9e23dd-466e-45ca-9f43-a29aaf47cb21,ctrl_loss_tmo=600'
connect ctrl, response 'instance=0,cntlid=17088'
nvme0: ctrl connected
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme0
nvme0: discover length 256
nvme0: discover length 5120
nvme0: discover genctr 6573, retry
lookup ctrl (transport: fc, traddr: nn-0x201700a09890f5bf:pn-0x201800a09890f5bf, trsvcid (null))
connect ctrl, 'nqn=nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem,transport=fc,traddr=nn-0x201700a09890f5bf:pn-0x201800a09890f5bf,host_traddr=nn-0x200000109b579ef3:pn-0x100000109b579ef3,hostnqn=nqn.2014-08.org.nvmexpress:uuid:1a9e23dd-466e-45ca-9f43-a29aaf47cb21,hostid=1a9e23dd-466e-45ca-9f43-a29aaf47cb21,ctrl_loss_tmo=600'
connect ctrl, response 'instance=1,cntlid=31424'
nvme1: ctrl connected
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
nvme1: disconnected
lookup ctrl (transport: fc, traddr: nn-0x201700a09890f5bf:pn-0x201900a09890f5bf, trsvcid (null))
connect ctrl, 'nqn=nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem,transport=fc,traddr=nn-0x201700a09890f5bf:pn-0x201900a09890f5bf,host_traddr=nn-0x200000109b579ef3:pn-0x100000109b579ef3,hostnqn=nqn.2014-08.org.nvmexpress:uuid:1a9e23dd-466e-45ca-9f43-a29aaf47cb21,hostid=1a9e23dd-466e-45ca-9f43-a29aaf47cb21,ctrl_loss_tmo=600'
connect ctrl, response 'instance=1,cntlid=31488'
nvme1: ctrl connected
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
nvme1: disconnected
lookup ctrl (transport: fc, traddr: nn-0x201700a09890f5bf:pn-0x201800a09890f5bf, trsvcid (null))
connect ctrl, 'nqn=nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem_mwilck,transport=fc,traddr=nn-0x201700a09890f5bf:pn-0x201800a09890f5bf,host_traddr=nn-0x200000109b579ef3:pn-0x100000109b579ef3,hostnqn=nqn.2014-08.org.nvmexpress:uuid:1a9e23dd-466e-45ca-9f43-a29aaf47cb21,hostid=1a9e23dd-466e-45ca-9f43-a29aaf47cb21,ctrl_loss_tmo=600'
connect ctrl, response 'instance=1,cntlid=24000'
nvme1: ctrl connected
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
nvme1: disconnected
lookup ctrl (transport: fc, traddr: nn-0x201700a09890f5bf:pn-0x201900a09890f5bf, trsvcid (null))
connect ctrl, 'nqn=nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem_mwilck,transport=fc,traddr=nn-0x201700a09890f5bf:pn-0x201900a09890f5bf,host_traddr=nn-0x200000109b579ef3:pn-0x100000109b579ef3,hostnqn=nqn.2014-08.org.nvmexpress:uuid:1a9e23dd-466e-45ca-9f43-a29aaf47cb21,hostid=1a9e23dd-466e-45ca-9f43-a29aaf47cb21,ctrl_loss_tmo=600'
connect ctrl, response 'instance=1,cntlid=24064'
nvme1: ctrl connected
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
nvme1: disconnected
nvme0: disconnected
connect ctrl, 'nqn=nqn.2014-08.org.nvmexpress.discovery,transport=fc,traddr=nn-0x201700a09890f5bf:pn-0x201900a09890f5bf,host_traddr=nn-0x200000109b579ef6:pn-0x100000109b579ef6,hostnqn=nqn.2014-08.org.nvmexpress:uuid:1a9e23dd-466e-45ca-9f43-a29aaf47cb21,hostid=1a9e23dd-466e-45ca-9f43-a29aaf47cb21,ctrl_loss_tmo=600'
connect ctrl, response 'instance=0,cntlid=17152'
nvme0: ctrl connected
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme0
nvme0: discover length 256
nvme0: discover length 5120
nvme0: discover genctr 6573, retry
lookup ctrl (transport: fc, traddr: nn-0x201700a09890f5bf:pn-0x201800a09890f5bf, trsvcid (null))
connect ctrl, 'nqn=nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem,transport=fc,traddr=nn-0x201700a09890f5bf:pn-0x201800a09890f5bf,host_traddr=nn-0x200000109b579ef6:pn-0x100000109b579ef6,hostnqn=nqn.2014-08.org.nvmexpress:uuid:1a9e23dd-466e-45ca-9f43-a29aaf47cb21,hostid=1a9e23dd-466e-45ca-9f43-a29aaf47cb21,ctrl_loss_tmo=600'
connect ctrl, response 'instance=1,cntlid=31552'
nvme1: ctrl connected
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
nvme1: disconnected
lookup ctrl (transport: fc, traddr: nn-0x201700a09890f5bf:pn-0x201900a09890f5bf, trsvcid (null))
connect ctrl, 'nqn=nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem,transport=fc,traddr=nn-0x201700a09890f5bf:pn-0x201900a09890f5bf,host_traddr=nn-0x200000109b579ef6:pn-0x100000109b579ef6,hostnqn=nqn.2014-08.org.nvmexpress:uuid:1a9e23dd-466e-45ca-9f43-a29aaf47cb21,hostid=1a9e23dd-466e-45ca-9f43-a29aaf47cb21,ctrl_loss_tmo=600'
connect ctrl, response 'instance=1,cntlid=31616'
nvme1: ctrl connected
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
nvme1: disconnected
lookup ctrl (transport: fc, traddr: nn-0x201700a09890f5bf:pn-0x201800a09890f5bf, trsvcid (null))
connect ctrl, 'nqn=nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem_mwilck,transport=fc,traddr=nn-0x201700a09890f5bf:pn-0x201800a09890f5bf,host_traddr=nn-0x200000109b579ef6:pn-0x100000109b579ef6,hostnqn=nqn.2014-08.org.nvmexpress:uuid:1a9e23dd-466e-45ca-9f43-a29aaf47cb21,hostid=1a9e23dd-466e-45ca-9f43-a29aaf47cb21,ctrl_loss_tmo=600'
connect ctrl, response 'instance=1,cntlid=24128'
nvme1: ctrl connected
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
nvme1: disconnected
lookup ctrl (transport: fc, traddr: nn-0x201700a09890f5bf:pn-0x201900a09890f5bf, trsvcid (null))
connect ctrl, 'nqn=nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem_mwilck,transport=fc,traddr=nn-0x201700a09890f5bf:pn-0x201900a09890f5bf,host_traddr=nn-0x200000109b579ef6:pn-0x100000109b579ef6,hostnqn=nqn.2014-08.org.nvmexpress:uuid:1a9e23dd-466e-45ca-9f43-a29aaf47cb21,hostid=1a9e23dd-466e-45ca-9f43-a29aaf47cb21,ctrl_loss_tmo=600'
connect ctrl, response 'instance=1,cntlid=24192'
nvme1: ctrl connected
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
nvme1: disconnected
nvme0: disconnected
connect ctrl, 'nqn=nqn.2014-08.org.nvmexpress.discovery,transport==fc --traddr=nn-0x201700a09890f5bf:pn-0x201b00a09890f5bf --host-traddr=nn-0x200000109b579ef6:pn-0x100000109b579ef6 '
Failed to write to /dev/nvme-fabrics: Invalid argument
The kernel is complaining with
[765139.069194] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[765139.091684] nvme_fabrics: no handler found for transport =fc --traddr=nn-0x201700a09890f5bf:pn-0x201b00a09890f5bf --host-traddr=nn-0x200000109b579ef6:pn-0x100000109b579ef6
Ah. The 'connect' string isn't parsed correctly.
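For comparison, the successful 'connect ctrl' lines in the log pass a single comma-separated list of key=value pairs, whereas in the failing line command-line style arguments ended up inside the transport value. A minimal sketch of assembling a well-formed string follows. The names conn_opts, append_opt and build_connect_string are made up for illustration and are not libnvme's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option set; the keys mirror those seen in the log output. */
struct conn_opts {
	const char *nqn;
	const char *transport;
	const char *traddr;
	const char *host_traddr;	/* may be NULL */
	const char *host_iface;		/* may be NULL */
};

/*
 * Append "key=value" to buf, preceded by a comma if buf is not empty.
 * NULL values are skipped. Returns 0 on success, -1 if buf is too small.
 */
static int append_opt(char *buf, size_t len, const char *key, const char *val)
{
	size_t used;
	int n;

	assert(buf && key);
	if (!val)
		return 0;
	used = strlen(buf);
	n = snprintf(buf + used, len - used, "%s%s=%s",
		     used ? "," : "", key, val);
	if (n < 0 || (size_t)n >= len - used)
		return -1;
	return 0;
}

/* Build the full option string; returns 0 on success, -1 on overflow. */
static int build_connect_string(const struct conn_opts *o, char *buf, size_t len)
{
	assert(o && buf && len > 0);
	buf[0] = '\0';
	if (append_opt(buf, len, "nqn", o->nqn) ||
	    append_opt(buf, len, "transport", o->transport) ||
	    append_opt(buf, len, "traddr", o->traddr) ||
	    append_opt(buf, len, "host_traddr", o->host_traddr) ||
	    append_opt(buf, len, "host_iface", o->host_iface))
		return -1;
	return 0;
}
```

Skipping NULL fields keeps optional keys such as host_iface out of the string entirely rather than emitting an empty value.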
Correct fix should be
diff --git a/fabrics.c b/fabrics.c
index 4ad5291..af957ba 100644
--- a/fabrics.c
+++ b/fabrics.c
@@ -81,8 +81,8 @@ static const char *nvmf_config_file = "Use specified JSON configuration file or
OPT_STRING("transport", 't', "STR", &transport, nvmf_tport), \
OPT_STRING("traddr", 'a', "STR", &traddr, nvmf_traddr), \
OPT_STRING("trsvcid", 's', "STR", &trsvcid, nvmf_trsvcid), \
- OPT_STRING("host-traddr", 'w', "STR", &host_traddr, nvmf_htraddr), \
- OPT_STRING("host-iface", 'f', "STR", &host_iface, nvmf_hiface), \
+ OPT_STRING("host-traddr", 'w', "STR", &c.host_traddr, nvmf_htraddr), \
+ OPT_STRING("host-iface", 'f', "STR", &c.host_iface, nvmf_hiface), \
OPT_STRING("hostnqn", 'q', "STR", &hostnqn, nvmf_hostnqn), \
OPT_STRING("hostid", 'I', "STR", &hostid, nvmf_hostid), \
OPT_STRING("nqn", 'n', "STR", &subsysnqn, nvmf_nqn), \
Would've done it myself if I knew how I can teach meson to update libnvme.
Please check PR #1318 .
Would've done it myself if I knew how I can teach meson to update libnvme.
When you do the first meson .build (or make), libnvme will be checked out as a normal git tree under subprojects/libnvme. After the initial checkout, meson doesn't touch this git tree unless you do something like meson subprojects update. There is nothing magical going on :)
That means you can do any git operation after the initial checkout as you like.
Figured it out meanwhile. Please check the above PR.
The parsing error is gone. Still no connection after 'connect-all'.
The kernel says:
[769021.443721] nvme nvme1: NVME-FC{1}: create association : host wwpn 0x100000109b579ef6 rport wwpn 0x201900a09890f5bf: NQN "nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem_mwilck"
[769022.397219] nvme nvme1: queue_size 128 > ctrl maxcmd 32, reducing to maxcmd
[769022.897298] nvme nvme1: NVME-FC{1}: controller connect complete
[769022.897339] nvme nvme1: NVME-FC{1}: new ctrl: NQN "nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem_mwilck"
[769022.897615] nvme nvme1: Removing ctrl: NQN "nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem_mwilck"
[769023.206954] block nvme1n1: no available path - failing I/O
[769023.206960] block nvme1n1: no available path - failing I/O
[769023.206963] Buffer I/O error on dev nvme1n1, logical block 8388592, async page read
[769023.277244] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
and nvme-cli:
lookup ctrl (transport: fc, traddr: nn-0x201700a09890f5bf:pn-0x201900a09890f5bf, trsvcid (null))
connect ctrl, 'nqn=nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem_mwilck,transport=fc,traddr=nn-0x201700a09890f5bf:pn-0x201900a09890f5bf,host_traddr=nn-0x200000109b579ef6:pn-0x100000109b579ef6,hostnqn=nqn.2014-08.org.nvmexpress:uuid:1a9e23dd-466e-45ca-9f43-a29aaf47cb21,hostid=1a9e23dd-466e-45ca-9f43-a29aaf47cb21,ctrl_loss_tmo=600'
connect ctrl, response 'instance=1,cntlid=25984'
nvme1: ctrl connected
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
nvme1: disconnected
nvme0: disconnected
adding some more tracing...
Looks as if nvme-cli disconnects the controller immediately after connecting ...
This is the output from 'connect':
[769451.837742] nvme nvme0: NVME-FC{0}: create association : host wwpn 0x100000109b579ef3 rport wwpn 0x201900a09890f5bf: NQN "nqn.2014-08.org.nvmexpress.discovery"
[769452.532383] nvme nvme0: queue_size 128 > ctrl maxcmd 32, reducing to maxcmd
[769452.532389] nvme nvme0: NVME-FC{0}: controller connect complete
[769452.532428] nvme nvme0: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[769454.033336] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[769460.316838] nvme nvme0: NVME-FC{0}: create association : host wwpn 0x100000109b579ef3 rport wwpn 0x201900a09890f5bf: NQN "nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem_mwilck"
[769461.237133] nvme nvme0: queue_size 128 > ctrl maxcmd 32, reducing to maxcmd
[769461.737435] nvme nvme0: NVME-FC{0}: controller connect complete
[769461.737483] nvme nvme0: NVME-FC{0}: new ctrl: NQN "nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem_mwilck"
It looks like the disconnect should be from the discovery controller not the new controller.
if (child) {
	if (discover)
		__discover(child, defcfg, raw, persistent, true, flags);
	if (!persistent) {
		nvme_disconnect_ctrl(child);
		nvme_free_ctrl(child);
	}
The 'if (discover)' ... 'if (!persistent)' conditions look dodgy; seems like we would disconnect non-discovery controllers here...
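The actual change in PR #1318 is not shown in this thread. As a rough sketch of the intent only, assuming the teardown should apply solely to a nested discovery controller that is not meant to persist, the decision could be separated out like this. struct mock_ctrl and both function names are hypothetical stand-ins, not libnvme types or calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a controller handle; not libnvme's nvme_ctrl_t. */
struct mock_ctrl {
	bool connected;
	bool is_discovery;
};

/*
 * Decide whether a freshly connected child controller should be torn down.
 * Assumption: only a non-persistent discovery controller is temporary;
 * an I/O controller is the connection the user asked for and must stay.
 */
static bool should_disconnect_child(const struct mock_ctrl *child,
				    bool persistent)
{
	assert(child != NULL);
	return child->is_discovery && !persistent;
}

/* Apply the decision; the assignment stands in for disconnect + free. */
static void finish_child(struct mock_ctrl *child, bool persistent)
{
	if (should_disconnect_child(child, persistent))
		child->connected = false;
}
```

With this split, an I/O controller is never disconnected regardless of the persistent flag, which is the behaviour the kernel log above suggests was missing.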
yep, that's where we disconnect.
Fix pushed to PR #1318 . Please test.
Works with the latest version from #1318.
Fixes merged. Closing bug. If it is still failing, please reopen.
I just want to report that I've tested out these changes and they work great! Thanks for fixing this bug.
[root@rhel-storage-08 nvme-cli]# .build/nvme connect-all --transport=tcp --trsvcid=4420 --traddr=172.16.21.241 --host-traddr=172.16.21.108
[609852.098034] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.16.21.241:4420
failed to connect controller, error 22
[609852.182222] nvme_fc: nvme_fc_parse_traddr: bad traddr string
[609852.226806] nvme_fc: nvme_fc_parse_traddr: bad traddr string
failed to connect controller, error 22
failed to connect controller, error 22
[609852.272140] nvme_fc: nvme_fc_parse_traddr: bad traddr string
failed to connect controller, error 22
[609852.317337] nvme_fc: nvme_fc_parse_traddr: bad traddr string
[609852.391945] nvme nvme3: creating 12 I/O queues.
[609852.425913] nvme nvme3: mapped 12/0/0 default/read/poll queues.
[609852.459345] nvme nvme3: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:88b402df2d762AA7AF94", addr 172.16.21.241:4420
failed to connect controller, error 22
[609852.518580] nvme_fc: nvme_fc_parse_traddr: bad traddr string
[609852.562760] nvme_fc: nvme_fc_parse_traddr: bad traddr string
failed to connect controller, error 22
failed to connect controller, error 22
[609852.606262] nvme_fc: nvme_fc_parse_traddr: bad traddr string
failed to connect controller, error 22
[609852.651592] nvme_fc: nvme_fc_parse_traddr: bad traddr string
[609852.727124] nvme nvme4: creating 12 I/O queues.
[609852.758512] nvme nvme4: mapped 12/0/0 default/read/poll queues.
[609852.791584] nvme nvme4: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:88b402df2d762AA7AF94", addr 172.16.21.240:4420
[609852.816619] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Thanks for reporting and testing!
I'm unable to get nvme connect-all --transport=tcp to work with a multi-path tcp array.
This is a RHEL 9 beta release that has been patched up to v5.16-r8.
Note that the array discovery service is returning multiple discovery log page entries for both fc and tcp. We only care about the tcp entries and we might want to teach nvme connect-all how to ignore or filter different trtypes. There's no sense in trying to connect over different transports at the same time.
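The trtype filtering suggested here could look roughly like the sketch below. It is only an illustration: struct disc_entry is a simplified stand-in that stores the transport type as a string, and filter_by_trtype is a hypothetical helper, not an existing nvme-cli or libnvme function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a discovery log page entry. */
struct disc_entry {
	const char *trtype;	/* e.g. "tcp" or "fc" */
	const char *traddr;
};

/*
 * Store pointers to the entries whose trtype equals `want` in out[].
 * Returns the number of matches stored (at most out_max).
 */
static size_t filter_by_trtype(const struct disc_entry *entries, size_t n,
			       const char *want,
			       const struct disc_entry **out, size_t out_max)
{
	size_t i, found = 0;

	assert(want != NULL);
	for (i = 0; i < n && found < out_max; i++) {
		if (entries[i].trtype && strcmp(entries[i].trtype, want) == 0)
			out[found++] = &entries[i];
	}
	return found;
}
```

Connecting only to the entries returned for the transport given on the command line would avoid the fc connect attempts when --transport=tcp is requested.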
There are two subsystem ports accessible to the host on the network at --host-traddr=172.16.21.8
nvme connect works fine.
nvme disconnect works fine.
nvme connect-all gets totally confused.
Note that the legacy nvme connect-all command works fine.