open-iscsi / targetcli-fb

Command shell for managing Linux LIO kernel target
Apache License 2.0

Could not create Target in configFS #159

Open SteffanCline opened 4 years ago

SteffanCline commented 4 years ago

I've googled this particular issue and am not finding any good results on how to resolve it.

I installed targetcli via yum on CentOS 7 and this is the version in the repo:

    # targetcli
    targetcli shell version 2.1.fb49
    Copyright 2011-2013 by Datera, Inc and others.
    For help on commands, type 'help'.

I also downloaded the .zip of the targetcli-fb version and ran it from the scripts folder, with the same results.

Python:

    # python --version
    Python 2.7.5

Libs:

    Installed Packages
    python-configshell.noarch    1:1.1.fb25-1.el7    @base
    python-rtslib.noarch         2.1.fb69-3.el7      @base
    python-rtslib-doc.noarch     2.1.fb69-3.el7      @base

The error comes up when using create.

    /> /qla2xxx create naa.xxxxxxxxxxxxxxx
    Could not create Target in configFS

Below is the info on my FC

    /> qla2xxx/ info
    Fabric module name: qla2xxx
    ConfigFS path: /sys/kernel/config/target/qla2xxx
    Allowed WWN types: naa
    Allowed WWNs list: naa.xxxxxxxxxxxxxxx, naa.xxxxxxxxxxxxxxx, naa.xxxxxxxxxxxxxxx, naa.xxxxxxxxxxxxxxx, naa.xxxxxxxxxxxxxxx, naa.xxxxxxxxxxxxxxx, naa.xxxxxxxxxxxxxxx, naa.xxxxxxxxxxxxxxx
    Fabric module features: acls
    Corresponding kernel module: tcm_qla2xxx

I have attached the log but don't see anything that really stands out. log.txt

I checked per the link http://www.linux-iscsi.org/wiki/Fibre_Channel#Enable_target_mode to ensure that I have initiator mode disabled

    # cat /sys/module/qla2xxx/parameters/qlini_mode
    disabled

The kernel modules are loaded.

    # lsmod | grep qla2xxx
    tcm_qla2xxx            32768  1
    target_core_mod       335872  4 tcm_qla2xxx,iscsi_target_mod
    qla2xxx               634880  1 tcm_qla2xxx
    scsi_transport_fc      65536  3 bfa,qla2xxx,tcm_qla2xxx

configfs is mounted:

    # cat /proc/mounts | grep configfs
    configfs /sys/kernel/config configfs rw,relatime 0 0

In the kernel config: CONFIG_CONFIGFS_FS=y

Kernel: 4.4.207-1.el7.elrepo.x86_64

Should I be able to create a directory in /sys/kernel/config/target/qla2xxx?

    # mkdir naa.10008c7cffc7ef01
    mkdir: cannot create directory ‘naa.xxxxxxxxxxxxxxx’: Invalid argument

I did notice that when I rebooted, there was no qla2xxx folder in /sys/kernel/config/target/ but after trying to use targetcli, it did make one so I'm not sure what else would cause it to fail.

Suggestions?
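The checks described in this comment (configfs mounted, kernel modules loaded, qlini_mode set to disabled) can be gathered into one small script. The following is only a sketch: the function names are hypothetical, the list of required modules is taken from the lsmod output above, and passing all checks is a baseline rather than a guarantee, since the reporter's system passed them and target creation still failed.

```python
# Sketch only: function names are illustrative, not part of targetcli or rtslib.
REQUIRED_MODULES = ("qla2xxx", "tcm_qla2xxx", "target_core_mod")


def configfs_mounted(mounts_text):
    """Return True if any /proc/mounts-style line has filesystem type 'configfs'."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "configfs":
            return True
    return False


def loaded_modules(modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {line.split()[0] for line in modules_text.splitlines() if line.strip()}


def preflight(mounts_text, modules_text, qlini_mode):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not configfs_mounted(mounts_text):
        problems.append("configfs is not mounted")
    present = loaded_modules(modules_text)
    for name in REQUIRED_MODULES:
        if name not in present:
            problems.append("kernel module %s is not loaded" % name)
    mode = qlini_mode.strip()
    if mode != "disabled":
        problems.append("qlini_mode is %r, expected 'disabled'" % mode)
    return problems


def read_text(path):
    """Read a file, returning an empty string if it is missing or unreadable."""
    try:
        with open(path) as handle:
            return handle.read()
    except (IOError, OSError):
        return ""


if __name__ == "__main__":
    found = preflight(
        read_text("/proc/mounts"),
        read_text("/proc/modules"),
        read_text("/sys/module/qla2xxx/parameters/qlini_mode"),
    )
    for problem in found or ["all checks passed"]:
        print(problem)
```

Keeping the checks as pure functions over text makes them easy to run against saved output from another machine as well as against the live /proc and /sys files.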

SteffanCline commented 4 years ago

In waiting, I tried CentOS 8 which was an even bigger bust. I wiped that clean and tried again with Fedora 31. Same darn error "Could not create Target in configFS". Anyone??

SteffanCline commented 4 years ago

I figure the type of HBA would be good to know. They're a Cavium QLogic BR-1860 QLE2662 which is supported from all I've read. Here are some details.

    # systool -c fc_host -v
    Class = "fc_host"

      Class Device = "host2"
      Class Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/host2/fc_host/host2"
        active_fc4s = "0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 "
        dev_loss_tmo = "60"
        fabric_name = "0x0"
        issue_lip =
        max_npiv_vports = "255"
        maxframe_size = "0 bytes"
        node_name = "0x20008c7cffc7ef00"
        npiv_vports_inuse = "0"
        port_id = "0x000000"
        port_name = "0x10008c7cffc7ef00"
        port_state = "Linkdown"
        port_type = "Unknown"
        speed = "unknown"
        supported_classes = "Class 3"
        supported_fc4s = "0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 "
        supported_speeds = "2 Gbit, 4 Gbit, 8 Gbit, 16 Gbit"
        symbolic_name = "QLogic-1860-2p | 3.2.25.1 | | | "
        tgtid_bind_type = "wwpn (World Wide Port Name)"
        uevent =
        vport_create =
        vport_delete =

        Device = "host2"
        Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/host2"
          uevent = "DEVTYPE=scsi_host"
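The same attributes that systool prints can be read directly from sysfs, which makes it easier to compare every FC host at once and to see which ports actually have link. This is a minimal sketch assuming the usual /sys/class/fc_host layout; the helper names are hypothetical, and the "naa." form is simply the port_name with its 0x prefix removed, matching the WWNs that targetcli lists in this thread.

```python
import glob
import os


def port_name_to_naa(port_name):
    """Convert a sysfs port_name such as '0x10008c7cffc7ef00' to 'naa.10008c7cffc7ef00'."""
    value = port_name.strip().lower()
    if value.startswith("0x"):
        value = value[2:]
    return "naa." + value


def read_attr(path):
    """Return the stripped content of a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except (IOError, OSError):
        return None


def list_fc_hosts(root="/sys/class/fc_host"):
    """Return one dict per FC host with its WWN, port state and symbolic name."""
    hosts = []
    for host_dir in sorted(glob.glob(os.path.join(root, "host*"))):
        port_name = read_attr(os.path.join(host_dir, "port_name"))
        if port_name is None:
            continue
        hosts.append({
            "host": os.path.basename(host_dir),
            "wwn": port_name_to_naa(port_name),
            "port_state": read_attr(os.path.join(host_dir, "port_state")),
            "symbolic_name": read_attr(os.path.join(host_dir, "symbolic_name")),
        })
    return hosts


if __name__ == "__main__":
    for entry in list_fc_hosts():
        print("%(host)s  %(wwn)s  %(port_state)s  %(symbolic_name)s" % entry)
```

The symbolic_name column is included because it carries the adapter model and driver version string, as in the systool output above.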

maurizio-lombardi commented 4 years ago

In waiting, I tried CentOS 8 which was an even bigger bust. I wiped that clean and tried again with Fedora 31. Same darn error "Could not create Target in configFS". Anyone??

I am going to investigate this.

maurizio-lombardi commented 4 years ago

In waiting, I tried CentOS 8 which was an even bigger bust.

Note: it's not going to work in CentOS 8 because the Red Hat management decided to disable the qla2xxx target kernel module.

If I remember correctly, there were concerns about the stability of that code. https://bugzilla.redhat.com/show_bug.cgi?id=1666377

SteffanCline commented 4 years ago

I was able to load the module and made everything look like I had it in CentOS 7 which let me get much further but still whenever looking at the structure via ls in targetcli CentOS 8, it just wouldn't show up.

Any idea about the logging of that error so I can trace down the issue?

FWIW, it's not working in CentOS 7, CentOS 8 or even Fedora 31. Actually 7 and 31 behave exactly the same.

Also, @maurizio-lombardi I see you are a maintainer. I REALLY appreciate your help. I have HOURS into this and getting nowhere like this is literally driving me mad.

maurizio-lombardi commented 4 years ago

Kernel: 4.4.207-1.el7.elrepo.x86_64

Did you also test the default 3.10.0-* CentOS 7 kernel?

maurizio-lombardi commented 4 years ago

Also, @maurizio-lombardi I see you are a maintainer. I REALLY appreciate your help. I have HOURS into this and getting nowhere like this is literally driving me mad.

Thanks :) I sent an email to one of my colleagues that works with qla2xxx and I am waiting for an answer

SteffanCline commented 4 years ago

Yes. I used the default kernel first but it wouldn’t load qla2xxx, failing with an error about an invalid parameter. I then used modprobe -f and it said invalid key. Googling that suggested it was an unsigned module. I’ll wipe the machine and reinstall 7 again and see if the issue repeats itself.

maurizio-lombardi commented 4 years ago

I was able to load the module and made everything look like I had it in CentOS 7 which let me get much further but still whenever looking at the structure via ls in targetcli CentOS 8, it just wouldn't show up.

As I said, this is totally expected in Centos 8 because we removed the qla2xxx target mode entirely (in rtslib too). I am not sure about Centos 7 and Fedora 31, I will try to find out.

SteffanCline commented 4 years ago

I have a new CentOS 7 minimal install done with targetcli installed now. I've set it to run in target mode via

    echo 'options qla2xxx qlini_mode="disabled"' > /usr/lib/modprobe.d/qla2xxx.conf

Created /etc/modules-load.d/qla2xxx.conf to load the qla2xxx module at boot.

Rebooted to ensure everything loaded.

Noticed tcm_qla2xxx did not load.

    # lsmod | grep qla2xxx
    qla2xxx               792059  0
    nvme_fc                33640  1 qla2xxx
    scsi_transport_fc      64007  2 bfa,qla2xxx
    # modprobe tcm_qla2xxx
    modprobe: ERROR: could not insert 'tcm_qla2xxx': Invalid argument
    # modprobe -f tcm_qla2xxx
    modprobe: ERROR: could not insert 'tcm_qla2xxx': Key was rejected by service

Checked to make sure the mod is there.

    # ls /lib/modules/$(uname -r)/kernel/drivers/scsi/qla2xxx/tcm_qla2xxx.ko.xz
    /lib/modules/3.10.0-1062.9.1.el7.x86_64/kernel/drivers/scsi/qla2xxx/tcm_qla2xxx.ko.xz

What is causing this? This is what had me try that later kernel since the tcm_qla2xxx would load.

I checked configfs and it's mounted as it should be.

dmesg reveals

    [  304.018220] TECH PREVIEW: QLA2XXX Target Mode Operation may not be fully supported. Please review provided documentation for limitations.
    [  304.018227] Missing tfo->check_stop_free()
    [ 2273.161531] TECH PREVIEW: QLA2XXX Target Mode Operation may not be fully supported. Please review provided documentation for limitations.
    [ 2273.161545] Missing tfo->check_stop_free()
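Since the only clue in this case came from the kernel log, it can help to filter dmesg down to target-related lines after each failed modprobe or create attempt. The sketch below is an assumption-laden convenience, not an official tool: the keyword pattern is hypothetical and chosen only to match the messages quoted in this thread.

```python
import re
import subprocess

# Hypothetical keyword list, chosen to match the messages seen in this thread.
TARGET_PATTERN = re.compile(r"qla2xxx|tcm_|target_core|tfo->", re.IGNORECASE)


def target_messages(dmesg_text):
    """Return the dmesg lines that mention the qla2xxx or target core code."""
    return [line for line in dmesg_text.splitlines() if TARGET_PATTERN.search(line)]


if __name__ == "__main__":
    try:
        output = subprocess.check_output(["dmesg"], universal_newlines=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        output = ""
        print("could not run dmesg: %s" % exc)
    for line in target_messages(output):
        print(line)
```

Running it right after a failed `modprobe tcm_qla2xxx` would surface lines such as the "Missing tfo->check_stop_free()" message without the surrounding boot noise.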

maurizio-lombardi commented 4 years ago

[ 304.018227] Missing tfo->check_stop_free()

This is very interesting! Probably this is a bug in the RHEL kernel, let me check the code

maurizio-lombardi commented 4 years ago

Ok, I found out the possible mistake: a missing check_stop_free pointer initialization in the RHEL/Centos 7 tcm_qla2xxx kernel module. I have to ask our maintainer whether it's intentional or it's a bug due to a mistake when backporting the NPIV support patches from the mainline kernel to our RHEL7 one.

maurizio-lombardi commented 4 years ago

I am almost certain that it's a regression introduced in RHEL 7.6, I am going to open a bugzilla against RHEL7.

SteffanCline commented 4 years ago

Ok, I found out the possible mistake: a missing check_stop_free pointer initialization in the RHEL/Centos 7 tcm_qla2xxx kernel module. I have to ask our maintainer whether it's intentional or it's a bug due to a mistake when backporting the NPIV support patches from the mainline kernel to our RHEL7 one. I am almost certain that it's a regression introduced in RHEL 7.6, I am going to open a bugzilla against RHEL7.

Any idea which kernel actually works so I can downgrade or is there a repo of testing kernels their fix may be applied against so I can get this going?

This is why I went to that newer kernel elrepo since it seems that part of it was resolved. Should I try upgrading to that again so we can see what the problem is with targetcli while they research that module?

maurizio-lombardi commented 4 years ago

Ok, I found out the possible mistake: a missing check_stop_free pointer initialization in the RHEL/Centos 7 tcm_qla2xxx kernel module. I have to ask our maintainer whether it's intentional or it's a bug due to a mistake when backporting the NPIV support patches from the mainline kernel to our RHEL7 one. I am almost certain that it's a regression introduced in RHEL 7.6, I am going to open a bugzilla against RHEL7.

Any idea which kernel actually works so I can downgrade or is there a repo of testing kernels their fix may be applied against so I can get this going?

Probably it could work with older kernels (version <= kernel-3.10.0-900.el7). I am going to prepare a kernel you can easily install and test.

This is why I went to that newer kernel elrepo since it seems that part of it was resolved. Should I try upgrading to that again so we can see what the problem is with targetcli while they research that module?

It's not a top priority for my team in Red Hat (we are focused on 3.10.0 kernels) but if you want to do that then go ahead, it may be interesting for the qla2xxx team.

SteffanCline commented 4 years ago

Probably it could work with older kernels (version <= kernel-3.10.0-900.el7). I am going to prepare a kernel you can easily install and test.

Sounds awesome! Let me know when ready.

It's not a top priority for my team in Red Hat (we are focused on 3.10.0 kernels) but if you want to do that then go ahead, it may be interesting for the qla2xxx team.

I would think they could see what was done and implement the fix in the older kernel.

Does rtslib and/or targetcli do any logging anywhere to show why it fails? The message "Could not create Target in configFS" is just far too vague. Is there a verbose flag I missed somewhere?

maurizio-lombardi commented 4 years ago

Does rtslib and/or targetcli do any logging anywhere to show why it fails? The message "Could not create Target in configFS" is just far too vague. Is there a verbose flag I missed somewhere?

I think the problem is in the kernel, not in rtslib. rtslib only receives an error return code and prints an error message; it's not able to give you much detail. It may be more useful to look at the output of the dmesg command.
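To see the raw error code the kernel returns, the directory creation can be attempted directly and the errno reported by name. This is a sketch under the assumption that creating a target boils down to a mkdir inside the fabric's configfs directory, as the manual mkdir test earlier in the thread suggests; `try_mkdir` is a hypothetical helper, not an rtslib function. Note that a successful call really does create the directory.

```python
import errno
import os
import sys


def try_mkdir(path):
    """Attempt os.mkdir(path); return None on success, else the symbolic errno name."""
    try:
        os.mkdir(path)
    except OSError as exc:
        # errno.errorcode maps numbers to names such as 'EINVAL' or 'EACCES'.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: try_mkdir.py PATH")
    else:
        result = try_mkdir(sys.argv[1])
        print("created" if result is None else "mkdir failed with " + result)
```

The errno name alone (for example EINVAL for "Invalid argument") does not say which check in the driver rejected the request, which is why the kernel log remains the better source of detail.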

SteffanCline commented 4 years ago

Ok. I’ll wait to try out your test kernel.

maurizio-lombardi commented 4 years ago

Hello, I have a test kernel for Centos 7 that fixes the module loading (check_stop_free error) https://drive.google.com/file/d/1i6ICzugi2FWJHoe-Akd33MXSeZNzuewE/view?usp=sharing

SteffanCline commented 4 years ago

    # rpm -ivh kernel-3.10.0-1062.el7_tcmqla2v1.x86_64.rpm --force
    Preparing...                          ################################# [100%]
    Updating / installing...
       1:kernel-3.10.0-1062.el7_tcmqla2v1 ################################# [100%]

Reboot...

Had to make another entry to load tcm_qla2xxx on boot since it wasn't loaded: /etc/modules-load.d/tcm_qla2xxx.conf

Ensuring everything is loaded.

    # lsmod | grep qla2xxx
    tcm_qla2xxx            35825  0
    target_core_mod       342807  1 tcm_qla2xxx
    qla2xxx               792020  1 tcm_qla2xxx
    nvme_fc                33640  1 qla2xxx
    scsi_transport_fc      64007  3 bfa,qla2xxx,tcm_qla2xxx

Confirmed target mode

    # cat /sys/module/qla2xxx/parameters/qlini_mode
    disabled

Looking for relevant information in dmesg

    [    3.381848] qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.00.00.12.07.7-k.
    [    3.659987] QLogic BR-series BFA FC/FCOE SCSI driver - version: 3.2.25.1
    [    4.709877] scsi host2: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:04:00.0 driver: 3.2.25.1
    [    4.727050] scsi host3: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:04:00.1 driver: 3.2.25.1
    [    5.777637] scsi host4: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:41:00.0 driver: 3.2.25.1
    [    5.794875] scsi host5: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:41:00.1 driver: 3.2.25.1
    [    6.844623] scsi host6: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:42:00.0 driver: 3.2.25.1
    [    6.861965] scsi host7: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:42:00.1 driver: 3.2.25.1
    [    7.911620] scsi host8: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:44:00.0 driver: 3.2.25.1
    [    7.928858] scsi host9: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:44:00.1 driver: 3.2.25.1
    [  595.982611] TECH PREVIEW: QLA2XXX Target Mode Operation may not be fully supported. Please review provided documentation for limitations.

Cool. Error is gone on the module.

Trying targetcli

    # targetcli
    Warning: Could not load preferences file /root/.targetcli/prefs.bin.
    targetcli shell version 2.1.fb49
    Copyright 2011-2013 by Datera, Inc and others.
    For help on commands, type 'help'.

    /> /qla2xxx info
    Fabric module name: qla2xxx
    ConfigFS path: /sys/kernel/config/target/qla2xxx
    Allowed WWN types: naa
    Allowed WWNs list: naa.10008c7cffc7ef00, naa.10008c7cffc7ef01, naa.10008c7cffc52800, naa.10008c7cffc52801, naa.10008c7cffc7dc00, naa.10008c7cffc7dc01, naa.10008c7cffc58b00, naa.10008c7cffc58b01
    Fabric module features: acls
    Corresponding kernel module: tcm_qla2xxx
    /> /qla2xxx create naa.10008c7cffc7ef00
    Could not create Target in configFS
    /> /qla2xxx create wwn=naa.10008c7cffc7ef00
    Could not create Target in configFS
    /> exit
    Global pref auto_save_on_exit=true
    Configuration saved to /etc/target/saveconfig.json

No new messages in dmesg and nothing in /var/log/messages either. Logs attached. var-log-messages-0102020.txt targetcli-log-01092020.txt

What's next?

maurizio-lombardi commented 4 years ago

Ok thanks for testing, I am waiting for the qla2xxx maintainer to answer my email.

SteffanCline commented 4 years ago

It’s worth mentioning that while it does not say it in any of the logs, these cards are QLogic 2662. QLogic doesn’t offer drivers for RHEL7/CentOS 7 saying that they are supported in the qla2xxx module.

maurizio-lombardi commented 4 years ago

That may be the reason why the driver refuses to create the target

SteffanCline commented 4 years ago

That may be the reason why the driver refuses to create the target

All the docs online say the qla2xxx series cards, which mine are, are supported in the latest kernels. At this point, it's as if the driver is silently failing. There's no feedback to indicate why.

Why would the driver refuse to create the target if everything says it’s supported?

Is it possible that the driver is misidentifying the cards?

I wonder if the team will know of some specific bios setting I need to apply. It’ll be interesting to hear their feedback.

https://driverdownloads.qlogic.com/QLogicDriverDownloads_UI/SearchByProduct.aspx?ProductCategory=322&Product=1212&Os=65

[Screenshot attached: Screen Shot 2020-01-09 at 9 49 40 AM]

SteffanCline commented 4 years ago

Per this link: https://bugzilla.redhat.com/show_bug.cgi?id=1666377

the qla2xxx's code in targetcli has been disabled.

Is that only for CentOS 8?

Been googling for a while. Everything I've seen says the cards are supported. This is brutal. No errors reported anywhere. Just silent fail.

maurizio-lombardi commented 4 years ago

Per this link: https://bugzilla.redhat.com/show_bug.cgi?id=1666377

the qla2xxx's code in targetcli has been disabled.

Is that only for CentOS 8?

Yes, for RHEL8 and Centos8. For those 2 operating systems qla2xxx target mode has been completely disabled. For RHEL7 and Centos7 it's expected to be available (but only as a technology preview, not fully supported)

Been googling for a while. Everything I've seen says the cards are supported. This is brutal. No errors reported anywhere. Just silent fail.

I guess this is a problem within the qla2xxx driver.

SteffanCline commented 4 years ago

Well, since the one issue with the kernel you sent seems fixed (yay!), do you have another module with some debug code and logging that we can test to find out what's going on?

maurizio-lombardi commented 4 years ago

Well, since the one issue with the kernel you sent seems fixed (yay!), do you have another module with some debug code and logging that we can test to find out what's going on?

I will prepare something to test

SteffanCline commented 4 years ago

Anything new from the team on the status of support for the card or a test module to see what's happening?

In waiting, I installed FreeNAS again and set the server in target mode. The CentOS 7 initiator on one of my other servers didn't activate the qla2xxx module on boot. I then enabled it with modprobe. I tried putting an entry in /etc/modules-load.d/qla2xxx.conf but the kernel still doesn't load it automatically. It doesn't recognize the target either. This is so strange. I'll be calling QLogic to ask about this.

SteffanCline commented 4 years ago

QLogic has no number but I opened a case. Any feedback from the qla2xxx team as to why it's not properly recognizing the card?

SteffanCline commented 4 years ago

Swapped out those cards. Put in QLE2564 which work with FreeNAS fine so I'm sure they're good with CentOS 7.

This is new...

    /> /qla2xxx create naa.21000024ff67440c
    Traceback (most recent call last):
      File "/usr/bin/targetcli", line 122, in <module>
        main()
      File "/usr/bin/targetcli", line 112, in main
        shell.run_interactive()
      File "/usr/lib/python2.7/site-packages/configshell_fb/shell.py", line 905, in run_interactive
        self._cli_loop()
      File "/usr/lib/python2.7/site-packages/configshell_fb/shell.py", line 734, in _cli_loop
        self.run_cmdline(cmdline)
      File "/usr/lib/python2.7/site-packages/configshell_fb/shell.py", line 848, in run_cmdline
        self._execute_command(path, command, pparams, kparams)
      File "/usr/lib/python2.7/site-packages/configshell_fb/shell.py", line 823, in _execute_command
        result = target.execute_command(command, pparams, kparams)
      File "/usr/lib/python2.7/site-packages/configshell_fb/node.py", line 1406, in execute_command
        return method(*pparams, **kparams)
      File "/usr/lib/python2.7/site-packages/targetcli/ui_target.py", line 195, in ui_command_create
        ui_target = UITarget(target, self)
      File "/usr/lib/python2.7/site-packages/targetcli/ui_target.py", line 555, in __init__
        self.rtsnode.enable = True
      File "/usr/lib/python2.7/site-packages/rtslib_fb/target.py", line 241, in _set_enable
        if os.path.isfile(path) and (boolean != self._get_enable()):
      File "/usr/lib/python2.7/site-packages/rtslib_fb/target.py", line 230, in _get_enable
        return bool(int(fread(path)))
      File "/usr/lib/python2.7/site-packages/rtslib_fb/utils.py", line 100, in fread
        with open(path, 'r') as file_fd:
    IOError: [Errno 13] Permission denied: '/sys/kernel/config/target/qla2xxx/21:00:00:24:ff:67:44:0c/tpgt_1/enable'
    [root@localhost ~]# targetcli
    [Errno 13] Permission denied: '/sys/kernel/config/target/qla2xxx/21:00:00:24:ff:67:44:0c/tpgt_1/enable'

SteffanCline commented 4 years ago

Couldn't relaunch targetcli. Rebooted server, tried again and same error.

Suggestions?

maurizio-lombardi commented 4 years ago

[Errno 13] Permission denied: '/sys/kernel/config/target/qla2xxx/21:00:00:24:ff:67:44:0c/tpgt_1/enable'

Hmmm, this is not a bug in targetcli. The qla2xxx module refuses to enable target mode.

I am going to ping the qla2xxx team again.
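One detail visible in the traceback is that the target created as naa.21000024ff67440c appears in configfs as 21:00:00:24:ff:67:44:0c, so the naa. form used in targetcli and the colon-separated form used on disk name the same port. A small sketch of that mapping follows; the function names are hypothetical and the conversion is inferred from the paths in this thread rather than taken from rtslib's documentation.

```python
HEX_DIGITS = "0123456789abcdef"


def naa_to_fabric_wwn(wwn):
    """Convert 'naa.21000024ff67440c' to '21:00:00:24:ff:67:44:0c'."""
    if not wwn.startswith("naa."):
        raise ValueError("expected a WWN starting with 'naa.': %r" % wwn)
    digits = wwn[4:].lower()
    if len(digits) != 16 or any(ch not in HEX_DIGITS for ch in digits):
        raise ValueError("expected 16 hex digits after 'naa.': %r" % wwn)
    # Group the 16 hex digits into 8 colon-separated bytes.
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


def fabric_wwn_to_naa(wwn):
    """Convert '21:00:00:24:ff:67:44:0c' back to 'naa.21000024ff67440c'."""
    return "naa." + wwn.replace(":", "").lower()


if __name__ == "__main__":
    print(naa_to_fabric_wwn("naa.21000024ff67440c"))
```

Knowing the on-disk name makes it easier to inspect the matching directory under /sys/kernel/config/target/qla2xxx/ by hand when targetcli itself refuses to start.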

SteffanCline commented 4 years ago

Ok. Did they ever respond the previous time?

While waiting on them, I plan to use CLVM to manage writes to the SAN. Is it possible to mount the target block device on the target server, aside from the initiators (for performing maintenance etc)?

maurizio-lombardi commented 4 years ago

Ok. Did they ever respond the previous time?

Yes, the guy I contacted told me he was going to do a test but he never got back with the results.

While waiting on them, I plan to use CLVM to manage writes to the SAN. Is it possible to mount the target block device on the target server, aside from the initiators (for performing maintenance etc)?

It's possible, but you have to be sure that the filesystem is not mounted by an initiator, otherwise you will end up corrupting the filesystem.

SteffanCline commented 4 years ago

It's possible, but you have to be sure that the filesystem is not mounted by an initiator, otherwise you will end up corrupting the filesystem.

If the target server is also in the same pool for CLVM, that should not be an issue. When I was originally testing on CentOS 7, the kernel/targetcli complained when I had the volume mounted. I had to unmount it just to get as far as I did for you to see that same error repeatedly.

I can tell you that whatever changes were made to the (tcm_)qla2xxx drivers in Fedora work correctly with the QLE2564 FC cards and targetcli, so the issue now seems isolated to CentOS. I tested this just to see how it worked on Fedora.

Is there a way in targetcli to be more generic, like wildcards for the FC ports and ACLs? I'm stuck at the ACLs because I don't yet know all the addresses of the initiators. FreeNAS seems to do this automatically somehow: once set up on that platform, it covers all ports, and regardless of which initiator is plugged into which port, it just works. Can this be done somehow for targets or ACLs through targetcli?

Here's what I have so far in testing Fedora while waiting on your team for the CentOS issue.

[Screenshot attached: Screen Shot 2020-01-30 at 7 57 27 AM]

SteffanCline commented 4 years ago

In waiting, I wiped the server clean and installed FreeNAS to determine if the hardware was fine. The block device shows up so I know the hardware is fine. I'll reinstall the old config but surely there's something missing in the config. No other ideas?

SteffanCline commented 4 years ago

OS Reinstalled

  1. Fresh install
  2. Did a yum update and then installed targetcli
  3. Put server into target mode
  4. Disabled SELinux
  5. Did a dracut -f and rebooted
  6. Verified in target mode
  7. systemctl enable target; systemctl start target

In targetcli

  1. Created backstore - /backstores/block create dev=/dev/sda name=sda
  2. Created target (used the one with link up) - /qla2xxx create naa.21000024ff67440f
  3. Created luns - create /backstores/block/sda
  4. Created acls - create naa.21000024ff56a8a8
  5. Exit.

Went to initiator

  1. ls -l /dev/disk/by-path/ - not there
  2. lsblk - not there

Am I missing any steps?
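The targetcli steps listed above can also be written down as a repeatable command sequence, which helps when the OS is reinstalled several times as it was here. The sketch below only builds and prints the commands; it relies on targetcli accepting a single command as arguments instead of entering the shell. The node paths for luns and acls directly under the target are an assumption about how the qla2xxx fabric is laid out in targetcli, the final saveconfig step stands in for the auto-save on exit, and `build_commands` is a hypothetical helper.

```python
def build_commands(backstore_dev, backstore_name, target_wwn, initiator_wwn):
    """Return the targetcli invocations for one block backstore, target, LUN and ACL."""
    # Assumed layout: luns and acls sit directly under the target node.
    target_path = "/qla2xxx/" + target_wwn
    return [
        ["targetcli", "/backstores/block", "create",
         "dev=" + backstore_dev, "name=" + backstore_name],
        ["targetcli", "/qla2xxx", "create", target_wwn],
        ["targetcli", target_path + "/luns", "create",
         "/backstores/block/" + backstore_name],
        ["targetcli", target_path + "/acls", "create", initiator_wwn],
        ["targetcli", "saveconfig"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    commands = build_commands(
        "/dev/sda", "sda", "naa.21000024ff67440f", "naa.21000024ff56a8a8"
    )
    for argv in commands:
        print(" ".join(argv))
```

Printing first and executing only after review keeps the sketch safe to run on a machine that already has a saved target configuration.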

[Screenshot attached: Screen Shot 2020-02-10 at 8 03 05 AM]

maurizio-lombardi commented 4 years ago

Seems ok, I suppose the FC switch is configured correctly.

Is there any error message in dmesg?

SteffanCline commented 4 years ago

Connected p2p

[Screenshot attached: Screen Shot 2020-02-10 at 9 10 15 AM]