Open SteffanCline opened 4 years ago
In waiting, I tried CentOS 8 which was an even bigger bust. I wiped that clean and tried again with Fedora 31. Same darn error "Could not create Target in configFS". Anyone??
I figure the type of HBA would be good to know. They're a Cavium QLogic BR-1860 QLE2662 which is supported from all I've read. Here are some details.
# systool -c fc_host -v
Class = "fc_host"
Class Device = "host2"
Class Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/host2/fc_host/host2"
active_fc4s = "0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 "
dev_loss_tmo = "60"
fabric_name = "0x0"
issue_lip =
max_npiv_vports = "255"
maxframe_size = "0 bytes"
node_name = "0x20008c7cffc7ef00"
npiv_vports_inuse = "0"
port_id = "0x000000"
port_name = "0x10008c7cffc7ef00"
port_state = "Linkdown"
port_type = "Unknown"
speed = "unknown"
supported_classes = "Class 3"
supported_fc4s = "0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 "
supported_speeds = "2 Gbit, 4 Gbit, 8 Gbit, 16 Gbit"
symbolic_name = "QLogic-1860-2p | 3.2.25.1 | | | "
tgtid_bind_type = "wwpn (World Wide Port Name)"
uevent =
vport_create =
vport_delete =
Device = "host2"
Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/host2"
uevent = "DEVTYPE=scsi_host"
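For anyone comparing ports across several HBAs, the same fields can be read straight from sysfs without systool. This is only a rough sketch: the function name `list_fc_hosts` and its optional directory argument are made up for illustration, and it assumes the usual /sys/class/fc_host layout.

```shell
#!/bin/sh
# Print "<host> <port_name> <port_state>" for each FC host.
# The directory argument is an assumption added for this sketch so it can be
# pointed at a scratch directory; it defaults to /sys/class/fc_host.
list_fc_hosts() {
    fc_dir="${1:-/sys/class/fc_host}"
    for host in "$fc_dir"/host*; do
        # Skip the unexpanded glob when no hosts exist.
        [ -d "$host" ] || continue
        name=$(cat "$host/port_name" 2>/dev/null)
        state=$(cat "$host/port_state" 2>/dev/null)
        printf '%s %s %s\n' "$(basename "$host")" "$name" "$state"
    done
}
```

Calling `list_fc_hosts` with no argument reads the live system; a port that reports Linkdown, as host2 does above, has no link to a switch or peer yet.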
I am going to investigate this.
In waiting, I tried CentOS 8 which was an even bigger bust.
Note: it's not going to work in CentOS 8 because the Red Hat management decided to disable the qla2xxx target kernel module.
If I remember correctly, there were concerns about the stability of that code. https://bugzilla.redhat.com/show_bug.cgi?id=1666377
I was able to load the module and made everything look like I had it in CentOS 7, which let me get much further, but whenever I looked at the structure via ls in targetcli on CentOS 8, it just wouldn't show up.
Any idea about the logging of that error so I can trace down the issue?
FWIW, it's not working in CentOS 7, CentOS 8 or even Fedora 31. Actually 7 and 31 behave exactly the same.
Also, @maurizio-lombardi I see you are a maintainer. I REALLY appreciate your help. I have HOURS into this and getting nowhere like this is literally driving me mad.
Kernel:
4.4.207-1.el7.elrepo.x86_64
Did you also test the default 3.10.0-* CentOS 7 kernel?
Thanks :) I sent an email to one of my colleagues that works with qla2xxx and I am waiting for an answer
Yes. I used the default kernel first but it wouldn’t load tcm_qla2xxx, giving an error about an invalid argument. I then used modprobe -f and it said the key was rejected. Googling that said it was an unsigned module. I’ll wipe the machine and reinstall 7 again and see if the issue repeats itself.
As I said, this is totally expected in Centos 8 because we removed the qla2xxx target mode entirely (in rtslib too). I am not sure about Centos 7 and Fedora 31, I will try to find out.
I have a new CentOS 7 minimal install done with targetcli installed now.
I've set it to run in target mode via
echo 'options qla2xxx qlini_mode="disabled"' > /usr/lib/modprobe.d/qla2xxx.conf
Created /etc/modules-load.d/qla2xxx.conf to load qla2xxx at boot.
Rebooted to ensure everything loaded.
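For reference, the two files described above could be written with something like the following. It's a sketch rather than a tested recipe: the function name and the optional root-prefix argument are invented here so it can be tried against a scratch directory first, and it only covers qla2xxx itself (tcm_qla2xxx would need its own modules-load.d entry).

```shell
#!/bin/sh
# Write the modprobe option and the boot-time module list for qla2xxx.
# "$1" is a hypothetical root prefix for dry runs; leave it empty to write
# to the real /usr/lib/modprobe.d and /etc/modules-load.d (needs root).
setup_qla_target_mode() {
    root="${1:-}"
    mkdir -p "$root/usr/lib/modprobe.d" "$root/etc/modules-load.d"
    # Disable initiator mode so the HBA can be used as a target.
    echo 'options qla2xxx qlini_mode="disabled"' \
        > "$root/usr/lib/modprobe.d/qla2xxx.conf"
    # modules-load.d files list one module name per line.
    printf 'qla2xxx\n' > "$root/etc/modules-load.d/qla2xxx.conf"
}
```

After a reboot, `cat /sys/module/qla2xxx/parameters/qlini_mode` should print `disabled` if the option was picked up.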
Noticed tcm_qla2xxx did not load.
# lsmod | grep qla2xxx
qla2xxx 792059 0
nvme_fc 33640 1 qla2xxx
scsi_transport_fc 64007 2 bfa,qla2xxx
# modprobe tcm_qla2xxx
modprobe: ERROR: could not insert 'tcm_qla2xxx': Invalid argument
# modprobe -f tcm_qla2xxx
modprobe: ERROR: could not insert 'tcm_qla2xxx': Key was rejected by service
Checked to make sure the mod is there.
# ls /lib/modules/$(uname -r)/kernel/drivers/scsi/qla2xxx/tcm_qla2xxx.ko.xz
/lib/modules/3.10.0-1062.9.1.el7.x86_64/kernel/drivers/scsi/qla2xxx/tcm_qla2xxx.ko.xz
What is causing this? This is what had me try that later kernel since the tcm_qla2xxx would load.
I checked configfs and it's mounted as it should be.
dmesg reveals
[ 304.018220] TECH PREVIEW: QLA2XXX Target Mode Operation may not be fully supported. Please review provided documentation for limitations.
[ 304.018227] Missing tfo->check_stop_free()
[ 2273.161531] TECH PREVIEW: QLA2XXX Target Mode Operation may not be fully supported. Please review provided documentation for limitations.
[ 2273.161545] Missing tfo->check_stop_free()
This is very interesting! Probably this is a bug in the RHEL kernel, let me check the code
Ok, I found the likely mistake: a missing check_stop_free pointer initialization in the RHEL/CentOS 7 tcm_qla2xxx kernel module. I have to ask our maintainer whether it's intentional or a bug caused by a mistake when backporting the NPIV support patches from the mainline kernel to our RHEL 7 one.
I am almost certain that it's a regression introduced in RHEL 7.6, I am going to open a bugzilla against RHEL7.
Any idea which kernel actually works so I can downgrade or is there a repo of testing kernels their fix may be applied against so I can get this going?
This is why I went to that newer kernel elrepo since it seems that part of it was resolved. Should I try upgrading to that again so we can see what the problem is with targetcli while they research that module?
Probably it could work with older kernels (version <= kernel-3.10.0-900.el7) I am going to prepare a kernel you can easily install and test
It's not a top priority for my team in Red Hat (we are focused on 3.10.0 kernels) but if you want to do that then go ahead, it may be interesting for the qla2xxx team.
Sounds awesome! Let me know when ready.
I would think they could see what was done and implement the fix in the older kernel.
Do rtslib and/or targetcli do any logging anywhere to show why it fails? The message "Could not create Target in configFS" is just far too vague. Is there a verbose flag I missed somewhere?
I think the problem is in the kernel, not in rtslib. rtslib only receives an error return code and prints an error message, so it can't give you much detail. It may be more useful to look at the output of the dmesg command.
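One way to narrow the dmesg output down to lines that might relate to the target stack is a simple grep filter. This is just a sketch; the pattern list is a guess at what is relevant (it matches the messages quoted earlier in this thread), not an official set of log prefixes.

```shell
#!/bin/sh
# Keep only kernel log lines that mention the FC target pieces.
# The keyword list is an assumption; extend it as needed.
filter_target_msgs() {
    grep -i -E 'qla2xxx|tcm_|target_core|configfs|tfo->'
}
```

Typical use would be `dmesg | filter_target_msgs` right after a failed create in targetcli, so any new kernel message stands out.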
Ok. I’ll wait to try out your test kernel.
Hello, I have a test kernel for Centos 7 that fixes the module loading (check_stop_free error) https://drive.google.com/file/d/1i6ICzugi2FWJHoe-Akd33MXSeZNzuewE/view?usp=sharing
# rpm -ivh kernel-3.10.0-1062.el7_tcmqla2v1.x86_64.rpm --force
Preparing... ################################# [100%]
Updating / installing...
1:kernel-3.10.0-1062.el7_tcmqla2v1 ################################# [100%]
Reboot...
Had to make another entry to load tcm_qla2xxx on boot since it wasn't loaded.
/etc/modules-load.d/tcm_qla2xxx.conf
Ensuring everything is loaded.
# lsmod | grep qla2xxx
tcm_qla2xxx 35825 0
target_core_mod 342807 1 tcm_qla2xxx
qla2xxx 792020 1 tcm_qla2xxx
nvme_fc 33640 1 qla2xxx
scsi_transport_fc 64007 3 bfa,qla2xxx,tcm_qla2xxx
Confirmed target mode
# cat /sys/module/qla2xxx/parameters/qlini_mode
disabled
Looking for relevant information in dmesg
[ 3.381848] qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.00.00.12.07.7-k.
[ 3.659987] QLogic BR-series BFA FC/FCOE SCSI driver - version: 3.2.25.1
[ 4.709877] scsi host2: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:04:00.0 driver: 3.2.25.1
[ 4.727050] scsi host3: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:04:00.1 driver: 3.2.25.1
[ 5.777637] scsi host4: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:41:00.0 driver: 3.2.25.1
[ 5.794875] scsi host5: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:41:00.1 driver: 3.2.25.1
[ 6.844623] scsi host6: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:42:00.0 driver: 3.2.25.1
[ 6.861965] scsi host7: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:42:00.1 driver: 3.2.25.1
[ 7.911620] scsi host8: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:44:00.0 driver: 3.2.25.1
[ 7.928858] scsi host9: QLogic BR-series FC/FCOE Adapter, hwpath: 0000:44:00.1 driver: 3.2.25.1
[ 595.982611] TECH PREVIEW: QLA2XXX Target Mode Operation may not be fully supported. Please review provided documentation for limitations.
Cool. Error is gone on the module.
Trying targetcli
# targetcli
Warning: Could not load preferences file /root/.targetcli/prefs.bin.
targetcli shell version 2.1.fb49
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.
/> /qla2xxx info
Fabric module name: qla2xxx
ConfigFS path: /sys/kernel/config/target/qla2xxx
Allowed WWN types: naa
Allowed WWNs list: naa.10008c7cffc7ef00, naa.10008c7cffc7ef01, naa.10008c7cffc52800, naa.10008c7cffc52801, naa.10008c7cffc7dc00, naa.10008c7cffc7dc01, naa.10008c7cffc58b00, naa.10008c7cffc58b01
Fabric module features: acls
Corresponding kernel module: tcm_qla2xxx
/> /qla2xxx create naa.10008c7cffc7ef00
Could not create Target in configFS
/> /qla2xxx create wwn=naa.10008c7cffc7ef00
Could not create Target in configFS
/> exit
Global pref auto_save_on_exit=true
Configuration saved to /etc/target/saveconfig.json
No new messages in dmesg and nothing in /var/log/messages either. Logs attached. var-log-messages-0102020.txt targetcli-log-01092020.txt
What's next?
Ok thanks for testing, I am waiting for the qla2xxx maintainer to answer my email.
It’s worth mentioning that while it does not say it in any of the logs, these cards are QLogic 2662. QLogic doesn’t offer drivers for RHEL7/CentOS 7 saying that they are supported in the qla2xxx module.
That may be the reason why the driver refuses to create the target
All the docs online say the qla2xxx series cards, which mine are, are supported in the latest kernels. At this point, it's as if the driver is silently failing. There's no feedback to indicate why.
Why would the driver refuse to create the target if everything says it’s supported?
Is it possible that the driver is misidentifying the cards?
I wonder if the team will know of some specific bios setting I need to apply. It’ll be interesting to hear their feedback.
Per this link: https://bugzilla.redhat.com/show_bug.cgi?id=1666377
the qla2xxx's code in targetcli has been disabled.
Is that only for CentOS 8?
Been googling for a while. Everything I've seen says the cards are supported. This is brutal. No errors reported anywhere. Just silent fail.
Yes, for RHEL8 and Centos8. For those 2 operating systems qla2xxx target mode has been completely disabled. For RHEL7 and Centos7 it's expected to be available (but only as a technology preview, not fully supported)
I guess this is a problem within the qla2xxx driver.
Well, since the one issue with the kernel you sent seems fixed (yay!), do you have another module with some debug code and logging that we can test to find out what's going on?
I will prepare something to test
Anything new from the team on the status of support for the card or a test module to see what's happening?
In waiting, I installed FreeNAS again, set the server in target mode. The CentOS 7 initiator on one of my other servers didn't activate the qla2xxx module on boot. I then enabled it with modprobe. I tried putting an entry /etc/modules-load.d/qla2xxx.conf but the kernel still doesn't load it automatically. It doesn't recognize the target either. This is so strange. I'll be calling QLogic to ask about this.
QLogic has no number but I opened a case. Any feedback from the qla2xxx team as to why it's not properly recognizing the card?
Swapped out those cards. Put in QLE2564 which work with FreeNAS fine so I'm sure they're good with CentOS 7.
This is new...
/> /qla2xxx create naa.21000024ff67440c
Traceback (most recent call last):
File "/usr/bin/targetcli", line 122, in <module>
main()
File "/usr/bin/targetcli", line 112, in main
shell.run_interactive()
File "/usr/lib/python2.7/site-packages/configshell_fb/shell.py", line 905, in run_interactive
self._cli_loop()
File "/usr/lib/python2.7/site-packages/configshell_fb/shell.py", line 734, in _cli_loop
self.run_cmdline(cmdline)
File "/usr/lib/python2.7/site-packages/configshell_fb/shell.py", line 848, in run_cmdline
self._execute_command(path, command, pparams, kparams)
File "/usr/lib/python2.7/site-packages/configshell_fb/shell.py", line 823, in _execute_command
result = target.execute_command(command, pparams, kparams)
File "/usr/lib/python2.7/site-packages/configshell_fb/node.py", line 1406, in execute_command
return method(*pparams, **kparams)
File "/usr/lib/python2.7/site-packages/targetcli/ui_target.py", line 195, in ui_command_create
ui_target = UITarget(target, self)
File "/usr/lib/python2.7/site-packages/targetcli/ui_target.py", line 555, in __init__
self.rtsnode.enable = True
File "/usr/lib/python2.7/site-packages/rtslib_fb/target.py", line 241, in _set_enable
if os.path.isfile(path) and (boolean != self._get_enable()):
File "/usr/lib/python2.7/site-packages/rtslib_fb/target.py", line 230, in _get_enable
return bool(int(fread(path)))
File "/usr/lib/python2.7/site-packages/rtslib_fb/utils.py", line 100, in fread
with open(path, 'r') as file_fd:
IOError: [Errno 13] Permission denied:
'/sys/kernel/config/target/qla2xxx/21:00:00:24:ff:67:44:0c/tpgt_1/enable'
[root@localhost ~]# targetcli
[Errno 13] Permission denied:
'/sys/kernel/config/target/qla2xxx/21:00:00:24:ff:67:44:0c/tpgt_1/enable'
Couldn't relaunch targetcli. Rebooted server, tried again and same error.
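Judging by the traceback, the target created as naa.21000024ff67440c shows up in configfs under the colon-separated name 21:00:00:24:ff:67:44:0c. To inspect that directory by hand, the name can be derived with a small helper. This is a sketch only; `naa_to_configfs_wwn` is a made-up name and the conversion rule is inferred from the paths above, not taken from rtslib.

```shell
#!/bin/sh
# Convert "naa.21000024ff67440c" to "21:00:00:24:ff:67:44:0c".
# Assumes the input is "naa." followed by an even number of hex digits.
naa_to_configfs_wwn() {
    hex=${1#naa.}
    # Insert a colon after every two characters, then drop the trailing one.
    printf '%s\n' "$hex" | sed 's/../&:/g; s/:$//'
}
```

For example, `ls -l "/sys/kernel/config/target/qla2xxx/$(naa_to_configfs_wwn naa.21000024ff67440c)/tpgt_1/"` would show the attributes of the enable file that targetcli fails to read.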
Suggestions?
Hmmm, this is not a bug in targetcli. The qla2xxx module refuses to enable target mode.
I am going to ping the qla2xxx team again.
Ok. Did they ever respond the previous time?
While waiting on them, I plan to use CLVM to manage writes to the SAN. Is it possible to mount the target block device on the target server, aside from the initiators (for performing maintenance etc)?
Yes, the guy I contacted told me he was going to do a test but he never got back with the results.
It's possible, but you have to be sure that the filesystem is not mounted by an initiator, otherwise you will end up corrupting the filesystem.
If the target server is also in the same pool for CLVM, that should not be an issue. When I was originally testing on CentOS 7, the kernel/targetcli complained when I had the volume mounted. I had to unmount it just to get as far as I did for you to see that same error repeatedly.
I can tell you that whatever changes done to the (tcm_) qla2xxx drivers in Fedora work correctly with the qle2564 FC cards and targetcli so the issue seems isolated now to CentOS. I tested this just to see how it worked on Fedora. Is there a way in targetcli to be more generic like wildcards for the FC ports and ACLs? I'm stuck at the ACLs because I don't yet know all the addresses of the initiators. FreeNAS seems to do this automatically somehow. On that platform once set up, it does all ports when set up and regardless of what initiator is plugged into which port, it just works. Can this be done somehow for target or ACLs through targetcli?
Here's what I have so far in testing Fedora while waiting on your team for the CentOS issue.
In waiting, I wiped the server clean and installed FreeNAS to determine if the hardware was fine. The block device shows up so I know the hardware is fine. I'll reinstall the old config but surely there's something missing in the config. No other ideas?
OS Reinstalled
systemctl enable target; systemctl start target
In targetcli
/backstores/block create dev=/dev/sda name=sda
/qla2xxx create naa.21000024ff67440f
create /backstores/block/sda
create naa.21000024ff56a8a8
Went to initiator
ls -l /dev/disk/by-path/
not there
lsblk
not there.
Am I missing any steps?
Seems ok, I suppose the FC switch is configured correctly.
Is there any error message in dmesg?
Connected p2p
I've googled this particular issue and not really finding any good results on how to resolve it.
I installed targetcli via yum on CentOS 7 and this is the version in the repo:
# targetcli
targetcli shell version 2.1.fb49
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.
I downloaded the .zip of the targetcli-fb version and tried it too from the scripts folder and same results.
Python:
# python --version
Python 2.7.5
Libs:
Installed Packages
python-configshell.noarch 1:1.1.fb25-1.el7 @base
python-rtslib.noarch 2.1.fb69-3.el7 @base
python-rtslib-doc.noarch 2.1.fb69-3.el7 @base
The error comes up when using create.
/> /qla2xxx create naa.xxxxxxxxxxxxxxx
Could not create Target in configFS
Below is the info on my FC
/> qla2xxx/ info
Fabric module name: qla2xxx
ConfigFS path: /sys/kernel/config/target/qla2xxx
Allowed WWN types: naa
Allowed WWNs list: naa.xxxxxxxxxxxxxxx, naa.xxxxxxxxxxxxxxx, naa.xxxxxxxxxxxxxxx, naa.xxxxxxxxxxxxxxx, naa.xxxxxxxxxxxxxxx, naa.xxxxxxxxxxxxxxx, naa.xxxxxxxxxxxxxxx, naa.xxxxxxxxxxxxxxx
Fabric module features: acls
Corresponding kernel module: tcm_qla2xxx
I have attached the log but don't see anything that really stands out. log.txt
I checked per the link http://www.linux-iscsi.org/wiki/Fibre_Channel#Enable_target_mode to ensure that I have initiator mode disabled
# cat /sys/module/qla2xxx/parameters/qlini_mode
disabled
The kernel modules are loaded.
# lsmod | grep qla2xxx
tcm_qla2xxx 32768 1
target_core_mod 335872 4 tcm_qla2xxx,iscsi_target_mod
qla2xxx 634880 1 tcm_qla2xxx
scsi_transport_fc 65536 3 bfa,qla2xxx,tcm_qla2xxx
configfs is mounted:
# cat /proc/mounts | grep configfs
configfs /sys/kernel/config configfs rw,relatime 0 0
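The same check can be scripted so it returns an exit status instead of needing to be eyeballed. A minimal sketch, assuming the standard /proc/mounts column layout; the function name and the optional file argument are invented for illustration.

```shell
#!/bin/sh
# Succeed (exit 0) if a configfs filesystem is listed in the mounts file.
# "$1" defaults to /proc/mounts; the argument exists only so the check can
# be run against a saved copy.
configfs_mounted() {
    mounts="${1:-/proc/mounts}"
    # Column 3 of /proc/mounts is the filesystem type.
    awk '$3 == "configfs" { found = 1 } END { exit !found }' "$mounts"
}
```

Used as `configfs_mounted && echo "configfs is mounted"`, it could sit at the top of a larger preflight script alongside the lsmod check.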
In the kernel config:
CONFIG_CONFIGFS_FS=y
Kernel:
4.4.207-1.el7.elrepo.x86_64
Should I be able to create a directory in /sys/kernel/config/target/qla2xxx?
# mkdir naa.10008c7cffc7ef01
mkdir: cannot create directory ‘naa.xxxxxxxxxxxxxxx’: Invalid argument
I did notice that when I rebooted, there was no qla2xxx folder in /sys/kernel/config/target/ but after trying to use targetcli, it did make one so I'm not sure what else would cause it to fail.
Suggestions?