osandov / blktests

Linux kernel block layer testing framework

srp/** tests fail on Power with error scsi_transport_srp is in use #130

Open disgoel opened 6 months ago

disgoel commented 6 months ago

The srp/** tests fail on Power with the error below.

./check tests/srp/001

srp/001 (Create and remove LUNs)                             [failed]
    runtime  0.125s  ...  0.132s
    --- tests/srp/001.out   2023-12-25 09:40:30.000000000 +0530
    +++ /home/blktests-master/results/nodev/srp/001.out.bad    2024-01-01 15:26:32.804759400 +0530
    @@ -1,3 +1,4 @@
    -Configured SRP target driver
    -count_luns(): 3 <> 3
    -Passed
    +tests/srp/rc: line 263: /sys/class/srp_remote_ports/port-0:1/delete: Permission denied
    +tests/srp/rc: line 263: /sys/class/srp_remote_ports/port-0:1/delete: Permission denied
    +modprobe: FATAL: Module scsi_transport_srp is in use.
    +failed to shutdown client
tests/srp/rc: line 263: /sys/class/srp_remote_ports/port-0:1/delete: Permission denied
tests/srp/rc: line 263: /sys/class/srp_remote_ports/port-0:1/delete: Permission denied
modprobe: FATAL: Module scsi_transport_srp is in use.

cat /home/blktests/results/nodev/srp/001.out.bad

tests/srp/rc: line 263: /sys/class/srp_remote_ports/port-0:1/delete: Permission denied
tests/srp/rc: line 263: /sys/class/srp_remote_ports/port-0:1/delete: Permission denied
modprobe: FATAL: Module scsi_transport_srp is in use.
failed to shutdown client

bvanassche commented 6 months ago

What is the kernel version that triggered this failure (uname -a)?

disgoel commented 6 months ago

# uname -r
5.14.0-402.el9.ppc64le

bvanassche commented 6 months ago

That's not the version number of an upstream kernel. Please report issues encountered with Red Hat kernels to Red Hat.

kawasaki commented 6 months ago

It's interesting to see blktests results on the PowerPC architecture :)

Unfortunately, the 5.14.0 kernel that Red Hat chose for RHEL 9 is not an LTS kernel, so spending debug effort on it would not be productive. I would like to know whether the failure is still observed with the latest v6.7 kernel (or an LTS kernel) on the PowerPC system. @disgoel, is it possible to build a v6.7 kernel and install it on the system? If the srp tests still fail, there is an unknown issue. If the srp tests pass, the old kernel is the issue.

I tried kernel v5.14.21 on Fedora 39 on my test system and observed that all test cases in the srp group passed with the SIW driver. So the reported failures could be unique to PowerPC, but this is just a guess at this moment.

yizhanglinux commented 4 months ago

There is no delete sysfs attribute under /sys/class/srp_remote_ports/port-0\:1/ on ppc64le, and the scsi_transport_srp module is in use by ibmvscsi, so it cannot be removed [1]. The workaround in [2] can be used to fix this.

[1]

# uname -r
6.8.0-0.rc3.26.fc40.ppc64le

# ls /sys/class/srp_remote_ports/port-0\:1/
device  port_id  power  roles  subsystem  uevent

# lsmod | grep scsi_transport_srp
scsi_transport_srp    262144  1 ibmvscsi

# modinfo ibmvscsi
filename:       /lib/modules/6.8.0-0.rc3.26.fc40.ppc64le/kernel/drivers/scsi/ibmvscsi/ibmvscsi.ko.xz
version:        1.5.9
license:        GPL
author:         Dave Boutcher
description:    IBM Virtual SCSI
rhelversion:    9.99
srcversion:     C68F72BE86AEC8C2E06395A
alias:          vio:TvscsiSIBM,v-scsi*
depends:        scsi_transport_srp
intree:         Y
name:           ibmvscsi
vermagic:       6.8.0-0.rc3.26.fc40.ppc64le SMP mod_unload patchable-function-entry relocatable
sig_id:         PKCS#7
signer:         Fedora kernel signing key
sig_key:        62:1F:EA:38:E0:AD:BC:8A:52:4A:27:EC:8E:35:C1:2F:55:43:96:2A
sig_hashalgo:   sha256
signature:      31:87:D9:A2:E8:C6:70:FD:AD:57:E9:97:BE:E9:F5:11:19:B6:D5:D1:
        7A:60:04:46:48:9B:15:C1:A1:11:6F:AE:F9:4E:F9:51:6B:3A:F4:47:
        DD:26:A8:46:22:84:25:73:62:FA:1C:2E:4D:5D:04:10:9E:81:E9:F5:
        5E:0A:15:A8:D5:37:0F:8A:0E:0C:00:AC:61:FF:33:61:A5:9A:86:59:
        C3:01:48:97:13:51:B2:14:6E:0B:87:8F:B1:FC:AF:8F:A4:FA:1B:B0:
        8F:33:05:A4:BD:B1:1D:95:5A:07:1A:8D:53:D0:6D:30:35:99:77:44:
        73:58:CD:38:43:20:1F:2B:B2:42:4F:67:50:25:2C:FA:0E:FC:98:64:
        DF:46:67:DB:98:F2:7D:8D:F3:F1:A9:F4:AC:BB:4E:DB:D1:EB:A4:0E:
        6F:66:6E:7A:8D:66:02:99:26:9E:07:84:09:AB:D7:0F:05:FE:75:A5:
        4D:D1:1D:F1:0E:C5:8B:C7:48:FF:BE:B0:C3:02:82:00:50:DD:6C:AC:
        83:F5:44:97:29:7E:28:23:AE:A0:45:7B:B8:0F:AB:90:95:60:F9:01:
        2F:2B:CB:BB:65:AD:45:55:8E:9B:AD:39:50:73:5F:79:E3:9D:0B:2D:
        96:FE:E3:F4:5E:B1:C1:5B:DA:3E:AF:40:94:4E:14:51:AA:8F:BF:6D:
        30:23:23:DD:70:CB:7C:3B:A0:26:66:DF:51:EB:3D:C0:FF:BD:D8:B8:
        4C:2A:EC:E7:82:01:BD:22:5C:1E:57:5D:1C:F7:FD:8B:BD:01:0E:7D:
        8A:1F:74:9A:C5:FA:78:79:FA:80:38:5E:5D:6F:0A:75:E7:47:BD:C3:
        3C:9C:9C:D0:72:AC:5C:C1:29:D8:98:0F:F0:8A:7A:FB:76:3F:C1:72:
        C1:0D:C4:ED:97:B1:83:88:AE:BA:3E:9E:D8:C5:0C:3D:12:FE:21:3E:
        93:6C:83:13:59:D9:E9:25:72:6D:F7:0C:59:73:7D:B7:4E:3B:9F:73:
        94:22:2B:D5:6C:B7:32:08:54:AB:C9:57:2A:C6:8D:6A:88:71:94:9B:
        A3:9B:A6:E7:6D:27:B0:BD:D9:6B:60:F3:AE:3A:CF:BE:EF:CF:39:64:
        87:06:9D:85:95:24:A3:0E:66:59:36:42:1D:2E:17:11:A4:5E:E9:0F:
        17:BF:2D:62:E5:F5:EA:7A:15:3B:A2:16:FF:37:DA:B1:DF:FB:47:8E:
        6A:07:5F:46:9A:AD:60:C3:07:0D:0C:5D:76:65:E2:BC:CA:61:24:20:
        B9:7B:68:2F:14:FF:B0:EA:79:4C:09:80:EE:69:04:45:84:3C:88:53:
        8E:15:B9:E8:29:7D:FC:95:60:4C:68:31
parm:           max_id:Largest ID value for each channel [Default=64] (int)
parm:           max_channel:Largest channel value [Default=3] (int)
parm:           init_timeout:Initialization timeout in seconds (int)
parm:           max_requests:Maximum requests for this adapter (int)
parm:           fast_fail:Enable fast fail. [Default=1] (int)
parm:           client_reserve:Attempt client managed reserve/release (int)

[2] https://github.com/yizhanglinux/blktests/commit/651a9d9174630ac87492c97e89c1d57d5474cedd
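
The two observations in [1] can be checked from a script before a test tries to tear anything down. The sketch below is only an illustration and is not the content of the commit in [2]. The helper names `module_holders` and `delete_srp_ports` are hypothetical, and the optional root-directory arguments exist only so the helpers can be exercised against a scratch directory instead of the real sysfs.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from blktests.

# Print the names of the modules that hold a reference on module $1.
# $2 optionally overrides the sysfs module root (assumption: for testing).
module_holders() {
    local mod="$1" root="${2:-/sys/module}" h
    for h in "$root/$mod/holders"/*; do
        [ -e "$h" ] && echo "${h##*/}"
    done
    return 0
}

# Write 1 to the "delete" attribute of every SRP remote port that has a
# writable one, and report the ports that do not instead of failing.
# $1 optionally overrides the sysfs class directory (assumption: for testing).
delete_srp_ports() {
    local root="${1:-/sys/class/srp_remote_ports}" p
    for p in "$root"/port-*; do
        [ -e "$p" ] || continue
        if [ -w "$p/delete" ]; then
            echo 1 > "$p/delete"
        else
            echo "skip ${p##*/}: no writable delete attribute"
        fi
    done
}
```

On the system shown in [1], `module_holders scsi_transport_srp` would be expected to list ibmvscsi, and `delete_srp_ports` would report port-0:1 as skipped rather than print "Permission denied".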

disgoel commented 4 months ago

Thanks for the fix, Yi Zhang. I ran the srp tests after applying your patch, but the tests still failed with the error below.

# ./check srp/001
srp/001 (Create and remove LUNs)                             [failed]
    runtime  4.785s  ...  4.818s
    --- tests/srp/001.out   2024-03-07 16:49:16.170133366 +0530
    +++ /home/blktests/results/nodev/srp/001.out.bad    2024-03-08 16:53:55.160852461 +0530
    @@ -1,3 +1,3 @@
    +common/multipath-over-rdma: line 411: bonding_masters/addr_len: Not a directory
     Configured SRP target driver
    -count_luns(): 3 <> 3
    -Passed
    +SRP login failed

# cat /home/blktests/results/nodev/srp/001.out.bad
common/multipath-over-rdma: line 411: bonding_masters/addr_len: Not a directory
Configured SRP target driver
SRP login failed

yizhanglinux commented 4 months ago

Please also add this patch: https://github.com/yizhanglinux/blktests/commit/55b0193300d9e5777514d84fb908bca5e43066ba
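
For context, the "bonding_masters/addr_len: Not a directory" message is what one would expect when a loop over /sys/class/net/* reaches bonding_masters, which is a regular file created by the bonding driver and not an interface directory. The sketch below shows one way to skip such entries. It is an illustration only and not the content of the commit linked above. The helper name `list_net_ifaces` and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not taken from blktests.

# Print "<interface> <addr_len>" for every network interface directory.
# Entries that are not directories (such as bonding_masters) are skipped.
# $1 optionally overrides the sysfs net directory (assumption: for testing).
list_net_ifaces() {
    local root="${1:-/sys/class/net}" d
    for d in "$root"/*; do
        [ -d "$d" ] || continue
        [ -r "$d/addr_len" ] || continue
        echo "${d##*/} $(cat "$d/addr_len")"
    done
}
```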

disgoel commented 4 months ago

I get this with both patches applied.

# ./check srp/001
srp/001 (Create and remove LUNs)                             [failed]
    runtime  4.678s  ...  4.777s
    --- tests/srp/001.out   2024-03-07 16:49:16.170133366 +0530
    +++ /home/blktests/results/nodev/srp/001.out.bad    2024-03-08 17:09:22.183667901 +0530
    @@ -1,3 +1,2 @@
     Configured SRP target driver
    -count_luns(): 3 <> 3
    -Passed
    +SRP login failed

# cat /home/blktests/results/nodev/srp/001.out.bad
Configured SRP target driver
SRP login failed