openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ZFS causes hba reset during kernel startup #15044

Closed ozkangoksu closed 1 year ago

ozkangoksu commented 1 year ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | TrueNAS SCALE (Debian-based) |
| Distribution Version | 22.12.3.2 |
| Kernel Version | Linux v5.15.107 |
| Architecture | x64 |
| OpenZFS Version | 2.1.11 |

My storage server is a Cisco UCS S3260, firmware version 4.2(3a) (Secure). The HBA is the internal built-in default card, a UCS S3260 Dual Pass Through Controller based on the Broadcom 3316 ROC:

mpt3sas_cm0: LSISAS3316: FWVersion(13.00.08.00), ChipRevision(0x01), BiosVersion(15.00.06.00)

Describe the problem you're observing

On this storage server I'm running TrueNAS SCALE with ZFS. When a new pool is created, everything works fine, but when I reboot the server the ZFS kernel module runs into what looks like a race condition during the first startup sequence. The problem only happens during the boot startup sequence, and I've narrowed it down to an interaction between the iscsi, sd, mpt3sas, and zfs kernel modules. These are the main actors, and the cause appears to be that their startup tasks collide with one another.

If I disable ZFS auto-import, everything works, so ZFS is definitely the trigger. After booting with auto-import disabled, I can import the ZFS pool manually and use it without any issue (see the sketch below), which means iscsi or sd does not tolerate something that happens during the startup sequence.
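For reference, this is a minimal sketch of the manual workaround I'm describing, assuming a systemd-managed OpenZFS install; TrueNAS SCALE drives imports through its own middleware, so the exact unit names may differ on that platform. The pool name "hddtest" comes from my logs below.

```sh
# Check which boot-time import path is active (cache-file based vs. device scan).
# These unit names are the stock OpenZFS ones; TrueNAS may use its own services.
systemctl status zfs-import-cache.service zfs-import-scan.service

# After boot has settled, import the pool by scanning the partition links
# instead of relying on the boot-time import:
zpool import -d /dev/disk/by-partuuid hddtest

# Verify the pool came up healthy:
zpool status hddtest
```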

I've read the dmesg output carefully and divided it into three parts to narrow down the issue (the full dmesg output is at the end). Let me share the logs and my findings.

Include any warning/errors/backtraces from the system logs

During the boot sequence, the first drive-related logs start like this:

Part 1: https://gist.github.com/ozkangoksu/46b20cc6566d0792d8476f76ae21a37f

After this point, the sd_mod kernel module ("SCSI disk support") starts probing the drives and attaching them:

Part 2: https://gist.github.com/ozkangoksu/8eafb2d16434809ce50faa6903115a85

But about 45 seconds later, sd_mod starts generating dmesg output for every SAS_HDD drive:

[Sun Jul 9 09:56:16 2023] sd 6:0:0:0: device_block, handle(0x0012)
[Sun Jul 9 09:56:16 2023] sd 6:0:1:0: device_block, handle(0x0013)
[Sun Jul 9 09:56:16 2023] ses 6:0:34:0: _scsih_block_io_device skip device_block for SES handle(0x0034)
[Sun Jul 9 09:56:16 2023] mpt3sas_cm1: log_info(0x3112010c): originator(PL), code(0x12), sub_code(0x010c)
[Sun Jul 9 09:56:16 2023] mpt3sas_cm1: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)

The same function then runs a second time, like a loop:

[Sun Jul 9 09:56:16 2023] ses 0:0:0:0: _scsih_block_io_device skip device_block for SES handle(0x0012)
[Sun Jul 9 09:56:19 2023] ses 0:0:1:0: _scsih_block_io_device skip device_block for SES handle(0x0014)
[Sun Jul 9 09:56:19 2023] sd 6:0:35:0: device_block, handle(0x0036)

When the second loop is over, mpt3sas_cm1 kicks in before the third loop and removes the transport ports:

[Sun Jul 9 09:56:20 2023] sd 6:0:29:0: device_unblock and setting to running, handle(0x002f)
[Sun Jul 9 09:56:20 2023] sd 6:0:30:0: device_unblock and setting to running, handle(0x0030)
[Sun Jul 9 09:56:20 2023] sd 6:0:31:0: device_unblock and setting to running, handle(0x0031)
[Sun Jul 9 09:56:20 2023] sd 6:0:32:0: device_unblock and setting to running, handle(0x0032)
[Sun Jul 9 09:56:20 2023] sd 6:0:33:0: device_unblock and setting to running, handle(0x0033)
[Sun Jul 9 09:56:20 2023] mpt3sas_cm1: mpt3sas_transport_port_remove: removed: sas_addr(0x5000c500cac022ca)
[Sun Jul 9 09:56:20 2023] mpt3sas_cm1: removing handle(0x0012), sas_addr(0x5000c500cac022ca)
[Sun Jul 9 09:56:20 2023] mpt3sas_cm1: enclosure logical id(0x540f078d86014000), slot(1)
[Sun Jul 9 09:56:20 2023] mpt3sas_cm1: enclosure level(0x0000), connector name( )

At this point, mpt3sas_cm1 removes all of the drive ports and then recreates them:

[Sun Jul 9 09:56:20 2023] zio pool=hddtest vdev=/dev/disk/by-partuuid/03453b32-5793-4113-9ebd-64723a51b017 error=5 type=2 offset=8594128896 size=4096 flags=180880
[Sun Jul 9 09:56:20 2023] zio pool=hddtest vdev=/dev/disk/by-partuuid/03453b32-5793-4113-9ebd-64723a51b017 error=5 type=2 offset=9821274112 size=4096 flags=180880
[Sun Jul 9 09:56:20 2023] sd 6:0:5:0: [sdh] tag#4609 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
[Sun Jul 9 09:56:20 2023] sd 6:0:5:0: [sdh] tag#4609 CDB: Write(16) 8a 00 00 00 00 00 01 6d d7 08 00 00 00 08 00 00
[Sun Jul 9 09:56:20 2023] blk_update_request: I/O error, dev sdh, sector 23975688 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[Sun Jul 9 09:56:20 2023] zio pool=hddtest vdev=/dev/disk/by-partuuid/91a2cc2b-bc27-47f6-b127-018d41645a67 error=5 type=2 offset=12275531776 size=4096 flags=180880
[Sun Jul 9 09:56:20 2023] sd 6:0:22:0: [sdy] tag#2963 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
[Sun Jul 9 09:56:20 2023] sd 6:0:22:0: [sdy] tag#2963 CDB: Write(16) 8a 00 00 00 00 00 01 92 69 58 00 00 00 08 00 00
[Sun Jul 9 09:56:20 2023] blk_update_request: I/O error, dev sdy, sector 26372440 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[Sun Jul 9 09:56:20 2023] zio pool=hddtest vdev=/dev/disk/by-partuuid/2a2364c1-5273-46ad-9f85-e27ca227ee9a error=5 type=2 offset=13502668800 size=4096 flags=180880
[Sun Jul 9 09:56:20 2023] sd 6:0:19:0: [sdv] tag#4610 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
[Sun Jul 9 09:56:20 2023] sd 6:0:19:0: [sdv] tag#4610 CDB: Write(16) 8a 00 00 00 00 00 01 6d d7 08 00 00 00 08 00 00
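To correlate these error=5 (EIO) zio entries with the pool state, the standard OpenZFS tooling can be used; a rough sketch of what I run (pool name from the logs above, grep pattern is just my own filter):

```sh
# Pool state and per-vdev error counters after the resets:
zpool status -v hddtest

# Recent ZFS events, including the I/O errors logged above:
zpool events -v | tail -n 100

# Matching kernel-side view of the block/unblock cycles and port removals:
dmesg | grep -E 'mpt3sas|device_block|device_unblock' | tail -n 50
```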

Part 3 dmesg output: https://gist.github.com/ozkangoksu/e58c07eaf672c5e9cae336560d3a9367

When the drive ports are lost, ZFS suspends the raidz2 SAS_HDD pool (28 drives), while the SAS_SSD pool keeps working without any problem; the issue does not affect the SSD drives at all. The SAS_HDDs are brand new SEAGATE ST4000NM017A drives. I checked all of them with smartctl and every one passed the long self-test.
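For completeness, this is roughly how the drive checks were done (smartmontools; /dev/sdh is just one of the SAS HDDs from the logs, repeated per drive):

```sh
# Start a long (extended) self-test on one drive:
smartctl -t long /dev/sdh

# Check the self-test results once it finishes:
smartctl -l selftest /dev/sdh

# Full health/attribute output:
smartctl -a /dev/sdh
```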

This problem only happens during startup. Afterwards I can manually import the pool and use it without any issue.

How can I fix it?

dmesg-after-reboot.txt

rincebrain commented 1 year ago

ZFS doesn't do anything clever with disks, so if accessing the disks breaks your controller, you and the controller manufacturer have a discussion to have.

ozkangoksu commented 1 year ago

Hello rincebrain. I understand your point, but how do you answer these questions?

1. When I create a pool, I can use it without any problem. Why does it not break then?
2. If I disable auto-import, I can import and use the pool manually without any problem. Why does it not break then?

I'm not saying ZFS is the root cause of this issue. ZFS only triggers it; the root cause is still unknown to me, and I'm trying to find and fix it.

I've started to play around with the mpt3sas parameters; "mpt3sas.max_queue_depth=10000" did not work. The current parameter values are:

diag_buffer_enable = -1
disable_discovery = -1
enable_sdev_max_qd = N
hbas_to_enumerate = 0
host_tagset_enable = 1
irqpoll_weight = -1
logging_level = 0
max_lun = 16895
max_msix_vectors = -1
max_queue_depth = 10000
max_sectors = 65535
max_sgl_entries = -1
missing_delay = -1,-1
mpt3sas_fwfault_debug = 0
msix_disable = -1
perf_mode = -1
poll_queues = 0
prot_mask = -1
smp_affinity_enable = 1
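For anyone trying to reproduce this, here is roughly how these parameters can be inspected and applied; this is only a sketch for a Debian-style system, and on TrueNAS SCALE changes under /etc may not persist across updates:

```sh
# Inspect the current runtime values (this is where the list above comes from):
grep -H . /sys/module/mpt3sas/parameters/*

# Apply a parameter persistently and rebuild the initramfs so it takes
# effect at boot (example: the max_queue_depth value I tried above):
echo 'options mpt3sas max_queue_depth=10000' > /etc/modprobe.d/mpt3sas.conf
update-initramfs -u
```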

ozkangoksu commented 1 year ago

I think I found the problem. After a reboot, TrueNAS tries to import the pool using a cache file to speed things up. The physical device addresses change after each reboot, so the cache file is stale. When TrueNAS tries to import the pool from that stale cache file, the drives start getting reset by the mpt3sas driver (I don't know why), and it ends with a suspended pool.
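If the stale cache file really is the trigger, a possible workaround is to refresh or bypass it. This is only a sketch using the stock OpenZFS paths and units; TrueNAS's middleware may manage the cache file and import services differently:

```sh
# Regenerate the cache file from the currently (manually) imported pool:
zpool set cachefile=/etc/zfs/zpool.cache hddtest

# Or stop using the cache file and let the boot-time import scan devices instead:
zpool set cachefile=none hddtest
systemctl disable zfs-import-cache.service
systemctl enable zfs-import-scan.service
```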

ZFS is one of the actors, but it is not the main actor here. That's why I'm closing this issue.