openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

panic: command not inqueue, state = 0 at zpool import #16066

Closed: agrajag9 closed this issue 7 months ago

agrajag9 commented 7 months ago

System information

Type | Version/Name
FreeBSD | 15.0-CURRENT
Kernel Version | main-n268827-75464941dc17
Architecture | arm64
OpenZFS Version | zfs-2.2.99-365-FreeBSD_g8f2f6cd2a, zfs-kmod-2.2.99-365-FreeBSD_g8f2f6cd2a

Describe the problem you're observing

When attempting to import zpools from some Samsung SSDs attached to an Avago 9400-8i on a 16xA72-based platform, the system panics.

When the drives are attached directly to the SATA bus of the mainboard everything works fine.

The same drives and HBA work fine on multiple amd64 systems.

This is reproducible on both Debian and FreeBSD.

This is reproducible with multiple HBAs: 9211-8i, 9300-8i, and 9400-8i.

The panic goes away when setting dev.mpr.0.debug_level=0x7ff in /boot/loader.conf.
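
For reference, that workaround as a sketch (assuming the mpr(4) debug_level tunable; the 0 in dev.mpr.0 is the controller unit number and may differ on other systems):

# /boot/loader.conf
dev.mpr.0.debug_level="0x7ff"

It can presumably also be changed at runtime with sysctl dev.mpr.0.debug_level=0x7ff.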

Sometimes the panic happens mid-boot shortly after the mpr module initializes, but sometimes the system makes it to a full shell just fine.

Describe how to reproduce the problem

Use weird hardware: https://bsd-hardware.info/?probe=7c5ed3c2fe

# zpool import ptmc
panic: command not inqueue, state = 0

cpuid = 12
time = 1712420979
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a4
panic() at panic+0x48
mpr_complete_command() at mpr_complete_command+0x138
mpr_intr_locked() at mpr_intr_locked+0x80
mpr_intr_msi() at mpr_intr_msi+0x5c
ithread_loop() at ithread_loop+0x2a4
fork_exit() at fork_exit+0x78
fork_trampoline() at fork_trampoline+0x18
KDB: enter: panic
[ thread pid 12 tid 100205 ]
Stopped at      kdb_enter+0x4c: str     xzr, [x19, #1280]
db>
agrajag9 commented 7 months ago

This may be a hardware error, as setting tag depth to 1 also seems to make the problem go away.
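
A minimal sketch of that workaround (assuming FreeBSD's camcontrol(8); da0 below is a placeholder for whichever disk sits behind the HBA):

# camcontrol tags da0 -N 1

This limits the drive to a single outstanding tagged command, which of course also costs queued-I/O performance.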

rincebrain commented 7 months ago

That panic appears to be in the SAS controller driver, given the mpr_* functions in the backtrace, so I would suggest you file it against FreeBSD.

agrajag9 commented 7 months ago

Already did that. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270089

rincebrain commented 7 months ago

Great, then I'd suggest you close this, because it's not a bug in OpenZFS, and it's not really something changes in OpenZFS are going to do anything about.

agrajag9 commented 7 months ago

No, please read the report. As I stated, this exists on Linux as well as FreeBSD, so it is not likely to be the FreeBSD mpr driver.

rincebrain commented 7 months ago

And what does the crash on Linux look like?

agrajag9 commented 7 months ago

dmesg: https://gist.github.com/agrajag9/486beef3f993acd975adc1ca0c17ddd9

It imported and exported just fine on the first try; a second import then triggers the error. Linux does not fall all the way over like FreeBSD does, eventually recovering and continuing after about 30 seconds.

root@debian:~# zpool status
no pools available
root@debian:~# zpool import
   pool: ptmc
     id: 8660854589035025673
  state: ONLINE
status: Some supported features are not enabled on the pool.
        (Note that they may be intentionally disabled if the
        'compatibility' property is set.)
 action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
 config:

        ptmc        ONLINE
          mirror-0  ONLINE
            sda     ONLINE
            sdb     ONLINE
root@debian:~# zpool import ptmc
root@debian:~# zpool status
  pool: ptmc
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 3.19G in 04:14:11 with 0 errors on Sat Sep 16 18:48:24 2023
config:

        NAME        STATE     READ WRITE CKSUM
        ptmc        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0

errors: No known data errors
root@debian:~# zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALT
ptmc   200G  94.9G   105G        -       30G    49%    47%  1.00x    ONLINE  -
root@debian:~# zpool export ptmc
root@debian:~# zpool import ptmc
[   57.159696] ------------[ cut here ]------------
[   57.164311] WARNING: CPU: 12 PID: 1794 at drivers/scsi/mpt3sas/mpt3sas_base.]
...
[   57.505333]  el0t_64_sync+0x18c/0x190
[   57.508984] ---[ end trace 0000000000000000 ]---
[   87.826353] sd 0:0:0:0: attempting task abort!scmd(0x00000000a6532941), outstanding for 30668 ms & timeout 30000 ms
[   87.836803] sd 0:0:0:0: [sdb] tag#6360 CDB: Read(10) 28 00 04 00 09 10 00 00 02 00
robn commented 7 months ago

Right, so something is not right in either this single disk or the controller, and the Linux and FreeBSD drivers are responding to that error differently. It's probably fair to say a panic is not ideal here, but regardless, it's vanishingly unlikely that OpenZFS is at fault - we don't generate raw SCSI commands; rather, we just ask the OS block layer to make requests to the disk on our behalf, and the code that does that on Linux and FreeBSD is very different and wholly unrelated.
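
As a rough illustration of that point (a sketch only, assuming /dev/sdb is one of the disks behind the HBA): a plain sequential read through the Linux block layer, with no ZFS involved at all, still ends up as Read(10)/Read(16) CDBs built by the SCSI midlayer and driver, much like the one logged in the dmesg above.

dd if=/dev/sdb of=/dev/null bs=128k count=100000 iflag=direct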

rincebrain commented 7 months ago

To me, that still looks like a hardware issue: something is confusing your controller, which then logs an impossible case that FreeBSD treats as a panic.

I'm not saying OpenZFS is flawless, but this really looks like the hardware is the problem, not the software.

I would suspect you can probably trigger a similar problem under load with, say, an mdadm array.

ZFS doesn't do anything special or evil to the underlying disk controllers; it just uses the same interfaces as any other filesystem on the OS. So if the controller is barfing, you can probably reproduce it without ZFS, too.

agrajag9 commented 7 months ago

Except that I can only reproduce this in the ZFS context. EXT4 on the same setup works fine.

So ZFS is doing something extra.

root@debian:/mnt# df -h -T /mnt
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sda2      ext4   32G  8.0K   30G   1% /mnt
root@debian:/mnt# fio --name=TEST --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=8 --size=2g --iodepth=64 --runtime=60 --time_based --end_fsync=1 2>&1 | tee /tmp/fio.log

https://gist.github.com/agrajag9/122f556fc1d3cb6d35046e269825fe0e

rincebrain commented 7 months ago

I suggested mdadm specifically to replicate hitting multiple disks at once, because the crash message seems to be about the controller finding commands in its queue that it doesn't expect to be there, or vice versa.

agrajag9 commented 7 months ago

Well then. It's reproducible with mdadm mirrors also.

root@debian:~# mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb1

mdadm: /dev/sda2 appears to contain an ext2fs file system
       size=33554432K  mtime=Sun Apr  7 10:28:58 2024
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
[27694.045483] md/raid1:md1: not clean -- starting background reconstruction
[27694.052294] md/raid1:md1: active with 2 out of 2 mirrors
[27694.057658] md1: detected capacity change from 0 to 67041280
mdadm: array /dev/md1 started.
[27694.065398] md: resync of RAID array md1
root@debian:~# mkfs.ext4 /dev/md1
mke2fs 1.47.0 (5-Feb-2023)
Discarding device blocks: done
Creating filesystem with 8380160 4k blocks and 2097152 inodes
Filesystem UUID: 05a1c291-f521-4d82-95fb-a6d586d50de2
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

root@debian:~# mount /dev/md1 /mnt
root@debian:/mnt# fio --name=TEST --ioengine=posixaio --rw=randwrite --bs=4k --
TEST: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B4
fio-3.33
Starting 8 processes
TEST: Laying out IO file (1 file / 2048MiB)
TEST: Laying out IO file (1 file / 2048MiB)
TEST: Laying out IO file (1 file / 2048MiB)
TEST: Laying out IO file (1 file / 2048MiB)
TEST: Laying out IO file (1 file / 2048MiB)
TEST: Laying out IO file (1 file / 2048MiB)
TEST: Laying out IO file (1 file / 2048MiB)
TEST: Laying out IO file (1 file / 2048MiB)
[27735.671557] ------------[ cut here ]------------
[27735.676182] WARNING: CPU: 15 PID: 0 at drivers/scsi/mpt3sas/mpt3sas_base.c:1]
[27735.686804] Modules linked in: raid1 md_mod binfmt_misc nls_ascii nls_cp437 n
[27735.752722] CPU: 15 PID: 0 Comm: swapper/15 Tainted: P        W  OE      6.11
[27735.762452] Hardware name: SolidRun Ltd. SolidRun CEX7 Platform, BIOS EDK II3
[27735.770530] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[27735.777480] pc : _get_st_from_smid+0x54/0x70 [mpt3sas]
[27735.782620] lr : _base_get_cb_idx+0x94/0xc0 [mpt3sas]
[27735.787668] sp : ffff80000807be70
[27735.790971] x29: ffff80000807be70 x28: 0000000000000001 x27: 0000000000000000
[27735.798097] x26: 0000000000000006 x25: ffff31bdd2554850 x24: ffffd0f85c221bc8
[27735.805223] x23: 000000000000000f x22: 0000000000000141 x21: ffffffffffffffff
[27735.812348] x20: ffff31bbed4df600 x19: ffff319c388c5408 x18: 0000000000000000
[27735.819474] x17: ffff60c679150000 x16: ffffd0f8b4419460 x15: 0000000000000000
[27735.826599] x14: 0000000000000002 x13: 0000000000000000 x12: 0000000000001176
[27735.833725] x11: 0000000000000040 x10: ffffd0f8b5b5fc88 x9 : ffffd0f85c1df474
[27735.840851] x8 : ffff319c38955c00 x7 : 0000000001fc2488 x6 : 000000a1a83d9644
[27735.847977] x5 : 00ffffffffffffff x4 : 0000000000001b94 x3 : 0000000000001b91
[27735.855102] x2 : ffff31bdd2554850 x1 : 0000000000000000 x0 : ffff31bdd2554850
[27735.862227] Call trace:
[27735.864662]  _get_st_from_smid+0x54/0x70 [mpt3sas]
[27735.869455]  _base_process_reply_queue+0xb4/0x1084 [mpt3sas]
[27735.875111]  _base_interrupt+0x3c/0x50 [mpt3sas]
[27735.879725]  __handle_irq_event_percpu+0x60/0x1e0
[27735.884423]  handle_irq_event+0x50/0x120
[27735.888334]  handle_fasteoi_irq+0xc4/0x210
[27735.892425]  generic_handle_domain_irq+0x34/0x50
[27735.897034]  gic_handle_irq+0x58/0x13c
[27735.900773]  call_on_irq_stack+0x24/0x50
[27735.904686]  do_interrupt_handler+0x88/0x90
[27735.908859]  el1_interrupt+0x34/0x54
[27735.912427]  el1h_64_irq_handler+0x18/0x2c
[27735.916512]  el1h_64_irq+0x64/0x68
[27735.919902]  cpuidle_enter_state+0xbc/0x440
[27735.924077]  cpuidle_enter+0x40/0x60
[27735.927643]  do_idle+0x234/0x2c0
[27735.930860]  cpu_startup_entry+0x40/0x4c
[27735.934772]  secondary_start_kernel+0x130/0x154
[27735.939293]  __secondary_switched+0xb0/0xb4
[27735.943467] ---[ end trace 0000000000000000 ]---
...