openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Panic while trying to import ZFS pool(s) #14878

Open herrehesse opened 1 year ago

herrehesse commented 1 year ago

System information

Type                  Version/Name
Distribution Version  zfs-2.1.11 (https://github.com/openzfs/zfs/releases/tag/zfs-2.1.11)
Kernel Version        6.4

(We tried different versions and other operating systems that support ZFS. The issue persists.)

Describe the problem you're observing

We have a server with three ZFS pools. After a power loss, the Ubuntu 22.04 system no longer boots: it panics during the ZFS pool import at startup. We have investigated and ruled out problems with the disks, hardware, servers, and ZFS/OS versions, and we have exhausted the troubleshooting steps we know of. It appears that a ZFS block or some metadata has become corrupted, which leads to the panic.

Describe how to reproduce the problem

  1. Boot the server with the JBOD attached over SAS3 (60 drives).
  2. All disks are detected correctly; the HBA is OK and reports no errors.
  3. Boot into a fresh Ubuntu 22.04 or TrueNAS 13 install.
  4. Trying to import the pools triggers a panic, followed by a reboot.
  5. After the reboot we are back at step 1.

Include any warning/errors/backtraces from the system logs

Example:

[  177.673647] VERIFY0(0 == sls->sls_mscount) failed (0 == -1)
[  177.673684] PANIC at spa.c:1613:spa_unload_log_sm_metadata()
[ 2306.564770] CPU: 22 PID: 117752 Comm: zpool Tainted: P           OE      6.4.0-060400rc2-generic #202305142030
[ 2306.564774] Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.1 05/15/2015
[ 2306.564776] Call Trace:
[ 2306.564780]  <TASK>
[ 2306.564784]  dump_stack_lvl+0x48/0x70
[ 2306.564799]  dump_stack+0x10/0x20
[ 2306.564803]  spl_dumpstack+0x29/0x40 [spl]
[ 2306.564833]  spl_panic+0xfc/0x120 [spl]
[ 2306.564849]  ? kfree+0x78/0x120
[ 2306.564857]  spa_unload+0x4e6/0x580 [zfs]
[ 2306.565280]  spa_tryimport+0x253/0x470 [zfs]
[ 2306.565538]  zfs_ioc_pool_tryimport+0x85/0xe0 [zfs]
[ 2306.565784]  zfsdev_ioctl_common+0x8ee/0xa40 [zfs]
[ 2306.566032]  ? check_heap_object+0x3c/0x1e0
[ 2306.566038]  ? __check_object_size.part.0+0x72/0x150
[ 2306.566042]  zfsdev_ioctl+0x57/0xf0 [zfs]
[ 2306.566281]  __x64_sys_ioctl+0xa0/0xe0
[ 2306.566288]  do_syscall_64+0x5b/0x90
[ 2306.566295]  ? irqentry_exit+0x43/0x50
[ 2306.566301]  ? exc_page_fault+0x94/0x1b0
[ 2306.566305]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 2306.566313] RIP: 0033:0x7f4942d119ef
[ 2306.566343] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 2306.566347] RSP: 002b:00007ffd385b7cf0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 2306.566351] RAX: ffffffffffffffda RBX: 00007ffd385b7d60 RCX: 00007f4942d119ef
[ 2306.566354] RDX: 00007ffd385b7d60 RSI: 0000000000005a06 RDI: 0000000000000003
[ 2306.566356] RBP: 00007ffd385bb340 R08: 00007f4942df7330 R09: 00007f4942df7330
[ 2306.566358] R10: 0000000000000000 R11: 0000000000000246 R12: 000055e2e3d7a320
[ 2306.566360] R13: 000055e2e3d85ef0 R14: 00007ffd385bb420 R15: 0000000000000000
[ 2306.566364]  </TASK>
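
For anyone triaging similar reports, the first two lines of the log carry the key facts: the assertion that failed and where the panic was raised. According to the call trace, that function was reached through spa_tryimport() and spa_unload(). The following is a minimal sketch for pulling those fields out of a dmesg capture, assuming the message layout shown in this log. The helper name `parse_spl_panic` and both regular expressions are illustrative and not part of any ZFS tooling.

```python
import re

# Patterns assume the SPL message layout seen in the log above:
#   "VERIFY0(<expr>) failed (<values>)" and "PANIC at <file>:<line>:<func>()"
VERIFY_RE = re.compile(r"(VERIFY\w*)\((.+)\) failed \((.+)\)")
PANIC_RE = re.compile(r"PANIC at ([\w.]+):(\d+):(\w+)\(\)")


def parse_spl_panic(lines):
    """Return a dict describing the first failed assertion and panic site, if any."""
    info = {}
    for line in lines:
        m = VERIFY_RE.search(line)
        if m and "assertion" not in info:
            info["macro"] = m.group(1)       # e.g. VERIFY0
            info["assertion"] = m.group(2)   # expression that was checked
            info["values"] = m.group(3)      # values observed at failure
        m = PANIC_RE.search(line)
        if m and "function" not in info:
            info["file"] = m.group(1)
            info["line"] = int(m.group(2))
            info["function"] = m.group(3)
    return info


# The two lines copied from the log in this report.
log = [
    "[  177.673647] VERIFY0(0 == sls->sls_mscount) failed (0 == -1)",
    "[  177.673684] PANIC at spa.c:1613:spa_unload_log_sm_metadata()",
]
print(parse_spl_panic(log))
```

Run against the two lines above, this reports that the check on `sls->sls_mscount` expected 0 but saw -1, at spa.c line 1613 in spa_unload_log_sm_metadata().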


92hackers commented 3 months ago

I encountered the same issue, and there is still no way to work around it.