openzfs / zfs


pool corrupted after hibernation #14118

a14a opened this issue 2 years ago (status: Open)

a14a commented 2 years ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Gentoo |
| Distribution Version | - |
| Kernel Version | 5.18.0 |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.1.99-694_g17b2ae0b2 |

Describe the problem you're observing

The zpool became corrupted after resuming from hibernation; the hardware has been verified as healthy. Single ZFS disk, separate swap device, no zvols or LUKS.

This is a laptop with ZFS as the root filesystem. The hardware is a single 1 TB NVMe drive with three GPT partitions: EFI, the boot pool, and the root pool. No LUKS or dm-crypt is involved, and there are no separate log devices.

I hibernate to a separate physical SSD, formatted as raw swap (no zvol). Resume is handled by Gentoo's standard genkernel initrd, built via genkernel initramfs --zfs; I don't expect the initrd to be the cause.
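To be clear about what hibernation involves here: entering hibernation is plain in-kernel suspend-to-disk to the raw swap partition, and on resume the initrd only has to point the kernel at that partition before anything else touches the disks. A minimal sketch of the mechanics (the major:minor value is a placeholder for the swap device):

```
# hibernate: the kernel writes the memory image to the active swap device
echo disk > /sys/power/state

# resume (normally done early in the initrd): hand the kernel the device that
# holds the image, either via the resume= kernel parameter or by writing its
# major:minor number here (8:2 is a placeholder)
echo 8:2 > /sys/power/resume
```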

The hardware is one year old and the filesystem was two months old, with hibernation in daily use. The corruption occurred after several quick hibernate/resume cycles: I hibernated and resumed the machine maybe five or six times in a row.

# zpool import -R /mnt2 amber_rpool
cannot import 'amber_rpool': I/O error
        Destroy and re-create the pool from
        a backup source.

# zpool import -R /mnt2 -o readonly=on amber_rpool
cannot import 'amber_rpool': I/O error
        Destroy and re-create the pool from
        a backup source.

I attempted rewinding to an earlier transaction group without success. Some zdb output is included below.
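The rewind attempts were along these lines (illustrative sketch; the exact -FfXm -T invocation is the one visible in the debug output below, with the txg taken from the uberblock list):

```
# standard rewind, then extreme rewind, then an explicit txg
zpool import -F -R /mnt2 amber_rpool
zpool import -FX -R /mnt2 amber_rpool
zpool import -FfXm -T 3938230 -d /dev/nvme1n1p3 amber_rpool
```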

pool debug info

``` # zpool import -FfXm -T 3938230 -d /dev/nvme1n1p3 pool: amber_rpool id: 13295185738152581673 state: FAULTED status: The pool metadata is corrupted. action: The pool cannot be imported due to damaged devices or data. The pool may be active on another system, but can be imported using the '-f' flag. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-72 config: amber_rpool FAULTED corrupted data nvme1n1p3 ONLINE # zdb -ei amber_rpool Dataset mos [META], ID 0, cr_txg 4, 179M, 343 objects Dataset amber_rpool/crypt/log@crash-recovery-2022-03-02 [ZPL], ID 562, cr_txg 18246, 434M, 228 objects Dataset amber_rpool/crypt/log@manual-2022-04-08 [ZPL], ID 56483, cr_txg 1430821, 435M, 226 objects Dataset amber_rpool/crypt/log [ZPL], ID 680, cr_txg 132, 435M, 225 objects ZIL header: claim_txg 3938577, claim_blk_seq 2, claim_lr_seq 0 replay_seq 0, flags 0x2 Dataset amber_rpool/crypt/home/n/t@crash-recovery-2022-03-02 [ZPL], ID 814, cr_txg 18246, 216K, 7 objects Dataset amber_rpool/crypt/home/n/t@manual-2022-04-08 [ZPL], ID 56489, cr_txg 1430821, 929M, 415 objects Dataset amber_rpool/crypt/home/n/t [ZPL], ID 77, cr_txg 114, 1.47G, 781 objects ZIL header: claim_txg 3938577, claim_blk_seq 1, claim_lr_seq 0 replay_seq 0, flags 0x2 Dataset amber_rpool/crypt/home/n@crash-recovery-2022-03-02 [ZPL], ID 812, cr_txg 18246, 62.8G, 438804 objects Dataset amber_rpool/crypt/home/n@znap_frequent-2022-02-28-2000 [ZPL], ID 29, cr_txg 298, 200K, 7 objects Dataset amber_rpool/crypt/home/n@znap_hourly-2022-02-28-2000 [ZPL], ID 273, cr_txg 299, 200K, 7 objects Dataset amber_rpool/crypt/home/n@manual-2022-04-08 [ZPL], ID 56487, cr_txg 1430821, 61.9G, 532301 objects Dataset amber_rpool/crypt/home/n [ZPL], ID 138, cr_txg 113, 88.1G, 538753 objects ZIL header: claim_txg 3938577, claim_blk_seq 825, claim_lr_seq 0 replay_seq 0, flags 0x2 Dataset amber_rpool/crypt/home@manual-2022-04-08 [ZPL], ID 56485, cr_txg 1430821, 13.9G, 166866 objects Dataset amber_rpool/crypt/home@znap_frequent-2022-02-28-2000 [ZPL], ID 396, cr_txg 296, 200K, 7 objects Dataset amber_rpool/crypt/home@znap_hourly-2022-02-28-2000 [ZPL], ID 25, cr_txg 297, 200K, 7 objects Dataset amber_rpool/crypt/home@crash-recovery-2022-03-02 [ZPL], ID 810, cr_txg 18246, 13.9G, 166866 objects Dataset amber_rpool/crypt/home [ZPL], ID 671, cr_txg 110, 13.9G, 166866 objects Dataset amber_rpool/crypt/ROOT/gentoo/distfiles@manual-2022-04-08 [ZPL], ID 56495, cr_txg 1430821, 1.66G, 321 objects Dataset amber_rpool/crypt/ROOT/gentoo/distfiles@crash-recovery-2022-03-02 [ZPL], ID 824, cr_txg 18246, 1.32G, 308 objects failed to hold dataset 'amber_rpool/crypt/ROOT/gentoo/distfiles': Input/output error Dataset amber_rpool/crypt/ROOT/gentoo/packages@manual-2022-04-08 [ZPL], ID 56497, cr_txg 1430821, 27.2G, 4486 objects Dataset amber_rpool/crypt/ROOT/gentoo/packages@crash-recovery-2022-03-02 [ZPL], ID 826, cr_txg 18246, 24.1G, 4215 objects Dataset amber_rpool/crypt/ROOT/gentoo/packages [ZPL], ID 663, cr_txg 102, 30.9G, 5039 objects Dataset amber_rpool/crypt/ROOT/gentoo@znap_frequent-2022-02-28-2000 [ZPL], ID 153, cr_txg 294, 352K, 14 objects Dataset amber_rpool/crypt/ROOT/gentoo@znap_hourly-2022-02-28-2000 [ZPL], ID 512, cr_txg 295, 352K, 14 objects Dataset amber_rpool/crypt/ROOT/gentoo@crash-recovery-2022-03-02 [ZPL], ID 820, cr_txg 18246, 35.3G, 2193754 objects Dataset amber_rpool/crypt/ROOT/gentoo@manual-2022-04-08 [ZPL], ID 56493, cr_txg 1430821, 36.5G, 2298111 objects Dataset amber_rpool/crypt/ROOT/gentoo [ZPL], ID 655, cr_txg 85, 39.5G, 2545567 objects ZIL header: claim_txg 
3938577, claim_blk_seq 292, claim_lr_seq 0 replay_seq 0, flags 0x2 Dataset amber_rpool/crypt/ROOT@znap_frequent-2022-02-28-2000 [ZPL], ID 21, cr_txg 292, 192K, 6 objects Dataset amber_rpool/crypt/ROOT@manual-2022-04-08 [ZPL], ID 56491, cr_txg 1430821, 192K, 6 objects Dataset amber_rpool/crypt/ROOT@crash-recovery-2022-03-02 [ZPL], ID 816, cr_txg 18246, 192K, 6 objects Dataset amber_rpool/crypt/ROOT@znap_hourly-2022-02-28-2000 [ZPL], ID 149, cr_txg 293, 192K, 6 objects Dataset amber_rpool/crypt/ROOT [ZPL], ID 772, cr_txg 83, 192K, 6 objects Dataset amber_rpool/crypt@znap_frequent-2022-02-28-2000 [ZPL], ID 134, cr_txg 290, 192K, 6 objects Dataset amber_rpool/crypt@crash-recovery-2022-03-02 [ZPL], ID 560, cr_txg 18246, 192K, 6 objects Dataset amber_rpool/crypt@manual-2022-04-08 [ZPL], ID 56481, cr_txg 1430821, 192K, 6 objects Dataset amber_rpool/crypt@znap_hourly-2022-02-28-2000 [ZPL], ID 145, cr_txg 291, 192K, 6 objects Dataset amber_rpool/crypt [ZPL], ID 387, cr_txg 45, 192K, 6 objects Dataset amber_rpool@znap_hourly-2022-02-28-2000 [ZPL], ID 392, cr_txg 289, 96K, 6 objects Dataset amber_rpool@znap_frequent-2022-02-28-2015 [ZPL], ID 278, cr_txg 360, 96K, 6 objects Dataset amber_rpool@manual-2022-04-08 [ZPL], ID 56479, cr_txg 1430821, 96K, 6 objects Dataset amber_rpool@znap_frequent-2022-02-28-2000 [ZPL], ID 269, cr_txg 288, 96K, 6 objects Dataset amber_rpool@crash-recovery-2022-03-02 [ZPL], ID 556, cr_txg 18246, 96K, 6 objects Dataset amber_rpool [ZPL], ID 54, cr_txg 1, 96K, 6 objects MOS object 2 (bpobj) leaked MOS object 67 (zap) leaked MOS object 68 (DSL props) leaked MOS object 69 (DSL directory child map) leaked MOS object 70 (zap) leaked MOS object 71 (DSL dataset snap map) leaked MOS object 73 (zap) leaked MOS object 20691 (bpobj) leaked MOS object 56496 (DSL deadlist map) leaked Verified large_blocks feature refcount of 0 is correct large_dnode feature refcount mismatch: 36 consumers != 37 refcount Verified sha512 feature refcount of 0 is correct Verified skein feature refcount of 0 is correct Verified edonr feature refcount of 0 is correct userobj_accounting feature refcount mismatch: 36 consumers != 37 refcount encryption feature refcount mismatch: 36 consumers != 37 refcount project_quota feature refcount mismatch: 36 consumers != 37 refcount Verified redaction_bookmarks feature refcount of 0 is correct Verified redacted_datasets feature refcount of 0 is correct Verified bookmark_written feature refcount of 0 is correct Verified livelist feature refcount of 0 is correct Verified zstd_compress feature refcount of 0 is correct root@nkarch ~ # zdb -ul /dev/nvme1n1p3 ------------------------------------ LABEL 0 ------------------------------------ version: 5000 name: 'amber_rpool' state: 0 txg: 3937721 pool_guid: 13295185738152581673 errata: 0 hostname: '(none)' top_guid: 6749422823089631006 guid: 6749422823089631006 vdev_children: 1 vdev_tree: type: 'disk' id: 0 guid: 6749422823089631006 path: '/dev/nvme0n1p3' whole_disk: 0 metaslab_array: 256 metaslab_shift: 33 ashift: 12 asize: 1021519331328 is_log: 0 create_txg: 4 features_for_read: com.delphix:hole_birth com.delphix:embedded_data labels = 0 1 2 3 Uberblock[0] magic = 0000000000bab10c version = 5000 txg = 3938528 guid_sum = 1597864487532661063 timestamp = 1655822150 UTC = Tue Jun 21 16:35:50 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[1] magic = 0000000000bab10c version = 5000 txg = 3938529 guid_sum = 1597864487532661063 timestamp = 1655822156 UTC = Tue Jun 21 
16:35:56 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[2] magic = 0000000000bab10c version = 5000 txg = 3938466 guid_sum = 1597864487532661063 timestamp = 1655822076 UTC = Tue Jun 21 16:34:36 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[3] magic = 0000000000bab10c version = 5000 txg = 3938467 guid_sum = 1597864487532661063 timestamp = 1655822076 UTC = Tue Jun 21 16:34:36 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[4] magic = 0000000000bab10c version = 5000 txg = 3938468 guid_sum = 1597864487532661063 timestamp = 1655822076 UTC = Tue Jun 21 16:34:36 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[5] magic = 0000000000bab10c version = 5000 txg = 3938501 guid_sum = 1597864487532661063 timestamp = 1655822107 UTC = Tue Jun 21 16:35:07 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[6] magic = 0000000000bab10c version = 5000 txg = 3938566 guid_sum = 1597864487532661063 timestamp = 1655822194 UTC = Tue Jun 21 16:36:34 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[7] magic = 0000000000bab10c version = 5000 txg = 3938567 guid_sum = 1597864487532661063 timestamp = 1655822199 UTC = Tue Jun 21 16:36:39 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[8] magic = 0000000000bab10c version = 5000 txg = 3938568 guid_sum = 1597864487532661063 timestamp = 1655822204 UTC = Tue Jun 21 16:36:44 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[9] magic = 0000000000bab10c version = 5000 txg = 3938569 guid_sum = 1597864487532661063 timestamp = 1655822209 UTC = Tue Jun 21 16:36:49 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[10] magic = 0000000000bab10c version = 5000 txg = 3938570 guid_sum = 1597864487532661063 timestamp = 1655822215 UTC = Tue Jun 21 16:36:55 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[11] magic = 0000000000bab10c version = 5000 txg = 3938475 guid_sum = 1597864487532661063 timestamp = 1655822084 UTC = Tue Jun 21 16:34:44 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[12] magic = 0000000000bab10c version = 5000 txg = 3938540 guid_sum = 1597864487532661063 timestamp = 1655822171 UTC = Tue Jun 21 16:36:11 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[13] magic = 0000000000bab10c version = 5000 txg = 3938541 guid_sum = 1597864487532661063 timestamp = 1655822176 UTC = Tue Jun 21 16:36:16 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[14] magic = 0000000000bab10c version = 5000 txg = 3938542 guid_sum = 1597864487532661063 timestamp = 1655822181 UTC = Tue Jun 21 16:36:21 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[15] magic = 0000000000bab10c version = 5000 txg = 3938575 guid_sum = 1597864487532661063 timestamp = 1655822215 UTC = Tue Jun 21 16:36:55 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 
Uberblock[16] magic = 0000000000bab10c version = 5000 txg = 3938576 guid_sum = 1597864487532661063 timestamp = 1655822215 UTC = Tue Jun 21 16:36:55 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[17] magic = 0000000000bab10c version = 5000 txg = 3938577 guid_sum = 1597864487532661063 timestamp = 1655857063 UTC = Wed Jun 22 02:17:43 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[18] magic = 0000000000bab10c version = 5000 txg = 3938578 guid_sum = 1597864487532661063 timestamp = 1655861150 UTC = Wed Jun 22 03:25:50 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[19] magic = 0000000000bab10c version = 5000 txg = 3938579 guid_sum = 1597864487532661063 timestamp = 1655879645 UTC = Wed Jun 22 08:34:05 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[20] magic = 0000000000bab10c version = 5000 txg = 3938580 guid_sum = 1597864487532661063 timestamp = 1656359461 UTC = Mon Jun 27 21:51:01 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[21] magic = 0000000000bab10c version = 5000 txg = 3938581 guid_sum = 1597864487532661063 timestamp = 1656364059 UTC = Mon Jun 27 23:07:39 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[22] magic = 0000000000bab10c version = 5000 txg = 3938582 guid_sum = 1597864487532661063 timestamp = 1656931598 UTC = Mon Jul 4 12:46:38 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[23] magic = 0000000000bab10c version = 5000 txg = 3938327 guid_sum = 1597864487532661063 timestamp = 1655821889 UTC = Tue Jun 21 16:31:29 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[24] magic = 0000000000bab10c version = 5000 txg = 3938552 guid_sum = 1597864487532661063 timestamp = 1655822194 UTC = Tue Jun 21 16:36:34 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[25] magic = 0000000000bab10c version = 5000 txg = 3938553 guid_sum = 1597864487532661063 timestamp = 1655822194 UTC = Tue Jun 21 16:36:34 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[26] magic = 0000000000bab10c version = 5000 txg = 3938394 guid_sum = 1597864487532661063 timestamp = 1655821988 UTC = Tue Jun 21 16:33:08 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[27] magic = 0000000000bab10c version = 5000 txg = 3938171 guid_sum = 1597864487532661063 timestamp = 1655821725 UTC = Tue Jun 21 16:28:45 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[28] magic = 0000000000bab10c version = 5000 txg = 3938460 guid_sum = 1597864487532661063 timestamp = 1655822071 UTC = Tue Jun 21 16:34:31 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[29] magic = 0000000000bab10c version = 5000 txg = 3938461 guid_sum = 1597864487532661063 timestamp = 1655822076 UTC = Tue Jun 21 16:34:36 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[30] magic = 0000000000bab10c version = 5000 txg = 3938462 guid_sum = 1597864487532661063 timestamp = 
1655822076 UTC = Tue Jun 21 16:34:36 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 Uberblock[31] magic = 0000000000bab10c version = 5000 txg = 3938463 guid_sum = 1597864487532661063 timestamp = 1655822076 UTC = Tue Jun 21 16:34:36 2022 mmp_magic = 00000000a11cea11 mmp_delay = 0 mmp_valid = 0 checkpoint_txg = 0 labels = 0 1 2 3 ```

Eventually I was able to get past the import-time errors by disabling spa_load_verify_data and spa_load_verify_metadata. With those overrides a normal read-write import simply freezes indefinitely, but a read-only import works.

trace

``` # echo 0 > /sys/module/zfs/parameters/spa_load_verify_metadata; echo 0 > /sys/module/zfs/parameters/spa_load_verify_data # zpool import -R /mnt2 amber_rpool (this will freeze) # dmesg ... [ 8154.710601] WARNING: can't open objset 70, error 5 [ 8154.735379] WARNING: can't open objset 70, error 5 [ 8161.259640] WARNING: can't open objset 70, error 5 [ 8161.284331] WARNING: can't open objset 70, error 5 [ 8167.034045] WARNING: can't open objset 70, error 5 [ 8167.035696] WARNING: can't open objset for 70, error 5 [ 8167.113492] WARNING: Pool 'amber_rpool' has encountered an uncorrectable I/O failure and has been suspended. [ 8355.560061] INFO: task zpool:9424 blocked for more than 122 seconds. [ 8355.560072] Tainted: P OE 5.18.8-arch1-1 #1 [ 8355.560076] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 8355.560079] task:zpool state:D stack: 0 pid: 9424 ppid: 1709 flags:0x00004006 [ 8355.560090] Call Trace: [ 8355.560093] [ 8355.560101] __schedule+0x37c/0x11f0 [ 8355.560119] ? autoremove_wake_function+0x2f/0x60 [ 8355.560134] schedule+0x4f/0xb0 [ 8355.560144] io_schedule+0x46/0x70 [ 8355.560156] cv_wait_common+0xaa/0x130 [spl 2cf84daf96686f1282819fc77b136156523175de] [ 8355.560180] ? cpuacct_percpu_seq_show+0x20/0x20 [ 8355.560193] txg_wait_synced_impl+0xc8/0x110 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.560536] txg_wait_synced+0x10/0x40 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.560846] spa_config_update+0x46/0x170 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.561145] spa_import+0x60e/0x870 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.561443] zfs_ioc_pool_import+0x158/0x180 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.561762] zfsdev_ioctl_common+0x8d3/0xa00 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.562090] zfsdev_ioctl+0x53/0xd0 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.562399] __x64_sys_ioctl+0x8e/0xc0 [ 8355.562410] do_syscall_64+0x5c/0x90 [ 8355.562420] ? do_user_addr_fault+0x1db/0x680 [ 8355.562433] ? exc_page_fault+0x74/0x170 [ 8355.562439] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 8355.562449] RIP: 0033:0x7f7d877077af [ 8355.562456] RSP: 002b:00007fff312a4470 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 8355.562464] RAX: ffffffffffffffda RBX: 0000559051f93630 RCX: 00007f7d877077af [ 8355.562468] RDX: 00007fff312a4540 RSI: 0000000000005a02 RDI: 0000000000000003 [ 8355.562473] RBP: 00007fff312a8430 R08: 000000000000b0b0 R09: 00007f7d877fe3d0 [ 8355.562476] R10: 00007f7d877fe3d0 R11: 0000000000000246 R12: 00007fff312a4540 [ 8355.562480] R13: 00007f7d700014f8 R14: 0000000000000000 R15: 0000559051fc4d90 [ 8355.562488] [ 8355.562504] INFO: task txg_sync:9554 blocked for more than 122 seconds. [ 8355.562509] Tainted: P OE 5.18.8-arch1-1 #1 [ 8355.562512] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 8355.562514] task:txg_sync state:D stack: 0 pid: 9554 ppid: 2 flags:0x00004000 [ 8355.562523] Call Trace: [ 8355.562526] [ 8355.562529] __schedule+0x37c/0x11f0 [ 8355.562543] schedule+0x4f/0xb0 [ 8355.562552] schedule_timeout+0x98/0x150 [ 8355.562559] ? __bpf_trace_tick_stop+0x10/0x10 [ 8355.562566] io_schedule_timeout+0x50/0x80 [ 8355.562576] __cv_timedwait_common+0x12c/0x170 [spl 2cf84daf96686f1282819fc77b136156523175de] [ 8355.562598] ? 
cpuacct_percpu_seq_show+0x20/0x20 [ 8355.562610] __cv_timedwait_io+0x19/0x20 [spl 2cf84daf96686f1282819fc77b136156523175de] [ 8355.562629] zio_wait+0x13a/0x2b0 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.562953] dmu_buf_will_dirty_impl+0xa6/0x150 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.563189] dmu_write_impl+0x3f/0xd0 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.563404] dmu_write+0xcc/0x190 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.563617] spa_history_write+0x18f/0x1d0 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.563916] spa_history_log_sync+0x19d/0x7c0 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.564213] dsl_sync_task_sync+0xa6/0xf0 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.564481] dsl_pool_sync+0x3f6/0x510 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.564607] spa_sync+0x553/0xf70 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.564697] ? spa_txg_history_init_io+0x114/0x120 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.564791] txg_sync_thread+0x279/0x480 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.564885] ? txg_fini+0x270/0x270 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8355.564978] ? __thread_exit+0x20/0x20 [spl 2cf84daf96686f1282819fc77b136156523175de] [ 8355.564987] thread_generic_wrapper+0x5b/0x70 [spl 2cf84daf96686f1282819fc77b136156523175de] [ 8355.564995] kthread+0xdb/0x110 [ 8355.564999] ? kthread_complete_and_exit+0x20/0x20 [ 8355.565003] ret_from_fork+0x1f/0x30 [ 8355.565007] [ 8478.439751] INFO: task zpool:9424 blocked for more than 245 seconds. [ 8478.439763] Tainted: P OE 5.18.8-arch1-1 #1 [ 8478.439766] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 8478.439769] task:zpool state:D stack: 0 pid: 9424 ppid: 1709 flags:0x00004006 [ 8478.439780] Call Trace: [ 8478.439783] [ 8478.439790] __schedule+0x37c/0x11f0 [ 8478.439809] ? autoremove_wake_function+0x2f/0x60 [ 8478.439823] schedule+0x4f/0xb0 [ 8478.439832] io_schedule+0x46/0x70 [ 8478.439844] cv_wait_common+0xaa/0x130 [spl 2cf84daf96686f1282819fc77b136156523175de] [ 8478.439867] ? cpuacct_percpu_seq_show+0x20/0x20 [ 8478.439879] txg_wait_synced_impl+0xc8/0x110 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8478.440191] txg_wait_synced+0x10/0x40 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8478.440476] spa_config_update+0x46/0x170 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8478.440752] spa_import+0x60e/0x870 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8478.441028] zfs_ioc_pool_import+0x158/0x180 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8478.441324] zfsdev_ioctl_common+0x8d3/0xa00 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8478.441634] zfsdev_ioctl+0x53/0xd0 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8478.441919] __x64_sys_ioctl+0x8e/0xc0 [ 8478.441930] do_syscall_64+0x5c/0x90 [ 8478.441941] ? do_user_addr_fault+0x1db/0x680 [ 8478.441953] ? 
exc_page_fault+0x74/0x170 [ 8478.441960] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 8478.441969] RIP: 0033:0x7f7d877077af [ 8478.441977] RSP: 002b:00007fff312a4470 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 8478.441985] RAX: ffffffffffffffda RBX: 0000559051f93630 RCX: 00007f7d877077af [ 8478.441989] RDX: 00007fff312a4540 RSI: 0000000000005a02 RDI: 0000000000000003 [ 8478.441993] RBP: 00007fff312a8430 R08: 000000000000b0b0 R09: 00007f7d877fe3d0 [ 8478.441997] R10: 00007f7d877fe3d0 R11: 0000000000000246 R12: 00007fff312a4540 [ 8478.442001] R13: 00007f7d700014f8 R14: 0000000000000000 R15: 0000559051fc4d90 [ 8478.442009] [ 8601.319173] INFO: task zpool:9424 blocked for more than 368 seconds. [ 8601.319185] Tainted: P OE 5.18.8-arch1-1 #1 [ 8601.319190] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 8601.319192] task:zpool state:D stack: 0 pid: 9424 ppid: 1709 flags:0x00004006 [ 8601.319203] Call Trace: [ 8601.319206] [ 8601.319213] __schedule+0x37c/0x11f0 [ 8601.319230] ? autoremove_wake_function+0x2f/0x60 [ 8601.319244] schedule+0x4f/0xb0 [ 8601.319253] io_schedule+0x46/0x70 [ 8601.319265] cv_wait_common+0xaa/0x130 [spl 2cf84daf96686f1282819fc77b136156523175de] [ 8601.319288] ? cpuacct_percpu_seq_show+0x20/0x20 [ 8601.319301] txg_wait_synced_impl+0xc8/0x110 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8601.319612] txg_wait_synced+0x10/0x40 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8601.319901] spa_config_update+0x46/0x170 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8601.320176] spa_import+0x60e/0x870 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8601.320448] zfs_ioc_pool_import+0x158/0x180 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8601.320743] zfsdev_ioctl_common+0x8d3/0xa00 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8601.321046] zfsdev_ioctl+0x53/0xd0 [zfs 8aa59c09c28351724b1ba2221e0acd9070264074] [ 8601.321331] __x64_sys_ioctl+0x8e/0xc0 [ 8601.321342] do_syscall_64+0x5c/0x90 [ 8601.321352] ? do_user_addr_fault+0x1db/0x680 [ 8601.321365] ? exc_page_fault+0x74/0x170 [ 8601.321371] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 8601.321380] RIP: 0033:0x7f7d877077af [ 8601.321387] RSP: 002b:00007fff312a4470 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 8601.321395] RAX: ffffffffffffffda RBX: 0000559051f93630 RCX: 00007f7d877077af [ 8601.321399] RDX: 00007fff312a4540 RSI: 0000000000005a02 RDI: 0000000000000003 [ 8601.321404] RBP: 00007fff312a8430 R08: 000000000000b0b0 R09: 00007f7d877fe3d0 [ 8601.321408] R10: 00007f7d877fe3d0 R11: 0000000000000246 R12: 00007fff312a4540 [ 8601.321411] R13: 00007f7d700014f8 R14: 0000000000000000 R15: 0000559051fc4d90 [ 8601.321419] ```

As long as I import read-only, the pool seems to be okay; the failures reported in the zdb dump appear to have broken only one disposable dataset.

# echo 0 > /sys/module/zfs/parameters/spa_load_verify_metadata; echo 0 > /sys/module/zfs/parameters/spa_load_verify_data
# zpool import -R /mnt2 -o readonly=on amber_rpool
# zfs list
cannot iterate filesystems: I/O error
NAME                                USED  AVAIL     REFER  MOUNTPOINT
amber_rpool                         217G   697G       96K  /mnt2
amber_rpool/crypt                   217G   697G      192K  none
amber_rpool/crypt/ROOT             91.8G   697G      192K  none
amber_rpool/crypt/ROOT/gentoo      91.8G   697G     39.5G  /mnt2
amber_rpool/crypt/home              125G   697G     13.9G  /mnt2/home
amber_rpool/crypt/home/n            111G   697G     88.1G  /mnt2/home/n
amber_rpool/crypt/home/n/t         1.47G   697G     1.47G  /mnt2/home/n/t
amber_rpool/crypt/log               437M   697G      435M  /mnt2/var/log

# zpool status -v
  pool: amber_rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
config:

        NAME         STATE     READ WRITE CKSUM
        amber_rpool  ONLINE       0     0     0
          nvme1n1p3  ONLINE       0     0    24

errors: Permanent errors have been detected in the following files:

        amber_rpool/crypt/ROOT/gentoo/distfiles:<0x0>

With read-only access I was able to salvage all my data and migrate the system to a different device. However, the original device containing the crashed pool appears to be in perfect health: I copied the entire device with no errors, and I did not observe any block-device error messages.
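For reference, one way to do a migration like this is to send the existing snapshots from the read-only import to a fresh pool. A rough sketch (new_rpool, the mount point, and the target device are placeholders, and raw sends with -w are assumed for the encrypted datasets):

```
# create a replacement pool on the new device (placeholder names/paths)
zpool create -o ashift=12 -R /mnt3 new_rpool /dev/disk/by-id/nvme-NEWDISK-part3

# replicate the latest snapshots from the read-only pool; -w sends the
# encrypted datasets raw so no keys are needed during the transfer
zfs send -Rw amber_rpool/crypt@manual-2022-04-08 | zfs recv -Fdu new_rpool
```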

block device test

```
# dd if=/dev/disk/by-id/nvme-KXG6AZNV1T02_TOSHIBA_31DF31IHFDS3 bs=128M status=progress | gzip | ssh n@d dd bs=128M of=downloads/toshiba-dd.gz
1024209543168 bytes (1.0 TB, 954 GiB) copied, 12904.8 s, 79.4 MB/s
0+7765443 records in
0+7765443 records out
255021655697 bytes (255 GB, 238 GiB) copied, 12904.7 s, 19.8 MB/s

# md5sum /dev/disk/by-id/nvme-KXG6AZNV1T02_TOSHIBA_31DF31IHFDS3
255183a6436e6607ebe6707e41a213ee  /dev/disk/by-id/nvme-KXG6AZNV1T02_TOSHIBA_31DF31IHFDS3

# ssh n@d "zcat downloads/toshiba-dd.gz | md5sum"
255183a6436e6607ebe6707e41a213ee  -
```

I still have the disk with the crashed pool on it available for tests. Since all the data has already been salvaged, this seems like a perfect test bed for hunting down a possible bug. To the best of my knowledge the initrd image does not attempt to mount the pool as in https://github.com/NixOS/nixpkgs/issues/106093, and I don't think I have any scripts that would modify the pool outside the intended resume sequence. What would be the steps to identify the specific cause of this failure?
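To illustrate what can still be probed without a writable import, these are the kinds of read-only zdb queries that can be run against the exported pool (a sketch; the -t value is one of the txgs from the uberblock list above):

```
# dump dataset/object metadata directly from the on-disk state
zdb -e -d amber_rpool
# traverse metadata and verify block checksums (leak tracking disabled)
zdb -e -bcL amber_rpool
# repeat against an older uberblock to narrow down when the damage appeared
zdb -e -t 3938528 -bcL amber_rpool
```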

rincebrain commented 2 years ago

Speaking only for myself, my advice was, is, and remains: don't hibernate with ZFS. As far as I know, nothing in the test suite exercises hibernation, and it's very easy for the resume process to do the wrong thing and eat your data.

Other people disagree with me, but I've had too many people reporting that their data got eaten after hibernation, and as nobody seems to be actively working on making it less of a footgun, I just advise people not to play with fire for now.

cf. #12842, this comment.

a14a commented 2 years ago

I definitely agree with the recommendation, but despite the brittleness this is already "sort of" supported by distributions, so I guess the best we can do is document the possible scenarios. As for me personally, S3 suspend overheats the laptop in my backpack, and I can afford to risk data loss, so I'll keep trying.

> Resume is handled by Gentoo's standard genkernel initrd, built via genkernel initramfs --zfs; I don't expect the initrd to be the cause.

This turned out to be false: genkernel's initrd does import my root pool before resuming (https://gitweb.gentoo.org/proj/genkernel.git/tree/defaults/linuxrc#n657), i.e. it runs start_volumes before do_resume, exactly as in https://bugs.gentoo.org/827281. I reversed that order in my initrd and it seems to be working so far.
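For anyone hitting the same issue, the workaround amounts to swapping two calls in the initrd's linuxrc, roughly like this (a simplified sketch, not a verbatim patch against genkernel):

```
# default order in genkernel's linuxrc: volumes (including the ZFS root pool)
# are started before the resume check
#   start_volumes
#   do_resume
#
# reordered so the hibernation image is restored before anything touches the pool
do_resume
start_volumes
```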

If I were to run two drives in a mirror configuration for the root pool, would that make things even worse for hibernation? Would the extra device make the pool more likely to fail on resume, or not? (Obviously a mirror is preferable in many cases, but what about the specific resume case?)

bghira commented 2 years ago

If it is supported by your distribution, you should open a downstream ticket with them to have it resolved, because the OpenZFS project does not yet support that use case.