openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Panic on import: VERIFY3(c < (1ULL << 24) >> 9) failed (36028797018963967 < 32768) #7603

Open mdunlap opened 6 years ago

mdunlap commented 6 years ago

System information

Type                 | Version/Name
Distribution Name    | Was originally FreeNAS, but now testing on Ubuntu
Distribution Version | 16.04 Live
Linux Kernel         | Linux ubuntu 4.13.0-36-generic
Architecture         | Intel x64
ZFS Version          | 0.6.5.6-0ubuntu21
SPL Version          | 0.6.5.6-0ubuntu21

Describe the problem you're observing

Kernel panic on import of zfs pool

Describe how to reproduce the problem

Importing the pool in any way gives a kernel panic. I pulled these drives from a FreeNAS 11 box to rule out a hardware issue; one day the pool just stopped working, and the logs showed this kernel panic on import.

Include any warning/errors/backtraces from the system logs

root@ubuntu:~# modprobe zfs zfs_rescue=1
root@ubuntu:~# zpool import -o readonly=on -N Storage
cannot import 'Storage': pool may be in use from other system, it was last accessed by freenas.local (hostid: 0x10387101) on Tue May 29 23:34:32 2018
use '-f' to import anyway
root@ubuntu:~# zpool import -fo readonly=on -N Storage

and then in dmesg:

[140227.665512] VERIFY3(c < (1ULL << 24) >> 9) failed (36028797018963967 < 32768)
[140227.665515] PANIC at zio.c:230:zio_buf_alloc()
[140227.665516] Showing stack for process 30009
[140227.665518] CPU: 6 PID: 30009 Comm: zpool Tainted: P           O    4.13.0-36-generic #40~16.04.1-Ubuntu
[140227.665519] Hardware name: MSI MS-7885/X99A RAIDER (MS-7885), BIOS P.40 05/25/2016
[140227.665519] Call Trace:
[140227.665525]  dump_stack+0x63/0x8b
[140227.665533]  spl_dumpstack+0x42/0x50 [spl]
[140227.665535]  spl_panic+0xbe/0x100 [spl]
[140227.665572]  ? vdev_queue_io+0x188/0x200 [zfs]
[140227.665573]  ? mutex_lock+0x12/0x40
[140227.665590]  ? zio_wait_for_children+0x82/0xa0 [zfs]
[140227.665605]  ? zio_vdev_io_done+0x2e/0x180 [zfs]
[140227.665607]  ? kmem_cache_alloc+0x15c/0x1b0
[140227.665610]  ? spl_kmem_cache_alloc+0x72/0x7c0 [spl]
[140227.665621]  ? arc_space_consume+0x55/0x70 [zfs]
[140227.665631]  ? buf_cons+0x6a/0x70 [zfs]
[140227.665633]  ? spl_kmem_cache_alloc+0x126/0x7c0 [spl]
[140227.665648]  zio_buf_alloc+0x43/0x60 [zfs]
[140227.665658]  arc_get_data_buf.isra.22+0x26c/0x390 [zfs]
[140227.665668]  arc_buf_alloc+0x10b/0x170 [zfs]
[140227.665677]  arc_read+0x1b2/0xa10 [zfs]
[140227.665690]  traverse_prefetch_metadata+0xbc/0xd0 [zfs]
[140227.665701]  traverse_visitbp+0x566/0x980 [zfs]
[140227.665713]  traverse_visitbp+0x5cf/0x980 [zfs]
[140227.665724]  traverse_visitbp+0x5cf/0x980 [zfs]
[140227.665735]  traverse_dnode+0x7f/0xe0 [zfs]
[140227.665745]  traverse_visitbp+0x712/0x980 [zfs]
[140227.665756]  traverse_visitbp+0x5cf/0x980 [zfs]
[140227.665767]  traverse_visitbp+0x5cf/0x980 [zfs]
[140227.665778]  traverse_visitbp+0x5cf/0x980 [zfs]
[140227.665789]  traverse_visitbp+0x5cf/0x980 [zfs]
[140227.665799]  traverse_visitbp+0x5cf/0x980 [zfs]
[140227.665810]  traverse_dnode+0x7f/0xe0 [zfs]
[140227.665821]  traverse_visitbp+0x888/0x980 [zfs]
[140227.665831]  traverse_impl+0x1ab/0x410 [zfs]
[140227.665847]  ? spa_async_suspend+0x90/0x90 [zfs]
[140227.665857]  traverse_dataset+0x45/0x50 [zfs]
[140227.665872]  ? spa_async_suspend+0x90/0x90 [zfs]
[140227.665883]  traverse_pool+0x178/0x1b0 [zfs]
[140227.665897]  ? spa_async_suspend+0x90/0x90 [zfs]
[140227.665912]  spa_load+0x19f9/0x1cb0 [zfs]
[140227.665914]  ? taskq_create+0x1fd/0x340 [spl]
[140227.665929]  spa_load_best+0x5d/0x280 [zfs]
[140227.665932]  ? zpool_get_rewind_policy+0x18c/0x1a0 [zcommon]
[140227.665946]  spa_import+0x1d0/0x720 [zfs]
[140227.665951]  ? nvpair_value_common.part.11+0xc7/0x120 [znvpair]
[140227.665968]  zfs_ioc_pool_import+0xfa/0x130 [zfs]
[140227.665983]  zfsdev_ioctl+0x45d/0x4f0 [zfs]
[140227.665986]  do_vfs_ioctl+0xa4/0x600
[140227.665987]  ? __do_page_fault+0x283/0x4f0
[140227.665989]  SyS_ioctl+0x79/0x90
[140227.665990]  entry_SYSCALL_64_fastpath+0x24/0xab
[140227.665992] RIP: 0033:0x7f4af13f4f47
[140227.665992] RSP: 002b:00007ffde625ca78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[140227.665993] RAX: ffffffffffffffda RBX: 00007ffde62600b0 RCX: 00007f4af13f4f47
[140227.665994] RDX: 00007ffde62600b0 RSI: 0000000000005a02 RDI: 0000000000000003
[140227.665994] RBP: 000000000096d060 R08: 000000000096d060 R09: 0000000000a791f8
[140227.665995] R10: 000000000000019d R11: 0000000000000246 R12: 000000000096fd80
[140227.665995] R13: 00000000009765f8 R14: 0000000000000000 R15: 0000000000000000

My issue is similar to https://github.com/zfsonlinux/zfs/issues/6414

I'm considering running this script to remove transactions https://gist.github.com/jshoward/5685757

But I'm not that familiar with ZFS and I don't want to break things further.

Any guidance would be appreciated.

fling- commented 6 years ago

@mdunlap do your backups first! You could create qemu qcow devices from your drives to keep them read-only. Then use zfs in qemu to test which transaction to roll back to with the script. You can shut down qemu and recreate the qcow files from your drives until you find the proper transaction to use.

mdunlap commented 6 years ago

@fling- thanks! I’ll have to read up on how to use qcow. I also need to make a ddrescue copy of each drive as a failsafe.
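
A minimal imaging sketch, assuming GNU ddrescue and placeholder backup paths (the map file lets an interrupted copy resume):

for d in sda sdb sdc; do
        # -f: write over an existing image; -n: quick pass, skip the scraping phase
        ddrescue -f -n /dev/$d /backup/$d.img /backup/$d.map
done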

I don’t think I quite understand how this script is used. It needs to be run for each drive individually?

So like:

zfs_revert.py -bs=512 -tb=107374182400 /dev/sda

fling- commented 6 years ago

@mdunlap how to qemu qcow:

# creating cow mappings
# (the /dev/mapper/tmp_* devices are read-only mappings of the physical drives, set up beforehand)
tmpdir=/tmp/science
mkdir -p $tmpdir && chmod 700 $tmpdir
for leg in a b c ; do
        ro=/dev/mapper/tmp_$leg
        science=$tmpdir/science_$leg.qcow2
        rm -vf $science        # -f: don't error if the overlay doesn't exist yet
        qemu-img create -f qcow2 -o backing_file=$ro $science
done

# running qemu
image="/boot/vmlinuz-4.12.4-recovery-zfs-af0f842"
ramdisk="/boot/initramfs-genkernel-x86_64-4.12.4-recovery-zfs-af0f842"
args="real_root=LABEL=recovery"
root=/dev/md7

qemu-system-x86_64 -cpu host -m 8G -smp 5 -machine accel=kvm -curses \
        -hda $root \
        -kernel "$image" -initrd "$ramdisk" -append "$args" \
        -drive file=$tmpdir/science_a.qcow2,if=none,id=a \
        -device ich9-ahci,id=ahci-a \
        -device ide-drive,drive=a,bus=ahci-a.0 \
        -drive file=$tmpdir/science_b.qcow2,if=none,id=b \
        -device ich9-ahci,id=ahci-b \
        -device ide-drive,drive=b,bus=ahci-b.0 \
        -drive file=$tmpdir/science_c.qcow2,if=none,id=c \
        -device ich9-ahci,id=ahci-c \
        -device ide-drive,drive=c,bus=ahci-c.0

Rerun this script after each attempt until you find the proper txg and the import succeeds in qemu.

@mdunlap ddrescue is good. Yes, the script should be used on each drive inside of qemu. Use zdb to dump all the labels and the verbose history, then grep these for the txg numbers to use with the script. It might also be a good idea to try importing with illumos/freebsd in qemu. The script just works; it lets you revert in cases where import -FX does not: https://github.com/zfsonlinux/zfs/issues/5389
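
A minimal sketch of collecting candidate txg numbers with zdb, assuming the pool is exported and using the label/uberblock and history flags from the zdb man page (device and file names are placeholders):

# dump the vdev labels plus uberblocks; each uberblock records a txg
zdb -lu /dev/sda1 > labels_sda.txt
# dump the pool history for the exported pool
zdb -e -h Storage > history_Storage.txt
grep -i txg labels_sda.txt history_Storage.txt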

mdunlap commented 6 years ago

This script worked flawlessly. I made a dd copy of all of my drives, ran the script on each of them (https://gist.github.com/jshoward/5685757), and reverted back to an old txg.

Thanks!

Ryushin commented 5 years ago

I too am experiencing this problem when trying to import a pool.

I've tried importing with -F, -FX and it panics each time.

Tried rolling back with -T to a previous txg=41510866 and it panics. Trying to roll back to anything before 41510866 does not work and gives me an error: cannot import 'rpool': one or more devices is currently unavailable

Panic from /var/log/messages when using sysresccd:

Apr 3 18:48:48 sysresccd kernel: VERIFY3(c < (1ULL << 24) >> 9) failed (36028797018963967 < 32768)
Apr 3 18:48:48 sysresccd kernel: PANIC at zio.c:266:zio_buf_alloc()
Apr 3 18:48:48 sysresccd kernel: Showing stack for process 8921
Apr 3 18:48:48 sysresccd kernel: CPU: 6 PID: 8921 Comm: zpool Tainted: P O 4.14.15-std520-amd64 #1
Apr 3 18:48:48 sysresccd kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.1 06/06/2018
Apr 3 18:48:48 sysresccd kernel: Call Trace:
Apr 3 18:48:48 sysresccd kernel:  dump_stack+0x5a/0x6f
Apr 3 18:48:48 sysresccd kernel:  spl_panic+0xd5/0x109 [spl]
Apr 3 18:48:48 sysresccd kernel:  ? vdev_cache_stat_fini+0xbb9/0xfaf [zfs]
Apr 3 18:48:48 sysresccd kernel:  ? mutex_unlock+0x18/0x2a
Apr 3 18:48:48 sysresccd kernel:  ? slab_pre_alloc_hook+0x17/0x1c
Apr 3 18:48:48 sysresccd kernel:  ? kmem_cache_alloc+0x7e/0x11f
Apr 3 18:48:48 sysresccd kernel:  ? spl_kmem_cache_alloc+0x35/0x9a3 [spl]
Apr 3 18:48:48 sysresccd kernel:  ? spl_kmem_cache_alloc+0x35/0x9a3 [spl]
Apr 3 18:48:48 sysresccd kernel:  ? zio_vdev_io_bypass+0x25d/0x26d [zfs]
Apr 3 18:48:48 sysresccd kernel:  ? mutex_unlock+0x18/0x2a
Apr 3 18:48:48 sysresccd kernel:  zio_buf_alloc+0x3c/0x4f [zfs]
Apr 3 18:48:48 sysresccd kernel:  abd_alloc_linear+0x73/0x97 [zfs]
Apr 3 18:48:48 sysresccd kernel:  abd_alloc+0x33/0x40a [zfs]
Apr 3 18:48:48 sysresccd kernel:  ? spl_kmem_cache_alloc+0x35/0x9a3 [spl]
Apr 3 18:48:48 sysresccd kernel:  ? cv_init+0x38/0x50 [spl]
Apr 3 18:48:48 sysresccd kernel:  ? arc_space_consume+0x5e3/0x5e7 [zfs]
Apr 3 18:48:48 sysresccd kernel:  ? spl_kmem_cache_alloc+0x58e/0x9a3 [spl]
Apr 3 18:48:48 sysresccd kernel:  ? arc_buf_size+0x499/0x740 [zfs]
Apr 3 18:48:48 sysresccd kernel:  arc_space_consume+0x3c3/0x5e7 [zfs]
Apr 3 18:48:48 sysresccd kernel:  arc_space_consume+0x4db/0x5e7 [zfs]
Apr 3 18:48:48 sysresccd kernel:  arc_read+0x3e2/0x26fa [zfs]
Apr 3 18:48:48 sysresccd kernel:  dmu_objset_is_receiving+0xb1/0x10fe [zfs]
Apr 3 18:48:48 sysresccd kernel:  dmu_objset_is_receiving+0x686/0x10fe [zfs]
Apr 3 18:48:48 sysresccd kernel:  dmu_objset_is_receiving+0xf19/0x10fe [zfs]
Apr 3 18:48:48 sysresccd kernel:  dmu_objset_is_receiving+0x83b/0x10fe [zfs]
Apr 3 18:48:48 sysresccd kernel:  dmu_objset_is_receiving+0x6ef/0x10fe [zfs]
Apr 3 18:48:48 sysresccd kernel:  dmu_objset_is_receiving+0x6ef/0x10fe [zfs]
Apr 3 18:48:48 sysresccd kernel:  dmu_objset_is_receiving+0x6ef/0x10fe [zfs]
Apr 3 18:48:48 sysresccd kernel:  dmu_objset_is_receiving+0x6ef/0x10fe [zfs]
Apr 3 18:48:48 sysresccd kernel:  dmu_objset_is_receiving+0x6ef/0x10fe [zfs]
Apr 3 18:48:48 sysresccd kernel:  dmu_objset_is_receiving+0x6ef/0x10fe [zfs]
Apr 3 18:48:48 sysresccd kernel:  dmu_objset_is_receiving+0xf19/0x10fe [zfs]
Apr 3 18:48:48 sysresccd kernel:  dmu_objset_is_receiving+0x924/0x10fe [zfs]
Apr 3 18:48:48 sysresccd kernel:  ? mutex_unlock+0x18/0x2a
Apr 3 18:48:48 sysresccd kernel:  dmu_objset_is_receiving+0xca0/0x10fe [zfs]
Apr 3 18:48:48 sysresccd kernel:  ? spa_prop_get+0xafe/0x1071 [zfs]
Apr 3 18:48:48 sysresccd kernel:  traverse_dataset_resume+0x3d/0x42 [zfs]
Apr 3 18:48:48 sysresccd kernel:  ? spa_prop_get+0xafe/0x1071 [zfs]
Apr 3 18:48:48 sysresccd kernel:  traverse_pool+0x108/0x157 [zfs]
Apr 3 18:48:48 sysresccd kernel:  spa_lookup_by_guid+0x1c65/0x25c6 [zfs]
Apr 3 18:48:48 sysresccd kernel:  ? taskq_create+0x300/0x312 [spl]
Apr 3 18:48:48 sysresccd kernel:  spa_lookup_by_guid+0x2084/0x25c6 [zfs]
Apr 3 18:48:48 sysresccd kernel:  ? zpool_get_rewind_policy+0x90/0x138 [zcommon]
Apr 3 18:48:48 sysresccd kernel:  spa_import+0x18e/0x61b [zfs]
Apr 3 18:48:48 sysresccd kernel:  ? zfs_secpolicy_smb_acl+0x47b/0x5400 [zfs]
Apr 3 18:48:48 sysresccd kernel:  zfs_secpolicy_smb_acl+0x1118/0x5400 [zfs]
Apr 3 18:48:48 sysresccd kernel:  pool_status_check+0x470/0x543 [zfs]
Apr 3 18:48:48 sysresccd kernel:  vfs_ioctl+0x16/0x23
Apr 3 18:48:48 sysresccd kernel:  do_vfs_ioctl+0x47e/0x52b
Apr 3 18:48:48 sysresccd kernel:  ? do_page_fault+0x3e2/0x45b
Apr 3 18:48:48 sysresccd kernel:  SyS_ioctl+0x4e/0x71
Apr 3 18:48:48 sysresccd kernel:  entry_SYSCALL_64_fastpath+0x24/0x87
Apr 3 18:48:48 sysresccd kernel: RIP: 0033:0x7fcb26b7dc77

Looking at that zfs_revert.py script now. But that is my last resort.

zdb_e_rpool.txt zdb_txg.list.zip

mdunlap commented 5 years ago

Have you tried the zfs_revert.py script I mentioned in my last post?

mdunlap commented 5 years ago

Oh I didn’t read your comment fully

Ryushin commented 5 years ago

Have you tried the zfs_revert.py script I mentioned in my last post?

Do you happen to remember the steps you used for that script? It is very scary looking.

mdunlap commented 5 years ago

It certainly is a hammer-like approach. I made a dd copy of my drives just in case something went wrong. But none of the other tools were working: as soon as they tried to revert to an old txg they would fail, much like you are experiencing. So this is essentially the manual way. Here’s what I did:

Found the disk size in bytes (blockdev --getsize64 reports the device size, for the -tb argument): blockdev --getsize64 /dev/sda

so like:

zfs_revert.py -bs=512 -tb=107374182400 /dev/sda

Ran this for each disk, then mounted the zfs pool read-only. And audibly shouted “YES!”
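
A minimal sketch of that per-disk loop, assuming the gist's flags as used above (-bs block size, -tb the value reported by blockdev --getsize64) and that the script runs under python2; device names are placeholders:

for d in /dev/sda /dev/sdb /dev/sdc; do
        size=$(blockdev --getsize64 $d)   # device size in bytes
        python2 zfs_revert.py -bs=512 -tb=$size $d
done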

Ryushin commented 5 years ago

Thanks so much. I guess it looks easier to use than I anticipated. Still scary looking stuff though.

I might try that if the test build behlendorf is working on from this code https://github.com/zfsonlinux/zfs/pull/8099 does not work.

Ryushin commented 5 years ago

Working with Behlendorf we were able to restore the pool to a working state using these patches:

All the ones after the 0.7.9 tag, https://github.com/dweeezil/zfs/commits/zfs-0.7.9-zdb
657a4cb Allow to specify zdb -R lsize/compression
35c52ac Unconditionally debugging in receive_process_record()
c2b9d5e Re-institute the zero-size tests in arc_read()
36b78e3 Back off on the embedded blkptr check
089be80 Additional hardening against damaged blkptrs
7bbfb66 Avoid panic/abort when printing corrupted blkptrs

Behlendorf is hoping these can get added to mainline after they have been worked on a bit.

Ryushin commented 5 years ago

I encountered this bug again after another crash. This time I had the patches applied against the sysresccd source.

I'm including a bit of a how-to in case anyone else encounters this problem before the fixes are committed to mainline and a rescue disc ships with them.

Grab the sysresccd iso that fearedbliss made: https://xyinn.org/gentoo/livecd/sysresccd-5.2.0_zfs_0.7.9.iso

Grab the spl and zfs binaries with the needed patches that Behlendorf compiled in the deb chroot build environment, which I built to match what was used in the sysresccd: https://cloud.chrisdos.com/index.php/s/qfoFn3Jdd3kM5KC

The chroot build environment is there if you want it. It contains the sources for the sysresccd kernel and zfs with the patches along with what is necessary to actually compile the sources.

Boot the sysresccd and get it on the network, or just copy the files to a USB stick.

Follow these instructions, making changes based on your pool name, etcetera:

-- The build was made in /var/temp inside the chroot. Need to symlink /tmp to /var/temp
ln -s /tmp /var/temp

-- Copy the spl_zfs.tar.xz to /tmp and extract it
tar -xJvf spl_zfs.tar.xz
cd /tmp/zfs

-- Load the kmods from the source trees
./scripts/zfs.sh

-- Echo the parameter to recover
echo 1 >/sys/module/zfs/parameters/zfs_recover

-- Import pool read only
./cmd/zpool/zpool import -f -d /dev/disk/by-id -R /mnt/rpool -o readonly=on rpool

Output from command:

cannot import 'rpool': I/O error
        Recovery is possible, but will result in some data loss.
        Returning the pool to its state as of Sun Apr 7 14:06:51 2019
        should correct the problem.  Approximately 10 seconds of data
        must be discarded, irreversibly.  After rewind, at least
        one persistent user-data error will remain.  Recovery can be
        attempted by executing 'zpool import -F rpool'.  A scrub of
        the pool is strongly recommended after recovery.

-- Force import pool read only
./cmd/zpool/zpool import -F -f -d /dev/disk/by-id -R /mnt/rpool -o readonly=on rpool

-- Back up the data off the pool just to be safe

-- Export the pool and import read-write
./cmd/zpool/zpool export rpool
./cmd/zpool/zpool import -F -f -d /dev/disk/by-id -R /mnt/rpool rpool

-- Scrub the pool using the loaded modules
./cmd/zpool/zpool scrub rpool

-- When the scrub finishes, export the rpool and you're done
./cmd/zpool/zpool export rpool

Ryushin commented 5 years ago

Since I hit this bug twice in under a week, I think something else is causing it. This system was rock solid until I started exporting a 100TB ZVOL as a Fibre Channel target using LIO (targetcli). This is running on Devuan Ascii (Debian Stretch) using the Debian Stretch ZFS 0.7.12-1 from backports, along with the backported kernel 4.19.0-0.bpo.2-amd64.

I added a new 44-slot Supermicro JBOD chassis hooked up to an LSI 9400-8e card with 22 10TB drives. I'm exporting the ZFS ZVOL using a QLogic 2562 Fibre Channel card. A sync was writing about 550MB/s to the ZVOL when the crash happened.

I send the logs to another server, but nothing was sent; I think the panic prevents anything from being written to the log. I'll google to see if it is possible to write the logs/core to a USB stick.
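
One possible approach worth noting (not something tried in this thread): the kernel's netconsole module streams printk output, including panic traces, to another machine over UDP, bypassing the local disk entirely. The IPs, MAC, and interface below are placeholders:

# on the receiving server: listen for kernel messages on UDP port 6666
nc -u -l 6666

# on the panicking box: src-port@src-ip/interface,dst-port@dst-ip/dst-mac
modprobe netconsole netconsole=6665@192.168.1.5/eth0,6666@192.168.1.10/00:11:22:33:44:55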

ZFS options in /etc/modprobe.d/zfs.conf:

options zfs zfs_arc_max=34359738368
options zfs zvol_threads=8

CPU is: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz

The main chassis is a 36-drive Supermicro with a 9300-8i card hooked to both backplanes. This server has been rock solid with this hardware.

mdunlap: Were you doing anything like what I'm doing when you encountered this bug?

Ryushin commented 5 years ago

So it seems that a scrub cannot repair the pool at this point. When a scrub ran under the unpatched code it panicked again, and I had to go through the same steps above to allow the pool to import again.

Running the scrub under the patched code will log the errors you need in order to repair the pool. After running the scrub, I have a bunch of errors in /var/log/messages that look like this:

Apr 12 19:33:50 sysresccd kernel: arc_read: damaged bp lsize=0 psize=0 zb=0xb7a0/0x490/0x1/0x182 comp=59 emb=1
Apr 12 19:33:50 sysresccd kernel: dsl_scan_visitbp: scan error visiting ds=47008 zb=0xb7a0/0x490/0x2/0x0
Apr 12 19:35:34 sysresccd kernel: arc_read: damaged bp lsize=0 psize=0 zb=0xb957/0x490/0x1/0x182 comp=59 emb=1
Apr 12 19:35:34 sysresccd kernel: dsl_scan_visitbp: scan error visiting ds=47447 zb=0xb957/0x490/0x2/0x0

The lines we need are the "dsl_scan_visitbp: scan error visiting" lines. The ds= value is the dataset that is affected. Behlendorf posted that the zb= values are dataset-id/object/level/blkid.

So it will be necessary to find the datasets and snapshots that are affected. The affected snapshots will need to be deleted, and the affected files will need to be copied and the originals deleted. The deleting and copying must use the patched code (a sketch of mapping the ds= numbers to dataset names follows).
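
A minimal sketch of mapping those ds= numbers to dataset names with zdb (note 0xb7a0 and 0xb957 from the zb= tuples are 47008 and 47447 in decimal, matching the ds= fields; the dataset path is a placeholder):

# list datasets with their IDs and pick out the two from the scan errors
zdb -d rpool | grep -E 'ID (47008|47447)'
# then inspect the damaged object (object 0x490 from the zb= tuple) in that dataset
zdb -dddd rpool/path/to/dataset 0x490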

I have a VM created that uses the images of the broken pool. When new patches come out I can test to see if they can repair the pool using a scrub.

Ryushin commented 4 years ago

I still have the raw disk files for this issue. Is there a location you would like these uploaded to, in order to fix the code and have scrub repair the damaged pool? I'm worried about this bug, as I have another pool with a single 165TB file that acts as a Fibre Channel LUN target that cannot be lost, and we don't have a storage area large enough to back it up. The recovery above required copying the files, which would not be possible in this case. Since I'm about to move to 0.8.2, my recovery tools will no longer work. Can these patches be added to 0.8.x soon?

Ryushin commented 4 years ago

@Ryushin dear lord who paid for someone to set that system up in such a way?

They need a VERY large LUN to work with software called Tiger Store that allows sharing a block volume. I tried a ZVOL and performance was abysmal. A flat sparse file and targetcli-fb allows it to work. By all means, I'm open to other suggestions on providing a huge LUN over Fibre Channel. There are very few options available.
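
For reference, a minimal sketch of that kind of setup with targetcli-fb; the file path, size, backstore name, and WWN are placeholders:

# create a sparse backing file on the pool
truncate -s 100T /rpool/luns/tiger.img
# expose it as a fileio backstore, then map it to a LUN on the qla2xxx (FC) target
targetcli /backstores/fileio create name=tiger file_or_dev=/rpool/luns/tiger.img
targetcli /qla2xxx/<wwn>/luns create /backstores/fileio/tiger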

Ryushin commented 4 years ago

I was testing zvol performance using EFA on EC2 this week and hit more than 70gbps on a single zvol with 8 NVMe devices in a mirrored stripe. What are your test patterns like that got you abysmal performance? What version did you test? I am on 0.8.2, using a 16k zvol with 16 to 64k mixed IOs.

We are running on spinning platters. We ran into the Read Modify Write (RMW) issues for ZVOLs; there are a couple of bugs in Git for that. As soon as we switched to a sparse file, things got much better and we could deploy this to production. We were on 0.7.9, I believe.

Ryushin commented 4 years ago

Can you test zvol performance on a newer release?

That won't be very doable for me right now. We have something working and we don't have enough space to change our storage method. Perhaps someone in these other issue threads could do some testing: https://github.com/zfsonlinux/zfs/issues/8472 https://github.com/zfsonlinux/zfs/issues/8590

stale[bot] commented 3 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

Ryushin commented 3 years ago

Have these patches been committed yet in 2.0.0? I still have the disk images of the pool where this occurred.

behlendorf commented 3 years ago

@Ryushin it's worth a try. Updated variants of those changes have been applied to 2.0, if there are still some bits missing it would be good to determine that so they can be merged.

Ryushin commented 3 years ago

@Ryushin it's worth a try. Updated variants of those changes have been applied to 2.0, if there are still some bits missing it would be good to determine that so they can be merged.

Booted up a sysresccd disc with zfs 2.0 on it. I was able to import the pool just fine. Running a scrub results in the same error:

Mar 05 22:40:12 sysrescue kernel: VERIFY3(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) failed (36028797018963967 < 32768)
Mar 05 22:40:12 sysrescue kernel: PANIC at zio.c:295:zio_buf_alloc()
Mar 05 22:40:12 sysrescue kernel: Showing stack for process 1023
Mar 05 22:40:12 sysrescue kernel: CPU: 4 PID: 1023 Comm: dp_sync_taskq Tainted: P           OE     5.4.83-1-lts #1
Mar 05 22:40:12 sysrescue kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Mar 05 22:40:12 sysrescue kernel: Call Trace:
Mar 05 22:40:12 sysrescue kernel:  dump_stack+0x64/0x88
Mar 05 22:40:12 sysrescue kernel:  spl_panic+0xef/0x117 [spl]
Mar 05 22:40:12 sysrescue kernel:  ? hdr_full_cons+0xef/0x100 [zfs]
Mar 05 22:40:12 sysrescue kernel:  ? aggsum_add+0x171/0x190 [zfs]
Mar 05 22:40:12 sysrescue kernel:  zio_buf_alloc+0x55/0x60 [zfs]
Mar 05 22:40:12 sysrescue kernel:  abd_alloc_linear+0x54/0xc0 [zfs]
Mar 05 22:40:12 sysrescue kernel:  arc_hdr_alloc_abd+0x13f/0x1e0 [zfs]
Mar 05 22:40:12 sysrescue kernel:  arc_hdr_alloc+0x104/0x170 [zfs]
Mar 05 22:40:12 sysrescue kernel:  arc_read+0x5a4/0x1220 [zfs]
Mar 05 22:40:12 sysrescue kernel:  ? dsl_scan_prefetch_dnode+0x180/0x180 [zfs]
Mar 05 22:40:12 sysrescue kernel:  dsl_scan_prefetch_thread+0x235/0x2c0 [zfs]
Mar 05 22:40:12 sysrescue kernel:  taskq_thread+0x28e/0x4b0 [spl]
Mar 05 22:40:12 sysrescue kernel:  ? wake_up_q+0x60/0x60
Mar 05 22:40:12 sysrescue kernel:  ? taskq_thread_spawn+0x50/0x50 [spl]
Mar 05 22:40:12 sysrescue kernel:  kthread+0x117/0x130
Mar 05 22:40:12 sysrescue kernel:  ? __kthread_bind_mask+0x60/0x60
Mar 05 22:40:12 sysrescue kernel:  ret_from_fork+0x35/0x40
Mar 05 22:42:44 sysrescue kernel: INFO: task dp_sync_taskq:1023 blocked for more than 122 seconds.
Mar 05 22:42:44 sysrescue kernel:       Tainted: P           OE     5.4.83-1-lts #1
Mar 05 22:42:44 sysrescue kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 05 22:42:44 sysrescue kernel: dp_sync_taskq   D    0  1023      2 0x80004080
Mar 05 22:42:44 sysrescue kernel: Call Trace:
Mar 05 22:42:44 sysrescue kernel:  __schedule+0x2a6/0x7d0
Mar 05 22:42:44 sysrescue kernel:  schedule+0x39/0xa0
Mar 05 22:42:44 sysrescue kernel:  spl_panic+0x115/0x117 [spl]
Mar 05 22:42:44 sysrescue kernel:  ? hdr_full_cons+0xef/0x100 [zfs]
Mar 05 22:42:44 sysrescue kernel:  ? aggsum_add+0x171/0x190 [zfs]
Mar 05 22:42:44 sysrescue kernel:  zio_buf_alloc+0x55/0x60 [zfs]
Mar 05 22:42:44 sysrescue kernel:  abd_alloc_linear+0x54/0xc0 [zfs]
Mar 05 22:42:44 sysrescue kernel:  arc_hdr_alloc_abd+0x13f/0x1e0 [zfs]
Mar 05 22:42:44 sysrescue kernel:  arc_hdr_alloc+0x104/0x170 [zfs]
Mar 05 22:42:44 sysrescue kernel:  arc_read+0x5a4/0x1220 [zfs]
Mar 05 22:42:44 sysrescue kernel:  ? dsl_scan_prefetch_dnode+0x180/0x180 [zfs]
Mar 05 22:42:44 sysrescue kernel:  dsl_scan_prefetch_thread+0x235/0x2c0 [zfs]
Mar 05 22:42:44 sysrescue kernel:  taskq_thread+0x28e/0x4b0 [spl]
Mar 05 22:42:44 sysrescue kernel:  ? wake_up_q+0x60/0x60
Mar 05 22:42:44 sysrescue kernel:  ? taskq_thread_spawn+0x50/0x50 [spl]
Mar 05 22:42:44 sysrescue kernel:  kthread+0x117/0x130
Mar 05 22:42:44 sysrescue kernel:  ? __kthread_bind_mask+0x60/0x60
Mar 05 22:42:44 sysrescue kernel:  ret_from_fork+0x35/0x40
Mar 05 22:42:44 sysrescue kernel: INFO: task txg_sync:1146 blocked for more than 122 seconds.
Mar 05 22:42:44 sysrescue kernel:       Tainted: P           OE     5.4.83-1-lts #1
Mar 05 22:42:44 sysrescue kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 05 22:42:44 sysrescue kernel: txg_sync        D    0  1146      2 0x80004080

So I cannot repair the filesystem with a scrub. I can probably go through the last procedure and delete the offending snapshot that has the error in it. It would be best, though, for scrub to be able to repair this.

Ryushin commented 3 years ago

@behlendorf I can set up a VM with OpenZFS head on it so I can test patches, if you'd like to try to fix this.

Ryushin commented 3 years ago

Can anyone look into this? Are there any patches against head I should try, to see if a scrub can repair this error?

Ryushin commented 3 years ago

@behlendorf, are there any patches I should try that can allow a successful scrub that repairs this pool?

Ryushin commented 3 years ago

I'll test 2.1 on this in a few weeks.

Ryushin commented 2 years ago

Would a dev like me to upload the images somewhere so they can work on debugging the problem? The four disk images work out to 1.1 TiB in size.

behlendorf commented 2 years ago

How did it go with the 2.1 release? I believe it should have prevented this https://github.com/openzfs/zfs/issues/7603#issuecomment-791795399 crash.

Ryushin commented 2 years ago

Spun up a KVM guest running Devuan with backported ZFS v2.1.4, with the root filesystem on ext4 instead of a root pool. Attached the raw images and ran a scrub. I set the mountpoints to none for all the datasets in the pool. The pool hung 61% into the scrub, but the system did not panic, I believe because nothing was mounted. Trying to mount one of the datasets just never completes.
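
A minimal sketch of how such a guest can be started, with hypothetical image paths; giving each disk device a serial makes it appear in /dev/disk/by-id as scsi-0QEMU_QEMU_HARDDISK_<serial>, matching the vdev names in the status output below (repeat the -drive/-device pair for rpool_3 and rpool_4):

qemu-system-x86_64 -m 8G -smp 4 -machine accel=kvm \
        -hda /images/devuan_root.img \
        -device virtio-scsi-pci,id=scsi0 \
        -drive file=/images/rpool_1.img,format=raw,if=none,id=d1 \
        -device scsi-hd,drive=d1,serial=rpool_1 \
        -drive file=/images/rpool_2.img,format=raw,if=none,id=d2 \
        -device scsi-hd,drive=d2,serial=rpool_2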

I'll reset the system and run it again with one of the datasets mounted.

root@zfs-panic:/mnt# zpool status -v
  pool: rpool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
    The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub in progress since Mon May 16 17:27:06 2022
    711G scanned at 18.1M/s, 457G issued at 11.6M/s, 741G total
    0B repaired, 61.76% done, 06:55:57 to go
config:

    NAME                                  STATE     READ WRITE CKSUM
    rpool                                 ONLINE       0     0     0
      raidz2-0                            ONLINE       0     0     0
        scsi-0QEMU_QEMU_HARDDISK_rpool_2  ONLINE       0     0     0
        scsi-0QEMU_QEMU_HARDDISK_rpool_3  ONLINE       0     0     0
        scsi-0QEMU_QEMU_HARDDISK_rpool_4  ONLINE       0     0     0
        scsi-0QEMU_QEMU_HARDDISK_rpool_1  ONLINE       0     0     0

errors: No known data errors

From /var/log/kern.log:

May 16 18:52:06 zfs-panic kernel: [ 5981.768613] BUG: unable to handle page fault for address: 00000000000032b8
May 16 18:52:06 zfs-panic kernel: [ 5981.769942] #PF: supervisor write access in kernel mode
May 16 18:52:06 zfs-panic kernel: [ 5981.770721] #PF: error_code(0x0002) - not-present page
May 16 18:52:06 zfs-panic kernel: [ 5981.771478] PGD 0 P4D 0 
May 16 18:52:06 zfs-panic kernel: [ 5981.772238] Oops: 0002 [#1] SMP PTI
May 16 18:52:06 zfs-panic kernel: [ 5981.772991] CPU: 0 PID: 9928 Comm: txg_sync Tainted: P           OE     5.10.0-14-amd64 #1 Debian 5.10.113-1
May 16 18:52:06 zfs-panic kernel: [ 5981.773793] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
May 16 18:52:06 zfs-panic kernel: [ 5981.774614] RIP: 0010:mutex_lock+0x19/0x30
May 16 18:52:06 zfs-panic kernel: [ 5981.775410] Code: 00 0f 1f 44 00 00 be 02 00 00 00 e9 11 fb ff ff 90 0f 1f 44 00 00 55 48 89 fd e8 72 e6 ff ff 31 c0 65 48 8b 14 25 c0 bb 01 00 <f0> 48 0f b1 55 00 75 02 5d c3 48 89 ef 5d eb c7 0f 1f 80 00 00 00
May 16 18:52:06 zfs-panic kernel: [ 5981.777074] RSP: 0018:ffffa4260478f3e0 EFLAGS: 00010246
May 16 18:52:06 zfs-panic kernel: [ 5981.777928] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
May 16 18:52:06 zfs-panic kernel: [ 5981.778802] RDX: ffff963ebd99c740 RSI: 0000000000490048 RDI: 00000000000032b8
May 16 18:52:06 zfs-panic kernel: [ 5981.779661] RBP: 00000000000032b8 R08: 0000000001a4f600 R09: 0000000002834400
May 16 18:52:06 zfs-panic kernel: [ 5981.780517] R10: 0000000002834400 R11: 000000000000fc00 R12: ffffa42600789500
May 16 18:52:06 zfs-panic kernel: [ 5981.781362] R13: 0000000000000000 R14: 00000001fffffe00 R15: ffffa4260478f570
May 16 18:52:06 zfs-panic kernel: [ 5981.782196] FS:  0000000000000000(0000) GS:ffff963fb7c00000(0000) knlGS:0000000000000000
May 16 18:52:06 zfs-panic kernel: [ 5981.783049] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 16 18:52:06 zfs-panic kernel: [ 5981.783893] CR2: 00000000000032b8 CR3: 00000001014ae001 CR4: 00000000001706f0
May 16 18:52:06 zfs-panic kernel: [ 5981.784737] Call Trace:
May 16 18:52:06 zfs-panic kernel: [ 5981.785652]  dsl_scan_scrub_cb+0x2cd/0x750 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.786548]  dsl_scan_visitbp.isra.0+0x6ea/0xcd0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.787413]  ? dsl_scan_visitbp.isra.0+0x3d0/0xcd0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.788231]  ? kfree+0xba/0x480
May 16 18:52:06 zfs-panic kernel: [ 5981.789070]  dsl_scan_visitbp.isra.0+0x3d0/0xcd0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.789927]  dsl_scan_visitbp.isra.0+0x62a/0xcd0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.790779]  dsl_scan_visitbp.isra.0+0x3d0/0xcd0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.791597]  dsl_scan_visitbp.isra.0+0x3d0/0xcd0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.792384]  dsl_scan_visitbp.isra.0+0x3d0/0xcd0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.793154]  dsl_scan_visitbp.isra.0+0x3d0/0xcd0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.793912]  dsl_scan_visitbp.isra.0+0x3d0/0xcd0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.794660]  dsl_scan_visitbp.isra.0+0x3d0/0xcd0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.795374]  dsl_scan_visitbp.isra.0+0x869/0xcd0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.796077]  dsl_scan_visit_rootbp.isra.0+0x11b/0x1a0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.796778]  dsl_scan_visitds+0x189/0x4a0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.797436]  ? try_to_wake_up+0x321/0x4f0
May 16 18:52:06 zfs-panic kernel: [ 5981.798088]  ? __kmalloc_node+0x141/0x2b0
May 16 18:52:06 zfs-panic kernel: [ 5981.798760]  ? spl_kmem_alloc_impl+0xae/0xf0 [spl]
May 16 18:52:06 zfs-panic kernel: [ 5981.799406]  ? spl_kmem_alloc_impl+0xae/0xf0 [spl]
May 16 18:52:06 zfs-panic kernel: [ 5981.800040]  ? tsd_hash_search.isra.0+0x42/0x90 [spl]
May 16 18:52:06 zfs-panic kernel: [ 5981.800678]  ? tsd_set+0x222/0x480 [spl]
May 16 18:52:06 zfs-panic kernel: [ 5981.801357]  dsl_scan_sync+0x84f/0x1350 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.802002]  ? kfree+0xba/0x480
May 16 18:52:06 zfs-panic kernel: [ 5981.802683]  ? bplist_iterate+0x10d/0x130 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.803364]  spa_sync+0x5b3/0xfa0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.804023]  ? mutex_lock+0xe/0x30
May 16 18:52:06 zfs-panic kernel: [ 5981.804697]  ? spa_txg_history_init_io+0x101/0x110 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.805379]  txg_sync_thread+0x2e0/0x4a0 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.806051]  ? txg_fini+0x250/0x250 [zfs]
May 16 18:52:06 zfs-panic kernel: [ 5981.806664]  thread_generic_wrapper+0x6f/0x80 [spl]
May 16 18:52:06 zfs-panic kernel: [ 5981.807274]  ? __thread_exit+0x20/0x20 [spl]
May 16 18:52:06 zfs-panic kernel: [ 5981.807872]  kthread+0x11b/0x140
May 16 18:52:06 zfs-panic kernel: [ 5981.808469]  ? __kthread_bind_mask+0x60/0x60
May 16 18:52:06 zfs-panic kernel: [ 5981.809050]  ret_from_fork+0x22/0x30
May 16 18:52:06 zfs-panic kernel: [ 5981.809623] Modules linked in: fuse btrfs blake2b_generic xor raid6_pq ufs qnx4 hfsplus hfs minix vfat msdos fat jfs xfs libcrc32c dm_mod zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) intel_rapl_msr intel_rapl_common ghash_clmulni_intel aesni_intel libaes crypto_simd cryptd glue_helper rapl bochs_drm drm_vram_helper drm_ttm_helper ttm drm_kms_helper cec joydev evdev serio_raw pcspkr virtio_rng drm qemu_fw_cfg rng_core hid_generic virtio_console virtio_balloon button usbhid hid ext4 crc16 mbcache jbd2 crc32c_generic sg sd_mod t10_pi crc_t10dif crct10dif_generic virtio_net virtio_blk net_failover virtio_scsi failover sr_mod cdrom ata_generic ata_piix crct10dif_pclmul crct10dif_common crc32_pclmul psmouse ehci_pci uhci_hcd ehci_hcd libata usbcore crc32c_intel scsi_mod virtio_pci virtio_ring virtio floppy usb_common i2c_piix4
May 16 18:52:06 zfs-panic kernel: [ 5981.814195] CR2: 00000000000032b8
May 16 18:52:06 zfs-panic kernel: [ 5981.814898] ---[ end trace 34dc2125673fcb02 ]---
May 16 18:52:06 zfs-panic kernel: [ 5981.815595] RIP: 0010:mutex_lock+0x19/0x30
May 16 18:52:06 zfs-panic kernel: [ 5981.816280] Code: 00 0f 1f 44 00 00 be 02 00 00 00 e9 11 fb ff ff 90 0f 1f 44 00 00 55 48 89 fd e8 72 e6 ff ff 31 c0 65 48 8b 14 25 c0 bb 01 00 <f0> 48 0f b1 55 00 75 02 5d c3 48 89 ef 5d eb c7 0f 1f 80 00 00 00
May 16 18:52:06 zfs-panic kernel: [ 5981.817708] RSP: 0018:ffffa4260478f3e0 EFLAGS: 00010246
May 16 18:52:06 zfs-panic kernel: [ 5981.818439] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
May 16 18:52:06 zfs-panic kernel: [ 5981.819164] RDX: ffff963ebd99c740 RSI: 0000000000490048 RDI: 00000000000032b8
May 16 18:52:06 zfs-panic kernel: [ 5981.819892] RBP: 00000000000032b8 R08: 0000000001a4f600 R09: 0000000002834400
May 16 18:52:06 zfs-panic kernel: [ 5981.820611] R10: 0000000002834400 R11: 000000000000fc00 R12: ffffa42600789500
May 16 18:52:06 zfs-panic kernel: [ 5981.821333] R13: 0000000000000000 R14: 00000001fffffe00 R15: ffffa4260478f570
May 16 18:52:06 zfs-panic kernel: [ 5981.822047] FS:  0000000000000000(0000) GS:ffff963fb7c00000(0000) knlGS:0000000000000000
May 16 18:52:06 zfs-panic kernel: [ 5981.822788] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 16 18:52:06 zfs-panic kernel: [ 5981.823533] CR2: 00000000000032b8 CR3: 00000001014ae001 CR4: 00000000001706f0
Ryushin commented 2 years ago

Running it a second time with the dataset mounted resulted in the same kernel error. Let me know if you would like the disk images so you can further debug this. I can also set up access to this VM if you like.

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

Ryushin commented 1 year ago

Waiting for the next version of ZFS to come out to see if it can repair the pool.