openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ZTS: FreeBSD 13 panics on cp_stress.ksh (reproducible) #16297

Open · tonyhutter opened this issue 1 week ago

tonyhutter commented 1 week ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | FreeBSD |
| Distribution Version | 13.2-RELEASE-p10 |
| Kernel Version | |
| Architecture | x86-64 |
| OpenZFS Version | master (c98295eed2687cee704ef5f8f3218d3d44a6a1d8) |

Describe the problem you're observing

You can easily panic FreeBSD 13 by running the cp_stress.ksh ZTS test.

Describe how to reproduce the problem

Run the cp_stress tests on FreeBSD 13:

 ./scripts/zfs-tests.sh -x -t `pwd`/tests/zfs-tests/tests/functional/cp_files/cp_stress.ksh

Sometimes it passes, but it typically panics within 1-3 tries. I hit it running in a VM with 4 vCPUs.
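
For context, the operation this test hammers is lseek(2) with SEEK_DATA on freshly written files. A minimal standalone probe of that call might look like the sketch below; it's illustrative only (the path is a placeholder), not part of the test suite:

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write one byte, then immediately ask where the first data byte is via
 * SEEK_DATA. This is the lseek path the backtrace below enters ZFS
 * through (zfs_holey -> dmu_offset_next -> dnode_is_dirty). The default
 * path is a placeholder for a file on the test pool.
 */
int
main(int argc, char **argv)
{
	const char *path = (argc > 1) ? argv[1] : "/testpool/probe";
	int fd = open(path, O_RDWR | O_CREAT, 0644);

	if (fd == -1 || write(fd, "x", 1) != 1) {
		perror(path);
		return (1);
	}

	/* -1 with ENXIO here would mean "no data", i.e. a phantom hole. */
	off_t off = lseek(fd, 0, SEEK_DATA);
	printf("SEEK_DATA -> %jd\n", (intmax_t)off);

	(void) close(fd);
	return (0);
}
```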

Include any warning/errors/backtraces from the system logs

panic: (link->list_next == NULL) is equivalent to (link->list_prev == NULL)
cpuid = 2
time = 1719250843
KDB: stack backtrace:
#0 0xffffffff80c53ff5 at kdb_backtrace+0x65
#1 0xffffffff80c06971 at vpanic+0x151
#2 0xffffffff824419ba at spl_panic+0x3a
#3 0xffffffff82440095 at list_link_active+0x55
#4 0xffffffff824ec3d3 at dnode_is_dirty+0x93
#5 0xffffffff824c6e87 at dmu_offset_next+0x57
#6 0xffffffff8264eb0d at zfs_holey+0x14d
#7 0xffffffff8246272f at zfs_freebsd_ioctl+0x4f
#8 0xffffffff80cf9474 at vn_ioctl+0x1a4
#9 0xffffffff80cf9dac at vn_seek+0x20c
#10 0xffffffff80cf289b at kern_lseek+0x6b
#11 0xffffffff810b289c at amd64_syscall+0x10c
#12 0xffffffff81089a8b at fast_syscall_common+0xf8
Uptime: 27m0s
Dumping 406 out of 4062 MB:..4%..12%..24%..32%..44%..52%..63%..71%..83%..91%
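
The panic string above is the EQUIV() assertion inside list_link_active(): a list node is expected to have both of its pointers NULL (not on a list) or both non-NULL (on a list), and dnode_is_dirty() apparently found a half-linked dirty node. A userspace sketch of that invariant, simplified from the SPL (the real definitions live in the OpenZFS headers):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified; the real list_node_t lives in the OpenZFS SPL headers. */
typedef struct list_node {
	struct list_node *list_next;
	struct list_node *list_prev;
} list_node_t;

/* EQUIV(a, b) panics unless a and b are both true or both false. */
#define EQUIV(a, b) assert(!!(a) == !!(b))

static int
list_link_active(const list_node_t *link)
{
	EQUIV(link->list_next == NULL, link->list_prev == NULL);
	return (link->list_next != NULL);
}

int
main(void)
{
	list_node_t inactive = { NULL, NULL };  /* fine: both NULL */
	(void) list_link_active(&inactive);

	list_node_t torn = { NULL, &inactive }; /* half-linked: asserts, */
	(void) list_link_active(&torn);         /* just like the panic above */
	return (0);
}
```

A node in that half-linked state would be consistent with the dirty link being observed concurrently with an insert or remove, though the trace alone doesn't prove that.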
robn commented 1 week ago

Unable to reproduce on 13.2-RELEASE-p11, OpenZFS c98295e. Tried VMs with 2 and 4 cores, and 2G and 16G RAM.

Typical run:

robn@freebsd13:~/zfs $ ./scripts/zfs-tests.sh -Dvxt cp_stress

--- Cleanup ---
Removing pool(s):
Removing loopback(s):
Removing files(s):

--- Configuration ---
Runfiles:        /var/tmp/zfs-tests.2773.run
STF_TOOLS:       /home/robn/zfs/tests/test-runner
STF_SUITE:       /home/robn/zfs/tests/zfs-tests
STF_PATH:        /home/robn/zfs/tests/zfs-tests/bin
FILEDIR:         /var/tmp
FILES:           /var/tmp/file-vdev0 /var/tmp/file-vdev1 /var/tmp/file-vdev2
LOOPBACKS:       md0 md1 md2
DISKS:           md0 md1 md2
NUM_DISKS:       3
FILESIZE:        4G
ITERATIONS:      1
TAGS:            functional
STACK_TRACER:    no
Keep pool(s):    rpool
Missing util(s): arc_summary arcstat zilstat dbufstat mount.zfs zed zgenhostid devname2devid file_fadvise getversion mmap_libaio randfree_file read_dos_attributes renameat2 user_ns_exec write_dos_attributes xattrtest zed_fd_spill-zedlet idmap_util fio net pamtester rsync

/home/robn/zfs/tests/test-runner/bin/test-runner.py  -D   -c "/var/tmp/zfs-tests.2773.run" -T "functional" -i "/home/robn/zfs/tests/zfs-tests" -I "1"
NOTE: begin default_setup_noexit
SUCCESS: zpool create -f testpool md0
SUCCESS: zfs create testpool/testfs
SUCCESS: zfs set mountpoint=/var/tmp/testdir testpool/testfs
Test: /home/robn/zfs/tests/zfs-tests/tests/functional/cp_files/setup (run as root) [00:00] [PASS]
ASSERTION: Run the 'seekflood' binary repeatedly to try to trigger #15526
SUCCESS: mkdir /testpool/cp_stress
SUCCESS: /home/robn/zfs/tests/zfs-tests/tests/functional/cp_files/seekflood 2000 4
SUCCESS: /home/robn/zfs/tests/zfs-tests/tests/functional/cp_files/seekflood 2000 4
SUCCESS: /home/robn/zfs/tests/zfs-tests/tests/functional/cp_files/seekflood 2000 4
No corruption detected
NOTE: Performing local cleanup via log_onexit (cleanup)
Test: /home/robn/zfs/tests/zfs-tests/tests/functional/cp_files/cp_stress.ksh (run as root) [00:15] [PASS]
SUCCESS: zpool destroy -f testpool
SUCCESS: rm -rf /var/tmp/testdir
Test: /home/robn/zfs/tests/zfs-tests/tests/functional/cp_files/cleanup (run as root) [00:00] [PASS]

Results Summary
PASS      3

Running Time:   00:00:15
Percent passed: 100.0%
Log directory:  /var/tmp/test_results/20240625T132626

Tests with results other than PASS that are expected:

Tests with result of PASS that are unexpected:

Tests with results other than PASS that are unexpected:

Anything interesting in your ZTS config? Specifically, I'm wondering whether you're using "real" disks or the default files in /var/tmp. If the latter, is /var/tmp itself backed by ZFS or UFS?

I'll stick it in a loop for an hour and try a bit harder. If that doesn't turn up anything, I'll make a real pool and put seekflood on a long run.

robn commented 1 week ago

I ran the test over and over for a few hours (I forgot about it...), no dice. I then set it up to run many thousands of files and threads for a good long while; no change there either. Finally, I reran all of that against the built-in OpenZFS in 13.2, which also refused to blow up.

robn@freebsd13:~ $ zfs version
zfs-2.1.9-FreeBSD_g92e0d9d18
zfs-kmod-2.1.9-FreeBSD_g92e0d9d18

robn@freebsd13:~ $ uname -a
FreeBSD freebsd13 13.2-RELEASE-p11 FreeBSD 13.2-RELEASE-p11 GENERIC amd64

So more info needed!

tonyhutter commented 1 week ago

Specifically, I'm wondering whether you're using "real" disks or the default files in /var/tmp. If the latter, is /var/tmp itself backed by ZFS or UFS?

Just UFS for all of /. I'm using the ZTS defaults, so I assume it's the file-backed disks in /var/tmp.

I can still hit this 100% reliably in ZTS, but when I run the same seekflood binary manually to recreate the test by hand, I'm unable to hit the panic. It's very weird.
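
For anyone trying to close that gap, a rough, hypothetical approximation of the seekflood pattern (the real source is in tests/zfs-tests/tests/functional/cp_files/) is sketched below: forked workers each write a fresh file and immediately race lseek(SEEK_DATA) against the still-dirty data. Worker/round counts and paths are placeholders, not the actual test parameters:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define WORKERS 4	/* placeholders; the log shows "seekflood 2000 4" */
#define ROUNDS  2000

/*
 * Each worker repeatedly creates a file, writes a byte, and immediately
 * asks SEEK_DATA whether that byte is visible. If the dirty-dnode check
 * races, SEEK_DATA can report a phantom hole (or, per this issue, panic
 * in dnode_is_dirty() on FreeBSD 13).
 */
static void
worker(int id)
{
	char path[256];

	for (int i = 0; i < ROUNDS; i++) {
		(void) snprintf(path, sizeof (path),
		    "/testpool/cp_stress/w%d.%d", id, i);

		int fd = open(path, O_RDWR | O_CREAT, 0644);
		if (fd == -1 || write(fd, "x", 1) != 1)
			exit(1);

		if (lseek(fd, 0, SEEK_DATA) == -1)
			(void) fprintf(stderr, "phantom hole: %s\n", path);

		(void) close(fd);
		(void) unlink(path);
	}
	exit(0);
}

int
main(void)
{
	for (int i = 0; i < WORKERS; i++)
		if (fork() == 0)
			worker(i);
	for (int i = 0; i < WORKERS; i++)
		(void) wait(NULL);
	return (0);
}
```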