openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.68k stars 1.76k forks source link

panic: excl->share in zfs_clone_range when block_cloning is enabled #16789

Closed asomers closed 2 days ago

asomers commented 5 days ago

System information

Type Version/Name
Distribution Name FreeBSD
Distribution Version 15.0-CURRENT
Kernel Version main-n273749-4b65481ac68a-dirty (git main branch as of today)
Architecture amd64
OpenZFS Version zfs-2.3.0-rc1-FreeBSD_g3a9fca901 zfs-kmod-2.3.99-64-FreeBSD_g1c9a4c8cb

Describe the problem you're observing

I can immediately reproduce a panic: excl->share on FreeBSD 15.0-CURRENT with witness enabled by using fsx to copy part of a file to another offset of the same file, with copy_file_range. Note that zpool create by default enables the block_cloning feature. If I disable that with zpool create -o feature@block_cloning=disabled then I cannot reproduce the crash.

Describe how to reproduce the problem

$ sudo pkg install -y devel/fsx
$ sudo zpool create testpool vtbd5
$ sudo zfs create testpool/fsx
$ sudo chmod 1777 /testpool/fsx
$ cd /testpool/fsx
$ cat > fsx.toml <<HERE
flen = 1048576
nomsyncafterwrite = false
nosizechecks = false
blockmode = false

[opsize]
max = 262144
min = 0
align = 1

[weights]
close_open = 1
read = 10
write = 10
mapread = 10
mapwrite = 10
invalidate = 1
truncate = 1
fsync = 1
fdatasync = 1
posix_fallocate = 0
punch_hole = 1
sendfile = 1
posix_fadvise = 1
copy_file_range = 1
HERE
$ fsx -f fsx.toml -v fsx.bin

Include any warning/errors/backtraces from the system logs

Stack trace: ``` (kgdb) #0 __curthread () at /usr/home/somers/src/freebsd.org/src/sys/amd64/include/pcpu_aux.h:57 td = #1 doadump (textdump=textdump@entry=0) at /usr/home/somers/src/freebsd.org/src/sys/kern/kern_shutdown.c:404 error = 0 coredump = #2 0xffffffff80a4feed in db_dump (dummy=, dummy2=, dummy3=, dummy4=) at /usr/home/somers/src/freebsd.org/src/sys/ddb/db_command.c:596 error = #3 0xffffffff80a4f50b in db_command (last_cmdp=, cmd_table=, dopager=true) at /usr/home/somers/src/freebsd.org/src/sys/ddb/db_command.c:508 modif = '\000' , "\200\000\376\377\377\000\240\000\206\377\377\377\377\000\000\000\000\000\000\000\000\024\000ޅ\377\377\377\377\270\237\030\266\000\376\377\377\b", '\000' , "\270\237\030\266\000\376\377\377\\\016<\203\377\377\377\377", '\000' , "@\267(\264\000\376\377\377" addr = -2093216164 count = -1 cmd = 0xffffffff85c1a938 have_addr = t = result = #4 0xffffffff80a4e5e0 in db_command_loop () at /usr/home/somers/src/freebsd.org/src/sys/ddb/db_command.c:555 No locals. #5 0xffffffff80a604c2 in db_trap (type=type@entry=3, code=code@entry=0) at /usr/home/somers/src/freebsd.org/src/sys/ddb/db_main.c:267 jb = {{_jb = {0, -2196000688248, -2196000688048, 0, -2195968188488, -2196000688240, 0, -2136603956, 0, -2050388976, -2056072656, -2047184000}}} bkpt = false watchpt = false prev_jb = 0x0 why = 0xffffffff855999e1 "panic" #6 0xffffffff833c44c2 in kdb_trap (type=type@entry=3, code=0, tf=tf@entry=0xfffffe00b428bc20) at /usr/home/somers/src/freebsd.org/src/sys/kern/subr_kdb.c:790 __pc = 0x0 __pc = 0x0 other_cpus = {__bits = {18446741877708864128, 18446741877708864608, 0, 18446741877741359112, 18446744071616731088, 0, 0, 0, 0, 0, 0, 0, 18, 3023517235, 18446744069414584320, 18446741877708864600}} be = 0xffffffff85c1b4d0 intr = 2 did_stop_cpus = handled = #7 0xffffffff84fdfad4 in trap (frame=0xfffffe00b428bc20) at /usr/home/somers/src/freebsd.org/src/sys/amd64/amd64/trap.c:606 __pc = 0x0 __pc = 0x0 __pc = 0x0 ksi = {ksi_link = {tqe_next = 0xffffffff80a60bb6 , tqe_prev = 0x0}, ksi_info = {si_signo = 93489948, si_errno = 93489948, si_code = 93489948, si_pid = 93489948, si_uid = 93489948, si_status = 93489948, si_addr = 0x5755560, si_value = {sival_int = 91780296, sival_ptr = 0x57874c8057874c8, sigval_int = 91780296, sigval_ptr = 0x57874c8057874c8}, _reason = {_fault = {_trapno = -1148842708}, _timer = { _timerid = -1148842708, _overrun = 1122754903}, _mesgq = { _mqd = -1148842708}, _poll = {_band = 4822195592954776876}, _capsicum = {_syscall = -1148842708}, __spare__ = { __spare1__ = 4822195592954776876, __spare2__ = {-1239904248, -512, -1272399104, -512, 0, 0, -1272399216}}}}, ksi_flags = 0, ksi_sigq = 0xfffffe00b428bbb0} signo = -1 ucode = -1239904256 td = p = dr6 = type = 3 addr = pf = i = #8 No locals. #9 kdb_enter (why=0xffffffff855999e1 "panic", msg=0xffffffff855999e1 "panic") at /usr/home/somers/src/freebsd.org/src/sys/kern/subr_kdb.c:556 No locals. #10 0xffffffff83207d0b in vpanic (fmt=, fmt@entry=0xffffffff8555f098 "excl->share", ap=ap@entry=0xfffffe00b428bf70) at /usr/home/somers/src/freebsd.org/src/sys/kern/kern_shutdown.c:967 buf = "excl->share", '\000' __pc = 0x0 __pc = 0x0 __pc = 0x0 other_cpus = td = 0xfffffe00b125d740 bootopt = newpanic = #11 0xffffffff832076dd in panic (fmt=0xffffffff8555f098 "excl->share") at /usr/home/somers/src/freebsd.org/src/sys/kern/kern_shutdown.c:892 ap = {{gp_offset = 8, fp_offset = 48, overflow_arg_area = 0xfffffe00b428bff0, reg_save_area = 0xfffffe00b428bf40}} #12 0xffffffff834cf988 in witness_checkorder ( lock=lock@entry=0xfffffe00b8781e30, flags=flags@entry=1, file=file@entry=0xffffffff89dbd04b "/usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/include/os/freebsd/spl/sys/vnode.h", line=line@entry=101, interlock=interlock@entry=0x0) at /usr/home/somers/src/freebsd.org/src/sys/kern/subr_witness.c:1183 __pc = 0x0 w = 0xfffff8043fd86a80 class = 0xffffffff85cd2ac8 td = lock_list = 0xffffffff86052938 lock1 = iclass = plock = w1 = lle = j = i = lock2 = #13 0xffffffff830f999c in lockmgr_slock (lk=0xfffffe00b8781e30, flags=2098176, file=0xffffffff89dbd04b "/usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/include/os/freebsd/spl/sys/vnode.h", line=101) at /usr/home/somers/src/freebsd.org/src/sys/kern/kern_lock.c:1224 x = 0 #14 0xffffffff85481897 in VOP_LOCK1_APV ( vop=vop@entry=0xffffffff89e60190 , a=a@entry=0xfffffe00b428c3e0) at vnode_if.c:2241 rc = #15 0xffffffff83827f44 in VOP_LOCK1 (vp=0xfffffe00b8781dc0, flags=2098176, file=0xffffffff89dbd04b "/usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/include/os/freebsd/spl/sys/vnode.h", line=101) at ./vnode_if.h:1118 a = {a_gen = {a_desc = 0xffffffff85d7e570 }, a_vp = 0xfffffe00b8781dc0, a_flags = 2098176, a_file = 0xffffffff89dbd04b "/usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/include/os/freebsd/spl/sys/vnode.h", a_line = 101} #16 _vn_lock (vp=vp@entry=0xfffffe00b8781dc0, flags=flags@entry=2098176, file=0xffffffff89dbd04b "/usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/include/os/freebsd/spl/sys/vnode.h", line=line@entry=101) at /usr/home/somers/src/freebsd.org/src/sys/kern/vfs_vnops.c:1829 error = #17 0xffffffff89a6043b in vn_flush_cached_data ( vp=vp@entry=0xfffffe00b8781dc0, sync=1) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/include/os/freebsd/spl/sys/vnode.h:101 flags = 1 flags = #18 0xffffffff89a5d5b4 in zfs_clone_range (inzp=inzp@entry=0xfffffe00d549aae0, inoffp=inoffp@entry=0xfffffe00b428ccc0, outzp=outzp@entry=0xfffffe00d549aae0, outoffp=outoffp@entry=0xfffffe00b428ccc8, lenp=lenp@entry=0xfffffe00b428c8b8, cr=cr@entry=0xfffffe003e1b4200) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/module/zfs/zfs_vnops.c:1472 bulk = {{sa_data = 0xfffffe00b428c610, sa_data_func = 0xffffffff834d8c5e , sa_length = 50720, sa_attr = 46120, sa_addr = 0xffffffff86052984 , sa_buftype = 255, sa_size = 0}, {sa_data = 0xfffffe00b8781e38, sa_data_func = 0x0, sa_length = 21210, sa_attr = 34145, sa_addr = 0xfffffe00b428c680, sa_buftype = 17785, sa_size = 33613}, {sa_data = 0xffffffff00000009, sa_data_func = 0x0, sa_length = 0, sa_attr = 0, sa_addr = 0xffffffff85cd2ac8 , sa_buftype = 10552, sa_size = 34309}} mtime = {18446744071671194496, 18446741877741359112} ctime = {18446741875728397312, 0} nbps = 18446741877658343232 clear_setid_bits_txg = 0 count = 0 last_synced_txg = 0 inoff = 416811 outoff = 461290 len = 44479 done = 0 inzfsvfs = 0xfffffe00bdd7f000 outzfsvfs = 0xfffffe00bdd7f000 error = inos = 0xfffffe00bdc25000 outos = 0xfffffe00bdc25000 inlr = outlr = zilog = maxblocks = uid = gid = projid = bps = size = tx = db = outsize = inblksz = __sdt_probe36 = __sdt_probe37 = __sdt_probe38 = #19 0xffffffff891ef5a0 in zfs_freebsd_copy_file_range ( ap=ap@entry=0xfffffe00b428c9e8) at /usr/home/somers/src/freebsd.org/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:6179 mp = 0xfffffe00a8b62100 len = 44479 invp = 0xfffffe00b8781dc0 outvp = 0xfffffe00b8781dc0 outzfsvfs = error = #20 0xffffffff85492627 in VOP_COPY_FILE_RANGE_APV ( vop=vop@entry=0xffffffff89e60190 , a=a@entry=0xfffffe00b428c9e8) at vnode_if.c:4471 rc = #21 0xffffffff8383b16e in VOP_COPY_FILE_RANGE (invp=, inoffp=0xfffffe00b428ccc0, outvp=0xfffffe00b8781dc0, outoffp=0xfffffe00b428ccc8, lenp=0xfffffe00b428cba0, flags=0, incred=, outcred=, fsizetd=) at ./vnode_if.h:2419 a = {a_gen = {a_desc = 0xffffffff85d831c0 }, a_invp = 0xfffffe00b8781dc0, a_inoffp = 0xfffffe00b428ccc0, a_outvp = 0xfffffe00b8781dc0, a_outoffp = 0xfffffe00b428ccc8, a_lenp = 0xfffffe00b428cba0, a_flags = 0, a_incred = 0xfffffe003e1b4200, a_outcred = 0xfffffe003e1b4200, a_fsizetd = 0xfffffe00b125d740} #22 vn_copy_file_range (invp=invp@entry=0xfffffe00b8781dc0, inoffp=inoffp@entry=0xfffffe00b428ccc0, outvp=outvp@entry=0xfffffe00b8781dc0, outoffp=outoffp@entry=0xfffffe00b428ccc8, lenp=lenp@entry=0xfffffe00b428cba0, flags=flags@entry=0, incred=0xfffffe003e1b4200, outcred=0xfffffe003e1b4200, fsize_td=0xfffffe00b125d740) at /usr/home/somers/src/freebsd.org/src/sys/kern/vfs_vnops.c:3141 invpl = 0xfffffe00b8781dc0 outvpl = 0xfffffe00b8781dc0 len = error = uval = inmp = 0xfffffe00a8b62100 outmp = 0xfffffe00a8b62100 #23 0xffffffff8381b113 in kern_copy_file_range ( td=td@entry=0xfffffe00b125d740, infd=infd@entry=3, inoffp=, inoffp@entry=0xfffffe00b428ccc0, outfd=outfd@entry=3, outoffp=outoffp@entry=0xfffffe00b428ccc8, len=, len@entry=44479, flags=0) at /usr/home/somers/src/freebsd.org/src/sys/kern/vfs_syscalls.c:5004 infp = 0xfffffe00be2a5640 outfp = 0xfffffe00be2a5640 retlen = 44479 rl_wcookie = 0xfffffe00af7c1240 rl_rcookie = 0xfffffe00af7c0d80 savinoff = 416811 error = savoutoff = 461290 invp = 0xfffffe00b8781dc0 outvp = 0xfffffe00b8781dc0 #24 0xffffffff8381bfd1 in sys_copy_file_range (td=td@entry=0xfffffe00b125d740, uap=uap@entry=0xfffffe00b125db40) at /usr/home/somers/src/freebsd.org/src/sys/kern/vfs_syscalls.c:5042 inoff = 416811 outoff = 461290 outoffp = 0xfffffe00b428ccc8 inoffp = 0xfffffe00b428ccc0 error = #25 0xffffffff84fe7aaa in syscallenter (td=0xfffffe00b125d740) at /usr/home/somers/src/freebsd.org/src/sys/amd64/amd64/../../kern/subr_syscall.c:161 se = 0xffffffff85cca640 p = 0xfffffe00d4686020 sa = 0xfffffe00b125db30 error = sy_thr_static = traced = _audit_entered = #26 amd64_syscall (td=0xfffffe00b125d740, traced=0) at /usr/home/somers/src/freebsd.org/src/sys/amd64/amd64/trap.c:1192 ksi = {ksi_link = {tqe_next = 0xfffffe00d4686160, tqe_prev = 0xfffffe00b125d748}, ksi_info = { si_signo = -2092083493, si_errno = -1, si_code = 8, si_pid = 0, si_uid = 0, si_status = 0, si_addr = 0x0, si_value = { sival_int = 0, sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason = {_fault = {_trapno = -1148842708}, _timer = {_timerid = -1148842708, _overrun = 1122754903}, _mesgq = {_mqd = -1148842708}, _poll = { _band = 4822195592954776876}, _capsicum = { _syscall = -1148842708}, __spare__ = { __spare1__ = 4822195592954776876, __spare2__ = {0, 0, 0, -1, 0, -512, -1239904248}}}}, ksi_flags = 1069188108, ksi_sigq = 0xfffffe00b428ce60} #27 No locals. #28 0x00000076404d133a in ?? () No symbol table info available. Backtrace stopped: Cannot access memory at address 0x763d1f8198 ```

Links

Downstream bug report: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=282878

amotin commented 5 days ago

As I can see, it is a lock recursion on a source of copy_file_range(). As I can see the first lock is taken by zfs_freebsd_copy_file_range(), calling zfs_clone_range(), while the second lock is taken by zn_flush_cached_data() called by zfs_clone_range(). Seems we need some reorganization there. The problem should happen only if mmap() and copy_file_range() are mixed on the same file.

asomers commented 5 days ago

The problem should happen only if mmap() and copy_file_range() are mixed on the same file.

Yes, fsx does that. It mixes ordinary writes, mmap writes, and copy_file_range writes on the same file.

amotin commented 4 days ago

It seems zn_flush_cached_data() takes vnode lock since it's other caller zfs_ioctl() on FreeBSD unlike Linux does not lock it. I think proper solution would be to lock it there instead, though it is not my strongest area. Meanwhile from another side it seems to help to change zfs_freebsd_copy_file_range() from LK_EXCLUSIVE to LK_SHARED, which should be good by itself.

amotin commented 3 days ago

@asomers I kind of fixed it twice, so would be nice if you could test and review https://github.com/openzfs/zfs/pull/16796 and https://github.com/openzfs/zfs/pull/16797 .